Basic SD-WAN for 7.4.x

(2026-04-08)

What This Is About

(This is a refinement of my previous sdwan writings.)

In the course of my day-to-day duties I've been seeing a lot of sdwan configs that are misbehaving, causing users to believe that their internet links are worse than they actually are. I've therefore put together a couple of sample configurations that should be usable in the basic cases, so that sdwan does what we want without unduly preventing users from getting their work done.

The main issue I'm encountering is SLA-driven flap, where a link is repeatedly taken out of, then put back into, service. Usually this kills all the active sessions because when a route is removed from the routing table, the address used to NAT out the connections changes, which means the sessions get rejected by the far end.

In an ideal world all ISPs provide links of stellar quality and you don't have to worry about blips in packet loss or latency. However, out here in the real world we have offices with some pretty terrible links, and while in theory you want to not use a link if it is terrible, you have to consider the alternative. And if there is no alternative, why are you messing with it? You have what you have, and in the absence of a better option a bad link beats a down link in 95% of the scenarios that come up.

My primary objectives when setting up or reconfiguring a sdwan for internet are:

minimize or eliminate link flap by making things excessively tolerant and turning connection issues back on the client applications; and
if a link is routinely bad because of packet loss or excessive latency/jitter, that is an ISP problem and should be pursued with them rather than masked with sdwan trickery.

What This Is Not

If your circumstances differ from the beginning premise of each example in any way, such that you might reasonably ask “but what if...” or “how about...”, then this page is not for you. You will have to seek your answers elsewhere.

So I have two pet peeves.

Peeve #1: One-armed sdwan configured with SLA targets

There is no point having a one-armed sdwan with SLA targets because there’s no alternative link around to maybe be able to take the traffic. In these scenarios, a terrible link will be better than an artificially-downed link every single time. In my view the default one-armed sdwan config needs to look something like this:

config system sdwan
    set status enable
    config zone
        edit "virtual-wan-link"
        next
    end
    config members
        edit 1
            set interface "wan1"
            set gateway a.b.c.d
        next
    end
    config health-check
        edit "Ping_SLA"
            set server "8.8.4.4" "9.9.9.9"
            set interval 1000
            set probe-timeout 1000
            set update-static-route disable
            set sla-fail-log-period 60
            set sla-pass-log-period 0
            set members 1
        next
    end
    config service
        edit 1
            set name "Default_Outbound"
            set dst "all"
            set src "all"
            set priority-members 1
        next
    end
end

A few comments here:

Note the absence of wan2. If it isn’t active, don’t muddle things by pre-adding it. When the new ISP comes we can just add it at that point.
I prefer ping tests to DNS tests since the interaction with the DNS server adds more variability to the return timing;
I usually prefer not to use Google as a ping target because they seem to be aggressively peered with ISPs, so latency can be artificially low
Running tests once per second, with one second timeout tolerance will reduce the chance of an artificial down event
The sla log periods are set the way they are because A) they’re excessively noisy and B) I’ve never run into anyone who’s cared about the historic values of SLA tests that passed, ever
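To confirm that the once-per-second probes are behaving as intended, the health-check state can be inspected from the CLI. A minimal sketch, assuming the plain form of the command is accepted on your 7.4.x build (the exact syntax has varied between releases, and some builds take an additional keyword after health-check):

```
diagnose sys sdwan health-check
```

This should list each member of each health check along with its current state and measured loss, latency, and jitter, which is usually enough to tell whether a link is genuinely bad or just being probed too strictly.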

Peeve #2: Sdwan configurations with excessively low SLA thresholds

In this case, bad (or simply busy) links can be arbitrarily marked as not sufficient for use (and with “Delete route” set, not sufficient for ANY use). At best, you will end up with links flapping as they go in and out of SLA; at worst, you will end up with all links marked as unsuitable for use even though they’d actually pass traffic if used. My suggested starting point for a dual-armed sdwan, where one link is specifically for regular use and the other is expressly a backup link:

config system sdwan
    set status enable
    config zone
        edit "virtual-wan-link"
        next
    end
    config members
        edit 1
            set interface "wan1"
            set gateway a.b.c.d
        next
        edit 2
            set interface "wan2"
            set gateway e.f.g.h
        next
    end
    config health-check
        edit "Ping_SLA"
            set server "8.8.4.4" "9.9.9.9"
            set interval 1000
            set probe-timeout 1000
            set update-static-route disable
            set sla-fail-log-period 60
            set sla-pass-log-period 0
            set members 1 2
        next
    end
    config service
        edit 1
            set name "Default_Outbound"
            set dst "all"
            set src "all"
            set priority-members 1 2
        next
    end
end

Notes here:

The primary goal with this setup is to minimize flapping events by making things excessively tolerant and turning connection issues back on the client applications
Main link is used exclusively unless marked down
Main link will only be marked down if it fails five consecutive ping tests
Lack of a “remove route” means that it will be up to the active application to detect that the link stopped working and reconnect or die accordingly
Lack of a “remove route” also means that active sessions on wan2 will stay on wan2 when wan1 comes back up
Personally I’m not comfortable with the “load balancing” algorithms in use by 7.4.x so I’d not use them. I like the idea of source/dest-ip-hashing, but that seems to not be an option any more.
In the event that we had a situation where traffic x (say voip) needed to go on wan2 while everything else rides wan1, I’d make the service section look like:

    config service
        edit 2
            set name "VoIP"
            set dst "VoIP Service Server Group"
            set src "Phones"
            set priority-members 2
        next
        edit 1
            set name "Default_Outbound"
            set dst "all"
            set src "all"
            set priority-members 1 2
        next
    end

Here phones will use wan2 unless it is down, in which case their traffic falls through to rule 1, and will then use wan1.
I’d probably be loose in service #2’s dst/src, probably only picking “all the phones” or “all the voip servers”.
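The “five consecutive ping tests” behaviour noted above is not spelled out in the sample config. As far as I know it comes from the health check's failtime setting, which I believe defaults to 5, as does recoverytime (the number of consecutive successful probes needed before the link is marked up again). A sketch that makes both explicit; the values are illustrative assumptions rather than tested recommendations, and the # lines are annotations to strip before pasting:

```
config system sdwan
    config health-check
        edit "Ping_SLA"
            # Assumed value: consecutive failed probes before the member is marked down
            set failtime 5
            # Assumed value: consecutive successful probes before it is marked up again
            set recoverytime 10
        next
    end
end
```

Raising recoverytime above failtime makes a link slower to return to service than to leave it, which is one way to damp flapping on a marginal link.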

Scenarios from here only get more complicated, but in general my view is that specific SLA targets are rarely needed, and tight SLA targets are never needed:

If latency, jitter, or packet loss is important as a link selection criterion, then the tests should be constructed so that they run against the relevant internet servers (i.e. the specific servers used by the phone vendors) rather than the usual DNS suspects.
SLA targets should be extremely loose so that links which are “up” can satisfy them, but we can choose which link to use based on which one is “better”. Tight rules set a threshold that effectively says “over this threshold this link should not be used for this purpose” and run the risk that none of the links satisfy the criteria. When this happens, even with two links functionally up they are unusable. As I said above, in some circumstances a bad link beats a down link.
“Remove Route” setting should be avoided so that sessions which are using link A when link B becomes better don’t get arbitrarily dropped.
In general, “Remove Route” is saying “this link is so bad it should not be used under any circumstances, ever.”
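As a rough illustration of the points above, here is a sketch of a health check aimed at a vendor's server with a deliberately loose SLA target, and a VoIP rule that uses it. Everything specific here is an assumption: the health-check name "VoIP_SLA", the server address 203.0.113.10 (a documentation placeholder standing in for the phone vendor's real server), and the threshold values, which are only meant to show “loose” rather than to be recommendations. The # lines are annotations to strip before pasting.

```
config system sdwan
    config health-check
        edit "VoIP_SLA"
            # Placeholder address: substitute the phone vendor's actual server
            set server "203.0.113.10"
            set interval 1000
            set probe-timeout 1000
            set update-static-route disable
            set members 1 2
            config sla
                edit 1
                    set link-cost-factor latency jitter packet-loss
                    # Assumed, deliberately loose thresholds (ms, ms, percent)
                    set latency-threshold 500
                    set jitter-threshold 200
                    set packetloss-threshold 20
                next
            end
        next
    end
    config service
        edit 2
            set name "VoIP"
            set mode sla
            set dst "VoIP Service Server Group"
            set src "Phones"
            config sla
                edit "VoIP_SLA"
                    set id 1
                next
            end
            # wan2 preferred, wan1 used if wan2 misses the loose target
            set priority-members 2 1
        next
    end
end
```

With targets this loose, a link that is merely slow still qualifies, so the rule only moves phones off wan2 when it is close to unusable, and update-static-route stays disabled so existing sessions are left alone.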
