Fixing HA Sync Problems
(2017-11-13)
Problem:
fgt300d-b (global) # get system ha status
HA Health Status: OK
Model: FortiGate-300D
Mode: HA A-P
Group: 0
Debug: 0
[...]
Configuration Status:
FGT3HD--------42(updated 5 seconds ago): in-sync
FGT3HD--------79(updated 2 seconds ago): out-of-sync
Solution:
Connect to the non-master unit (the index, 0 here, depends on the cluster; "execute ha manage ?" lists the available units):
fgt300d-b # conf global
fgt300d-b (global) # execute ha manage 0
fgt300d-a login:
On the "out of sync" unit, force a synchronization:
fgt300d-a # conf global
fgt300d-a (global) # execute ha synchronize start
On both units, show the VDOM checksums:
fgt300d-a (global) # di sys ha checksum show
is_manage_master()=0, is_root_master()=0
debugzone
global: 23 57 94 4a 63 8e 02 de ac c7 5d 83 aa 9e cf 4c
root: 08 c6 69 fd ec 68 b8 f0 9a a7 32 34 d8 fc 2e d0
Back: b2 42 60 d5 b0 5a a6 d2 61 da 0a 28 85 1b f8 09
Edge: 60 c8 6b e8 3e c9 46 0c 89 19 aa 15 92 63 e1 61
all: d3 f7 57 94 02 63 29 5c 36 a6 ff ea 37 87 e5 36
checksum
global: 23 57 94 4a 63 8e 02 de ac c7 5d 83 aa 9e cf 4c
root: 08 c6 69 fd ec 68 b8 f0 9a a7 32 34 d8 fc 2e d0
Back: b2 42 60 d5 b0 5a a6 d2 61 da 0a 28 85 1b f8 09
Edge: 60 c8 6b e8 3e c9 46 0c 89 19 aa 15 92 63 e1 61
all: d3 f7 57 94 02 63 29 5c 36 a6 ff ea 37 87 e5 36
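Comparing rows of 16-byte checksums by eye is error-prone. The following is a minimal sketch, not part of the original procedure, that parses the pasted output from each unit and lists the entries that differ. The function names are made up for illustration, UNIT_A is a trimmed copy of the output above, and UNIT_B is a hypothetical out-of-sync peer invented for the demonstration.

```python
def parse_checksums(text):
    """Parse pasted `diagnose sys ha checksum show` output into {block: {name: hex}}."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("debugzone", "checksum"):
            # Start of one of the two blocks in the output.
            current = blocks.setdefault(line, {})
        elif current is not None and ":" in line:
            name, _, value = line.partition(":")
            # Normalise "23 57 94 ..." to "235794..." so spacing never matters.
            current[name.strip()] = "".join(value.split()).lower()
    return blocks


def mismatched(unit_a, unit_b, block="checksum"):
    """Return the sorted names whose checksum differs (or is missing) between units."""
    a = parse_checksums(unit_a).get(block, {})
    b = parse_checksums(unit_b).get(block, {})
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))


# Trimmed copy of the output shown above.
UNIT_A = """fgt300d-a (global) # di sys ha checksum show
is_manage_master()=0, is_root_master()=0
debugzone
global: 23 57 94 4a 63 8e 02 de ac c7 5d 83 aa 9e cf 4c
Back: b2 42 60 d5 b0 5a a6 d2 61 da 0a 28 85 1b f8 09
all: d3 f7 57 94 02 63 29 5c 36 a6 ff ea 37 87 e5 36
checksum
global: 23 57 94 4a 63 8e 02 de ac c7 5d 83 aa 9e cf 4c
Back: b2 42 60 d5 b0 5a a6 d2 61 da 0a 28 85 1b f8 09
all: d3 f7 57 94 02 63 29 5c 36 a6 ff ea 37 87 e5 36
"""

# Hypothetical peer output: the Back and all checksums are altered by hand.
UNIT_B = UNIT_A.replace("b2 42 60", "ff ff ff").replace("d3 f7 57", "ee ee ee")

if __name__ == "__main__":
    print(mismatched(UNIT_A, UNIT_B))
```

An empty list means the two pasted outputs agree for the chosen block; pass block="debugzone" to compare the other block.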
If the checksums all match between units, you're done.
If not, force each unit to recalculate its checksums, then show them again:
fgt300d-a (global) # diagnose sys ha checksum recalculate
fgt300d-a (global) # di sys ha checksum show
[checksums show as above]
If the checksums all match between units, you're done.
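If not: for each section listed (above: global, root, Back, Edge), dump that section's per-object checksums on both units, save each unit's output to a file, and diff the two files (below, the glob Back.? matches one saved file per unit):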
fgt300d-a (global) # di sys ha checksum show Back
[gratuitous output]
...
$ diff Back.?
1c1
< fgt300d-a (global) # di sys ha checksum show Back
---
> fgt300d-b (global) # di sys ha checksum show Back
12c12
< system.dhcp.server: 00000000000000000000000000000000
---
> system.dhcp.server: 702dc7bc45da4fc11dbc9b9de2923862
In the example above, system.dhcp.server has an all-zero checksum on the A unit and a non-zero checksum on the B unit, i.e. the DHCP server has values set on B but not on A. Either replicate the configuration from B to A, or remove it from B.
If you've made any changes, force a checksum recalculation as above, and then check the configuration status again with "get system ha status".
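Where diff is not to hand, the per-section comparison can be sketched in Python instead. This is an illustration under assumptions, not the author's method: the function names are invented, the input is the pasted output of "di sys ha checksum show <section>" from each unit, and the firewall.address line with its checksum is made up for the demonstration. It assumes each object line has the form "object.path: <32 hex digits>", as in the diff output above.

```python
import re

HEX32 = re.compile(r"[0-9a-f]{32}")


def parse_objects(text):
    """Map object path -> checksum for every 'path: <32 hex digits>' line."""
    objects = {}
    for raw in text.splitlines():
        name, sep, value = raw.strip().partition(":")
        value = value.strip().lower()
        # Prompts and other lines without a 32-digit hex value are skipped.
        if sep and HEX32.fullmatch(value):
            objects[name.strip()] = value
    return objects


def differing_objects(unit_a, unit_b):
    """Return (path, checksum_on_a, checksum_on_b) for every object that differs."""
    a, b = parse_objects(unit_a), parse_objects(unit_b)
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]


# system.dhcp.server values are taken from the diff above;
# the firewall.address line is a made-up matching entry.
A_BACK = """fgt300d-a (global) # di sys ha checksum show Back
firewall.address: 0123456789abcdef0123456789abcdef
system.dhcp.server: 00000000000000000000000000000000
"""
B_BACK = """fgt300d-b (global) # di sys ha checksum show Back
firewall.address: 0123456789abcdef0123456789abcdef
system.dhcp.server: 702dc7bc45da4fc11dbc9b9de2923862
"""

if __name__ == "__main__":
    for name, on_a, on_b in differing_objects(A_BACK, B_BACK):
        print(name, on_a, on_b)
```

Unlike the raw diff, this ignores the differing prompt lines and reports only the objects whose checksums disagree.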
Note:
I once had a pair of FGT60E firewalls running 5.6.9 that claimed to be out of sync, right down to the slave constantly trying to trigger a re-sync, even though the checksums all matched. Forcing both firewalls to recalculate their checksums fixed the issue.