# pfsync states not syncing back to master host after reboot



## CraigH (Jul 5, 2012)

I have a two-host routing cluster set up running pf, pfsync and carp on 9.0-RELEASE-p3. The two hosts are connected by a cross-over cable that is configured as the pfsync syncdev. The setup is working well, with the first host (host A) as the master (advskew 0) and the second host (host B) as the backup (advskew 100). Connections established while host A is running are copied over to host B, and the connections stay active when host A is shut down or rebooted and host B becomes the master.

The problem occurs when host A is started up again. When it boots, it once again becomes the master and requests and receives a bulk update of the pf states from host B. The update succeeds, but it contains none of the states that were originally created on host A, only states created while host B was master. As a result, connections that were active before host A went down break once it comes back up.

For example, while host A is master, make an SSH connection through the router. The states show up on both A and B:


```
all tcp 10.4.1.4:22 <- 10.2.1.4:49303       ESTABLISHED:ESTABLISHED
   [700875095 + 22144] wscale 9  [1344039155 + 8192] wscale 7
   age 00:00:11, expires in 23:59:52, 23:33 pkts, 4140:4909 bytes, anchor 197, rule 8
   id: 4ff4def10000114f creatorid: 76e4b844
all tcp 10.2.1.4:49303 -> 10.4.1.4:22       ESTABLISHED:ESTABLISHED
   [1344039155 + 8192] wscale 7  [700875095 + 22144] wscale 9
   age 00:00:11, expires in 23:59:52, 23:33 pkts, 4140:4909 bytes, rule 212
   id: 4ff4def100001150 creatorid: 76e4b844
```
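State listings like the one above, including the `id`/`creatorid` pair that identifies which host created each state, can be produced with pfctl's verbose state dump (an assumption on my part about how these listings were taken; run as root):

```shell
# Dump the pf state table; the second -v adds the "id: ... creatorid: ..."
# line, which shows which cluster member originally created each state.
pfctl -ss -vv
```

Comparing the `creatorid` values on both hosts is a quick way to see which member's states survived a failover.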

Shut down host A; the SSH connection remains active through host B. While host A is down, make a second SSH connection through the router. The new states show up on B:


```
all tcp 10.4.1.4:22 <- 10.2.1.4:49303       ESTABLISHED:ESTABLISHED
   [700875191 + 22144] wscale 9  [1344039203 + 8192] wscale 7
   age 00:03:38, expires in 23:59:39, 3:3 pkts, 204:252 bytes
   id: 4ff4def10000114f creatorid: 76e4b844
all tcp 10.2.1.4:49303 -> 10.4.1.4:22       ESTABLISHED:ESTABLISHED
   [1344039203 + 8192] wscale 7  [700875191 + 22144] wscale 9
   age 00:03:38, expires in 23:59:39, 3:3 pkts, 204:252 bytes
   id: 4ff4def100001150 creatorid: 76e4b844
all tcp 10.4.1.4:22 <- 10.2.1.4:49323       ESTABLISHED:ESTABLISHED
   [1512494642 + 22144] wscale 9  [11200697 + 8192] wscale 7
   age 00:00:08, expires in 23:59:55, 21:32 pkts, 4036:4825 bytes, anchor 183, rule 8
   id: 4ff4e7c40000025b creatorid: 648d37f6
all tcp 10.2.1.4:49323 -> 10.4.1.4:22       ESTABLISHED:ESTABLISHED
   [11200697 + 8192] wscale 7  [1512494642 + 22144] wscale 9
   age 00:00:08, expires in 23:59:55, 21:32 pkts, 4036:4825 bytes, rule 212
   id: 4ff4e7c40000025c creatorid: 648d37f6
```

Start up host A again; it becomes master, but only the states for the second connection (the one started while it was down) appear on host A:


```
all tcp 10.4.1.4:22 <- 10.2.1.4:49323       ESTABLISHED:ESTABLISHED
   [1512495026 + 22144] wscale 9  [11200889 + 8192] wscale 7
   age 00:07:29, expires in 23:59:53, 2:3 pkts, 152:252 bytes
   id: 4ff4e7c40000025b creatorid: 648d37f6
all tcp 10.2.1.4:49323 -> 10.4.1.4:22       ESTABLISHED:ESTABLISHED
   [11200889 + 8192] wscale 7  [1512495026 + 22144] wscale 9
   age 00:07:29, expires in 23:59:53, 2:3 pkts, 152:252 bytes
   id: 4ff4e7c40000025c creatorid: 648d37f6
```

At this point, only the second SSH connection will work.

It seems as though the bulk update process assumes that states originally created on host A still exist there after the reboot, and so omits them; since those states were in fact lost in the reboot, they never reappear on host A and the connections are dropped.
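One way to check this hypothesis (a diagnostic sketch, not something I have tried yet) would be to capture the pfsync traffic on the crossover link while host A boots and see whether the bulk update from B actually carries the missing states. pfsync runs directly over IP as protocol 240, so a filter on that protocol number catches it:

```shell
# On host B, watch the sync link while host A reboots. pfsync is IP
# protocol 240; tcpdump builds with a pfsync printer will decode the
# bulk update request and the state insert/update messages in -vv mode.
# Even without a decoder, the raw packets show whether (and how much of)
# a bulk update was sent.
tcpdump -n -vv -s 1500 -i bce1 ip proto 240
```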

Is this working as designed, or do other people have the same experience? Thanks in advance for any advice or pointers.

Following are the relevant configs:

Kernel config:


```
device          pf
device          pfsync
device          pflog
device          carp
```

Host A rc.conf:


```
ifconfig_bce0="up"
ifconfig_bce1="192.168.42.1/30"

cloned_interfaces="vlan2 vlan4"
ifconfig_vlan2="inet 10.2.0.1 netmask 255.255.0.0 vlan 2 vlandev bce0"
ifconfig_vlan4="inet 10.4.0.1 netmask 255.255.0.0 vlan 4 vlandev bce0"

ifconfig_carp2="vhid 2 pass 12345678 10.2.0.10/16"
ifconfig_carp4="vhid 4 pass 12345678 10.4.0.10/16"

pf_enable="YES"
pflog_enable="YES"
pfsync_enable="YES"
pfsync_syncdev="bce1"
pfsync_syncpeer="192.168.42.2"
```

Host B rc.conf:


```
ifconfig_bce0="up"
ifconfig_bce1="192.168.42.2/30"

cloned_interfaces="vlan2 vlan4"
ifconfig_vlan2="inet 10.2.0.2 netmask 255.255.0.0 vlan 2 vlandev bce0"
ifconfig_vlan4="inet 10.4.0.2 netmask 255.255.0.0 vlan 4 vlandev bce0"

ifconfig_carp2="vhid 2 advskew 100 pass 12345678 10.2.0.10/16"
ifconfig_carp4="vhid 4 advskew 100 pass 12345678 10.4.0.10/16"

pf_enable="YES"
pflog_enable="YES"
pfsync_enable="YES"
pfsync_syncdev="bce1"
pfsync_syncpeer="192.168.42.1"
```

pf.conf:


```
set skip on lo0
set skip on bce1
set block-policy drop

scrub in all

block log

pass quick proto carp keep state (no-sync)

pass in quick proto tcp from vlan2:network to vlan4:network port 22

pass out quick
```


----------



## glebius@ (Jul 5, 2012)

> It seems as though the bulk update process assumes that states originally created on host A still exist there after the reboot, and so omits them; since those states were in fact lost in the reboot, they never reappear on host A and the connections are dropped.



I don't remember such logic in pfsync.

Can you please check whether the problem is reproducible on a fresh stable/9? There have been some bug fixes to pf/pfsync since 9.0-RELEASE.
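If you are on a stock 9.0-RELEASE, a sketch of the usual source-upgrade route to stable/9 (paths and KERNCONF assume the defaults; the Handbook covers the full procedure):

```shell
# Fetch the stable/9 sources (replaces the contents of /usr/src).
svn checkout svn://svn.freebsd.org/base/stable/9 /usr/src

# Rebuild and install world and kernel, then reboot.
cd /usr/src
make buildworld buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
make installworld
shutdown -r now
```

The Handbook's chapter on updating FreeBSD from source describes the complete procedure, including running mergemaster and rebooting to single-user mode between installkernel and installworld.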


----------



## CraigH (Jul 6, 2012)

glebius@ said:

> Can you please check whether problem is reproducible on fresh stable/9?



I'll give it a try and let you know how it goes.


----------



## CraigH (Jul 9, 2012)

I can confirm that this is fixed in 9.0-STABLE as of this weekend's sources.

As a relative FreeBSD newbie I have to ask: what is the procedure for the relevant patches to make their way into the RELEASE branch?


----------



## glebius@ (Jul 15, 2012)

RELEASE isn't a branch, but a point on a stable branch. You can't push changes into 9.0-RELEASE; it is already out. You could, if you had a time machine.

Since 9.0-STABLE is fixed, 9.1-RELEASE will be fixed, too.


----------

