# PF, NAT and connection failures



## abishai (Feb 1, 2014)

Hello.

Recently I upgraded from 9.1 to 10 and ran into a lot of problems. I hoped all of them were fixed by now, but this one looks tricky.
I found that I hit pf's state limit several times, so I rewrote the jail rules with `no state`. I turned on logging for all blocked packets to check whether everything was running smoothly, and forgot to disable it.
Recently I found that the log file had grown rather huge.
Most of its entries referenced blocked external HTTPS transmissions (in and out). That was very strange, as outgoing HTTPS access is open on the external interface.

```
block in on re0: xxx.xxx.xxx.xxx.443 > xxx.xxx.xxx.xxx.62063: Flags [P.], seq 1460:2423, ack 1, win 32768, length 963
```
I looked at the application logs and found strange connectivity problems when accessing HTTPS ports. I can describe them as random failures, sometimes in the middle of a session.

I'm not sure this is a FreeBSD 10 feature, but the application logs say connectivity was OK before. I suspect incorrect NATing.

1. The external interface uses 2 aliases.
2. The ftp/curl library is used with keep-alive.

Here are the relevant parts of my pf config:


```
tcp_out = "{ 22, 23, 80, 443, 9090, spamd, spamd-cfg }"         #Allowed outgoing ports
#NAT JAil traffic
nat pass on $ext from $jail:network to any -> $ext

pass out quick on $ext inet proto tcp from $ext:network to any port $tcp_out keep state #TCP POLICY
```

I suspect that my problem lies in the `-> $ext` part of the NAT rule. If I understand correctly, pf will use a random alias of the $ext interface every time it NATs traffic, so sometimes the IP is not in the state table. The pf log above shows it: the connection was established on port 443, but the reply hit another IP and triggered the `block log all` rule.
Is my guess correct?
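If my guess is right, one possible fix (a sketch only, with a placeholder address; I have not verified this) would be to pin the translation to one concrete alias instead of the interface name, so replies always come back to the IP recorded in the state table:

```
# Sketch only: translate to a single fixed alias (placeholder address)
# instead of letting pf pick among the interface's addresses
main_ip = "xxx.xxx.xxx.1"
nat pass on $ext from $jail:network to any -> $main_ip
```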


----------



## abishai (Feb 12, 2014)

*Re: PF, NAT and SSL failures*

I still can't solve the issue. Here is what I found.
It looks like pf silently drops the state after some time. Connections work if they finish quickly enough. The main host is not affected.

1. Start downloading
`root@cloud:/tmp # fetch http://gcc.skazkaforyou.com/releases/gcc-4.6.4/gcc-4.6.4.tar.bz2`

2. State created

```
abi@serpent:/tmp % sudo pfctl -ss | grep 70.38.30
No ALTQ support in kernel
ALTQ related functions disabled
all tcp 5.9.156.175:62956 (192.168.0.9:61864) -> 70.38.30.201:80       ESTABLISHED:ESTABLISHED
```
3. State dropped. (Why?)
4. Fetch hangs...
5. ...and times out

```
fetch: transfer timed out
fetch: gcc-4.6.4.tar.bz2 appears to be truncated: 351232/72006076 bytes
```

Sometimes I see:

```
root@cloud:/tmp # fetch http://gcc.skazkaforyou.com/releases/gcc-4.6.4/gcc-4.6.4.tar.bz2
gcc-4.6.4.tar.bz2                               0% of   68 MB  140 kBps 06m52s
fetch: http://gcc.skazkaforyou.com/releases/gcc-4.6.4/gcc-4.6.4.tar.bz2: Connection reset by peer
```

tcpdump on re0 shows no request to end the connection.

My ifconfig:

```
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether d4:3d:7e:da:ff:c2
        inet xx.xx.xxx.xxx netmask 0xffffff00 broadcast xx.xx.xxx.xxx
        inet xx.xx.xxx.xxx netmask 0xffffffff broadcast xx.xx.xxx.xxx
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
lo1: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 192.168.0.1 netmask 0xffffff00
        inet 192.168.0.2 netmask 0xffffffff
        inet 192.168.0.3 netmask 0xffffffff
        inet 192.168.0.4 netmask 0xffffffff
        inet 192.168.0.5 netmask 0xffffffff
        inet 192.168.0.6 netmask 0xffffffff
        inet 192.168.0.7 netmask 0xffffffff
        inet 192.168.0.8 netmask 0xffffffff
        inet 192.168.0.9 netmask 0xffffffff
tap0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether 00:bd:ef:0e:00:00
        inet 192.168.1.222 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect
        status: active
        Opened by PID 1127
pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160
```

rc.conf part

```
cloned_interfaces="lo1 tap0" #Jail, OpenVPN
ifconfig_lo1="inet 192.168.0.1 netmask 255.255.255.0 up"
ifconfig_lo1_alias0="inet 192.168.0.2 netmask 255.255.255.255"
ifconfig_lo1_alias1="inet 192.168.0.3 netmask 255.255.255.255"
ifconfig_lo1_alias2="inet 192.168.0.4 netmask 255.255.255.255"
ifconfig_lo1_alias3="inet 192.168.0.5 netmask 255.255.255.255"
ifconfig_lo1_alias4="inet 192.168.0.6 netmask 255.255.255.255"
ifconfig_lo1_alias5="inet 192.168.0.7 netmask 255.255.255.255"
ifconfig_lo1_alias6="inet 192.168.0.8 netmask 255.255.255.255"
ifconfig_lo1_alias7="inet 192.168.0.9 netmask 255.255.255.255"
```

I really have no idea what's going on.


----------



## CoTones (Feb 13, 2014)

*Re: PF, NAT and SSL failures*

On OpenBSD I had problems with the re interface: packets were dropped under higher load.


----------



## abishai (Feb 13, 2014)

*Re: PF, NAT and SSL failures*

The issue exists only for the jails, so it looks like pf behavior. I shall post my pf.conf; maybe something is wrong with the rules.


----------



## SirDice (Feb 13, 2014)

abishai said:

> Most of its entries referenced blocked external https transmissions (in and out). That was very strange, as outgoing https access is opened for external interface.
> 
> ```
> block in on re0: xxx.xxx.xxx.xxx.443 > xxx.xxx.xxx.xxx.62063: Flags [P.], seq 1460:2423, ack 1, win 32768, length 963
> ...


This looks like a response to an _incoming_ HTTPS connection, not an _outgoing_.


----------



## abishai (Feb 13, 2014)

SirDice said:

> This looks like a response to an _incoming_ HTTPS connection, not an _outgoing_.


How would you know? By the PUSH flag? It could be an existing connection, dropped because its state disappeared.

I made a breakthrough today.

```
#NAT JAil traffic
nat pass on $ext from $lamia to any -> $killboard
nat pass on $ext from $jail:network to any -> $main
```
After I removed the _pass_ keyword, my problem was instantly gone. I am reading the man pages, but I still have no clue how that could help.


----------



## SirDice (Feb 13, 2014)

abishai said:

> SirDice said:
> 
> 
> 
> ...


It's not logical for an outgoing connection to go to some random high port with port 443 as the source port. The other way around is a lot more common.



> I made a breakthrough today.
> 
> ```
> #NAT JAil traffic
> ...


If I remember correctly it "automagically" adds the correct pass rules to allow the traffic. But those may interfere with more specific rules you have set yourself.


----------



## wunki (Feb 15, 2014)

I have the same problem. Since moving to FreeBSD 10.0, packets routed through PF and NAT have been dropped randomly. After a reboot, all is fine for a while.

My pf.conf is very simple:


```
# Interfaces
int_if = "lagg0"
ext_if = "lagg1"

# Jail
jail_net = "10.70.210.48/28"

# Clean every packet
scrub in all

# NAT
nat on $ext_if from $jail_net to any -> $ext_if
```

Could it be that a bug was introduced in 10.0? I found the following: http://www.freebsd.org/cgi/query-pr.cgi?pr=185876


----------



## _martin (Feb 20, 2014)

Hm, this sounds very similar to my problem: http://forums.freebsd.org/viewtopic.php?f=7&t=44953. I had to downgrade, since I was not able to keep a connection open for long (even with live traffic).

In my setup PF is used to NAT the VPN traffic.


----------



## SirDice (Feb 20, 2014)

wunki said:

> ```
> nat on $ext_if from $jail_net to any -> $ext_if
> ```



I don't think it's going to help much but it's better to use this:

```
nat on $ext_if from $jail_net to any -> ($ext_if)
```

You also seem to be missing this:

```
set skip on lo0
```



> Could it be that a bug is introduced in 10.0?


Possible. PF has been changed quite a lot for 10.0 to make it run better on SMP systems. So it's possible some bugs were introduced because of this.


----------



## bsd4masses (Mar 3, 2014)

Hello,

I also have some problems regarding this topic. I noticed that a reboot corrected the behaviour, but after a few days I got, again and again:

```
sshd[36768]: fatal: Write failed: Operation not permitted
```

Hopefully someone has fixed, or will fix, that problem.

Thanks, Norbert


----------



## _martin (Mar 5, 2014)

bsd4masses said:

> Hopefully someone fixed or fixes that problem.


As far as I can tell, the issue still persists.

I always chuckle a bit when I see FreeBSD 9.2 listed as "legacy" on the FreeBSD homepage. It seems it will remain the production release for some time.


----------



## abishai (Mar 13, 2014)

As for my problem: mine was an effect of states disappearing. You can test it by fetching something from a jail and monitoring the connection with `tcpdump`. (You may have to find a big file or a slow server if you've got a fast connection.) If `fetch` halts and the firewall begins to block your existing connection, it means the state was dropped from pf's state table. I have no clue why the pass option in nat has such a disastrous effect; maybe pf indeed has some deep flaw in 10-RELEASE. You can try to rewrite your rules with `no state` and check for changes.
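Roughly, the test looks like this (the server address and file below are placeholders, not from my real setup):

```
# In the jail: start a long download
fetch -o /dev/null http://example.org/some-big-file.iso

# On the host, in a second terminal: watch pf's state entry for that
# server; if it vanishes while fetch is still running, the state was dropped
pfctl -ss | grep 203.0.113.10

# And watch what pf blocks via the pflog interface
tcpdump -n -e -ttt -i pflog0
```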


----------



## wunki (Mar 13, 2014)

abishai said:

> As for my problem: mine was an effect of states disappearing. You can test it by fetching something from a jail and monitoring the connection with `tcpdump`. (You may have to find a big file or a slow server if you've got a fast connection.) If `fetch` halts and the firewall begins to block your existing connection, it means the state was dropped from pf's state table. I have no clue why the pass option in nat has such a disastrous effect; maybe pf indeed has some deep flaw in 10-RELEASE. You can try to rewrite your rules with `no state` and check for changes.



Could you share your NAT configuration in pf.conf? Would love to try it out.


----------



## abishai (Mar 17, 2014)

wunki said:

> Could you share your NAT configuration in pf.conf? Would love to try it out.


Sorry for the late reply. I hope it helps, but I think it's a rather common setup.


```
ext="re0"
int="lo0" #loopback
jail="lo1" #jail pseudo interface

main="xxx"             #Root server real IP
email="xxx"            #email services real IP

#global options
set block-policy drop
set state-policy if-bound
set fingerprints "/etc/pf.os"
set loginterface $ext
set skip on $vpn
set timeout { tcp.closing 60, tcp.established 7200}

#normaliser
scrub in on $ext fragment reassemble random-id no-df

#NAT JAil traffic
nat on $ext from $lamia to any -> $email
nat on $ext from $jail:network to any -> $main
```
As you can see, I have two aliases on re0 and need to specify the exact IP address to keep the email servers happy with me. I have daemons split into six jails, and all communication rules are written by hand with the `no state` option, since I block inbound/outbound by default. The whole pf.conf is very big because of this, but I did it that way to learn how to write rules.
My outgoing jail setup breaks immediately, for no visible reason, if I add the pass option to the nat rules, after ~5 seconds of an established connection.
You can try `tcpdump` on the external interface, `block log all`, and `pfctl -ss` to shed light on your issue.

An idea: jails can generate a lot of states and overflow the state table. You may want to check pf's statistics with `pfctl -si`. If I remember correctly, the default limit is 10k active states.
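If the table is indeed overflowing, the limit can be raised in pf.conf; the number below is only an example, not a recommendation:

```
# Example only: raise the hard limit on state-table entries
set limit states 50000
```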


----------



## wunki (Mar 20, 2014)

Thanks! I have switched back to 9.2 for now and have no problems there. Will try this on a test box I have with 10.0. Thanks again.


----------



## bsd4masses (Apr 4, 2014)

bsd4masses said:

> Hello,
> 
> I also have some problems regarding this topic.  I noticed reboot corrected behaviour, but after days I got again and again
> 
> ...



So I took another approach: instead of using pf.conf to forward ports into the different jails, I use network alias IPs. There is no need for pf.conf anymore. That is stable, and in fact it is a cleaner solution.

Norbert


----------



## bthomson (Feb 2, 2015)

Saw something similar to this today on a NAT router running 10.1-RELEASE, and wanted to leave a note in case anyone encounters a similar problem or learns something from this debugging info.

To test I was using this:


```
% fetch -o /dev/null http://ftp1.fr.freebsd.org/pub/FreeBSD/ISO-IMAGES-amd64/10.1/FreeBSD-10.1-RELEASE-amd64-bootonly.iso
/dev/null 12% of 218 MB 763 kBps 04m15s
fetch: http://ftp1.fr.freebsd.org/pub/FreeBSD/ISO-IMAGES-amd64/10.1/FreeBSD-10.1-RELEASE-amd64-bootonly.iso: Connection reset by peer
```
As you can see, the connection is reset partway through. With this configuration a connection reset is inevitable for every connection that lasts more than a few seconds. The amount of data is irrelevant; only the duration of the connection matters.

The "connection reset by peer" turned out to be coming from pf on this intermediate router, not the remote host or the machine doing the fetching. I verified this by adjusting `block-policy` from `return` to `drop` in pf.conf on this router: that caused the connection to hang instead of being reset.
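For reference, that toggle is a single pf.conf setting; with `return` pf answers blocked TCP packets with an RST, while with `drop` it discards them silently:

```
# was: set block-policy return
set block-policy drop
```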

When I restarted the fetch and looked at the state table on that router while it was running, I saw this:


```
% pfctl -ss | grep '88.191.250'
re0 tcp 88.191.250.131:80 <- 192.168.1.60:25636   CLOSED:SYN_SENT
re0 tcp 192.168.1.60:25636 -> 88.191.250.131:80    SYN_SENT:CLOSED
```
This is while there's data flowing through the pipe, before the connection is reset! So pf thinks the connection is closed shortly after it is established, even though data is still flowing. The fetch continues to download normally and does not receive a connection reset until the states above have disappeared from the table, after 30 seconds or so (some kind of timeout, surely?). What seems to have happened is that pf on the router dropped the state, fetch then sent an ACK packet to the router, and pf on the router sent back a TCP RST rather than forwarding the ACK to 88.191.250.131. And that is why we get "connection reset by peer".

In contrast to the weirdness above, the states on the machine doing the fetching were as expected:


```
% pfctl -ss | grep '88.191.250'
all tcp 192.168.1.60:18139 -> 88.191.250.131:80   ESTABLISHED:ESTABLISHED
```
I don't understand what's going on here or if it's related to the OP's problem, but from my limited knowledge of networking it seems odd. Surely all instances of pf along the line should agree about the state of the connection, no? And what might cause pf on this router to erroneously conclude that a party has closed the connection?

Anyway, for myself I've found a suitable workaround, but perhaps the details above will help anyone encountering similar issues to find and understand the problem.


----------



## _martin (Feb 3, 2015)

bthomson Correct. This is the issue I had with PF, which I described in the thread I posted above. I even created another one here: Thread 47532. And I created one on the MPD5 forums too.
The bottom line was that PF started sending RSTs for no apparent reason, cutting off the connection.
I had some trouble reproducing this behavior on other machines, though. It seems certain (unknown to me) conditions have to be met for this issue to occur. I was unable to replicate it in a VM (VirtualBox/VMware) or on different physical hardware. My issue occurred on an S1200BT motherboard.

I found out that my issue stems from the *rdr pass* I have in my rules. My current workaround is to split *rdr pass* into two sets of rules: rdr alone, and then allowing the traffic later in the filtering section of pf.conf, as shown in the thread I mentioned here.
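As a sketch, with made-up addresses and ports for illustration, the split looks like this:

```
# Before: translation and filtering combined in one rule
# rdr pass on $ext_if proto tcp to port 80 -> 192.168.1.10

# After: translation only ...
rdr on $ext_if proto tcp to port 80 -> 192.168.1.10
# ... and an explicit rule later, in the filtering section
pass in on $ext_if proto tcp to 192.168.1.10 port 80
```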

I spoke with the PF maintainer at FreeBSD, but he showed no interest in this problem. I asked on the mailing list, but got no response either.

I love FreeBSD, but this experience did cause me not to donate last year... not that it makes any difference, I know.


----------



## bthomson (Feb 5, 2015)

Thanks for sharing these details, matoatlantis. I can relate to your frustration there because I have also sometimes been ignored when I try to bring a possible bug to the developers' attention. But, at least we are able to provide some documentation here to help those who might try to track down or fix this issue in the future.

For my part, I don't really want to paste my entire pf.conf here, but it does not contain any *rdr pass* rules. It does contain *nat* rules, however, which might be the source of the problem.

I will share my workaround. I added the line:


```
nat on re0 from 192.168.1.60 to any -> 192.168.1.105
```
to the router's pf.conf. (`192.168.1.60` was the machine experiencing the problem with fetch, and `192.168.1.105` is a static IP on the router.) NAT should not be necessary here since all hosts involved are on the same subnet, but I'm glad it fixed the problem.


----------



## zspider (Feb 5, 2015)

bthomson said:


> Thanks for sharing these details, matoatlantis. I can relate to your frustration there because I have also sometimes been ignored when I try to bring a possible bug to the developers' attention.



Yeah, I know the feeling. The one time I reported an issue, I heard nothing about it; but to be fair, it was fixed by the next update.


----------



## getopt (Feb 5, 2015)

kpa said:


> You wonder why I'm thinking of moving to OpenBSD on my firewall system?


Regarding Packet Filter you have these choices:
1. Want a fast Packet Filter? Take FreeBSD's PF.
2. Want a more powerful and secure Packet Filter? There is only one PF left.

The maintainers/developers should settle their dispute very soon; the PF agenda has been stuck for far too long. My suggestion would be to offer the OpenBSD version of Packet Filter as a port, so that FreeBSD users could choose what meets their needs without being forced to leave FreeBSD.


----------



## _martin (Feb 5, 2015)

bthomson said:


> It does contain *nat* rules however, which might be the source of the problem.


When I was struggling to find the source of my issue, it was either nat or rdr. Trial and error showed that rdr was the issue; nat was OK. That's why I use *nat pass*, but keep *rdr* on its own and pass the traffic afterwards in the filtering section.

It's a strange workaround you have, but I understand.


----------

