# PF Load Balancing Degraded Internet Connectivity



## Sabrtooth (Jul 11, 2011)

So I'm starting to troubleshoot this now, and I'm posting it here in the hope that someone smarter spots a smoking gun while I'm looking.

Here's my /etc/pf.conf:


```
#!/bin/sh
################ Macros ####################################
lan_net = "{X.X.X.0/24, 127.0.0.8/8, Y.Y.Y.0/24, Z.Z.Z.0/24}"
int_if1 = "re1"
int_if2 = "re3"
all_int = "{ re1, re3 }"
ext_if1 = "re0"
ext_if2 = "re2"
all_ext = "{ re0, re2 }"
ext_gw1 = "A.A.A.1"
ext_gw2 = "B.B.B.33"
################ Service Bundles###########################
# These tables are for quick firewall rules in and out.
# Update as necessary
# Can be used for redirectors as well.

# These can't change gateways mid session
# It is O.K. to define them implicitly in the rules.
tcp_services="{ 22, 25, 53, 10010, 70, 80, 443, 5500, 3389, 3390, 3391, 40000:40100, 42000 } keep s
icmp_types="echoreq"

################ Tables ####################################
# Referenced in Firewall Rules.
# Need to setup list files in /etc.
#table <trustedhostsstatic> const {1.1.1.1,trusted.host.domain}
table <blackliststatic> const {95.9.58.21,67.141.187.173,165.228.140.143,79.28.203.96}
table <trustedhostsdynamic> persist

################ Options ##################################
# Timeout Options
set optimization aggressive
set loginterface $ext_if1
set block-policy drop
set require-order yes

################ Normalization #############################
# normalize all incoming traffic. Set ttl to 254 to limit possible mapping of hosts behind firewall
# Also set random-id to help with the same. Set mss to ATM network frame size for easy splitting up
#scrub on $ext_if all random-id min-ttl 254 max-mss 1452 reassemble tcp fragment reassemble

################ Queueing ##################################
# Use a queuing to prioritize empty (no payload) TCP ACKs,
# This dramatically improves throughput on (asymmetric) links when the
# reverse direction is saturated. The empty ACKs use an insignificant
# part of the bandwidth, but if they get delayed, downloads suffer
# badly, so prioritize them.

# Example: 512/128 kbps ADSL. Download is 50 kB/s. When a concurrent
# upload saturates the uplink, download drops to 7 kB/s. With the
# priority queue below, download drops only to 48 kB/s.

# For a 512/128 kbps ADSL with PPPoE link, using "bandwidth 100Kb"
# is optimal. Some experimentation might be needed to find the best
# value. If it's set too high, the priority queue is not effective, and
# if it's set too low, the available bandwidth is not fully used.
# A good starting point would be real_uplink_bandwidth * .90
# My ISP says upload bandwidth is 1000kbps. In reality
# it's ~984kbps (123 kB/s). Tested this by uploading multiple files
# to servers to try and saturate upload bandwidth.
# Note: For 768kbps up 744kbps looks like it works well for me.
# Note: For 384kbps up 368kbps looks like it works well for me.

#altq on $ext_if bandwidth 984Kb hfsc queue { q_pri, q_def, q_mus, q_tor }
# queue q_pri bandwidth 49%           priority 7 hfsc
# queue q_def bandwidth 49%           priority 5 hfsc (linkshare 49%) {q_smtp,q_http,ssh_login,q_def1}
#   queue ssh_login bandwidth 96%       priority 5 hfsc
#   queue q_http    bandwidth 1%        priority 4 hfsc
#   queue q_smtp    bandwidth 1%        priority 4 hfsc
#   queue q_def1    bandwidth 1%        priority 3 hfsc (default)
# queue q_mus bandwidth  1% qlimit 200  priority 4 hfsc
# queue q_tor bandwidth  1% qlimit 25   priority 3 hfsc (upperlimit 272Kb)

################ FTP Proxy #################################
#anchor "ftp-proxy/*"
#pass in quick on $int_if inet proto tcp to any port ftp \
#    rdr-to 127.0.0.1 port 8021

################ Translation ###############################
# Translation rules are first match
#  nat outgoing connections on each internet interface
nat on $ext_if1 from $lan_net to any -> A.A.A.98/29 source-hash
nat on $ext_if2 from $lan_net to any -> B.B.B.35/29 source-hash

rdr on $ext_if1 proto tcp from any to { A.A.A.98, B.B.B.34 } port 80 -> X.X.X.10 port 80
rdr on $ext_if1 proto tcp from any to { A.A.A.98, B.B.B.34 } port 443 -> X.X.X.10 port 443

rdr on $ext_if1 proto tcp from any to { A.A.A.98, B.B.B.34 } port 3389 -> X.X.X.11 port 3389

rdr on $all_ext proto tcp from any to { A.A.A.99, B.B.B.35 } port 3389 -> X.X.X.12 port 3389
rdr on $all_ext proto tcp from any to { A.A.A.99, B.B.B.35 } port 80 -> X.X.X.12 port 80
rdr on $all_ext proto tcp from any to { A.A.A.99, B.B.B.35 } port 443 -> X.X.X.12 port 443

rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 80 -> X.X.X.13 port 80
rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 70 -> X.X.X.13 port 70
rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 443 -> X.X.X.13 port 443
rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 5500 -> X.X.X.13 port 5500
rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 5901 -> X.X.X.13 port 5901
rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 40000:40100 -> X.X.X.13 port 40000:40100
rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 42000 -> X.X.X.13 port 42000
#rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 3389 -> X.X.X.13 port 3389

rdr on $ext_if1 proto tcp from any to { A.A.A.101, B.B.B.37 } port 80 -> X.X.X.14 port 80
rdr on $ext_if1 proto tcp from any to { A.A.A.101, B.B.B.37 } port 443 -> X.X.X.14 port 443
rdr on $ext_if1 proto tcp from any to { A.A.A.101, B.B.B.37 } port 22 -> X.X.X.14 port 22

rdr on $ext_if1 proto tcp from any to { A.A.A.101, B.B.B.37 } port 3390 -> Y.Y.Y.20 port 3389
rdr on $ext_if1 proto tcp from any to { A.A.A.101, B.B.B.37 } port 5900 -> Y.Y.Y.20 port 5900
rdr on $ext_if1 proto tcp from any to { A.A.A.101, B.B.B.37 } port 5901 -> Y.Y.Y.15 port 5900
rdr on $ext_if1 proto tcp from any to { A.A.A.101, B.B.B.37 } port 5901 -> Y.Y.Y.15 port 5900


################ Filtering #################################
# loopback
antispoof log quick for lo0 inet
pass quick on lo0 all

# Open for testing
#pass in all
#pass out all

# default deny
#block in
#block out

# pass all outgoing packets on internal interface
pass out on $int_if1 from any to $lan_net
pass out on $int_if2 from any to $lan_net

#TCP Services (see top)
pass in log on $ext_if1 reply-to ($ext_if1 $ext_gw1) proto tcp from any to any port $tcp_services
pass in log on $ext_if2 reply-to ($ext_if2 $ext_gw2) proto tcp from any to any port $tcp_services

pass in quick on $int_if1 from $lan_net to $int_if1
pass in quick on $int_if2 from $lan_net to $int_if2


# Choose the connection Lan Traffic will leave on.

# ISP1
#pass in quick log on $int_if1 route-to ($ext_if1 $ext_gw1) source-hash \
#proto tcp from $lan_net to any flags S/SA modulate state
#pass in quick log on $int_if1 route-to ($ext_if1 $ext_gw1) source-hash \
#proto{ udp, icmp } from $lan_net to any keep state

# ISP2
pass in quick log on $int_if1 route-to ($ext_if2 $ext_gw2) source-hash \
proto tcp from $lan_net to any flags S/SA modulate state
pass in quick log on $int_if1 route-to ($ext_if2 $ext_gw2) source-hash \
proto{ udp, icmp } from $lan_net to any keep state

pass in quick log on $int_if2 route-to ($ext_if2 $ext_gw2) source-hash \
proto tcp from $lan_net to any flags S/SA modulate state
pass in quick log on $int_if2 route-to ($ext_if2 $ext_gw2) source-hash \
proto{ udp, icmp } from $lan_net to any keep state

# General "pass out" rules for external interfaces
pass out on $ext_if1 proto tcp from any to any flags S/SA modulate state
pass out on $ext_if1 proto { udp, icmp } from any to any keep state
pass out on $ext_if2 proto tcp from any to any flags S/SA modulate state
pass out on $ext_if2 proto { udp, icmp } from any to any keep state
# Route packets from any IPs on $ext_if1 to $ext_gw1 and the same for
# $ext_if2 and $ext_gw2
pass out on $ext_if1 route-to ($ext_if2 $ext_gw2) from $ext_if2 to any
pass out on $ext_if2 route-to ($ext_if1 $ext_gw1) from $ext_if1 to any
```

It's also worth noting that, before this started, I set up DNS load balancing for all of the hosts that are reached through this box by adding a second A record for each.

So, for example, the host behind this rule


```
#rdr on $ext_if1 proto tcp from any to { A.A.A.100, B.B.B.36 } port 3389 -> X.X.X.13 port 3389
```

has two A records at the DNS level, like this:

```
Server.Freebsd.Com A.A.A.100
Server.Freebsd.Com B.B.B.36
```
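
For anyone who wants to check what a client actually sees for a name with two A records, here is a minimal sketch. It only uses the standard resolver through Python's `socket.getaddrinfo`; the function name `a_records` is made up for illustration, and the real hostname would need to be substituted.

```python
import socket

def a_records(hostname):
    """Return the de-duplicated IPv4 addresses a name resolves to."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    # Sorting is plain string order, just to make the output stable.
    return sorted({info[4][0] for info in infos})
```

Calling `a_records()` with the public hostname should list both addresses if the second A record is being served. Which one a given client then connects to is up to that client, so incoming sessions can arrive on either ISP.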

The problem is that the connection degrades over time. According to pflog and tcpdump, everything is being passed.

I'll ping Google.com and it will randomly drop pings, and the drops become more frequent as time goes on.

Sometimes certain computers on my X.X.X.0, Y.Y.Y.0, and Z.Z.Z.0 networks won't work while others on the same subnet will.

Switching outgoing ISPs does not help.

Any early suggestions would be great. I'll update as I go.


----------



## Sabrtooth (Jul 11, 2011)

Crazy Symptom

We have some mission-critical apps that are .NET/HTML based and connect directly to a server behind the firewall. When I disable pf on the firewall, everything works fine. With pf enabled, the applications hang at random.

Interesting!


----------



## Sabrtooth (Jul 14, 2011)

How crazy is this?

Despite everything I dump showing pass, it seems that only a fixed number (11) of computers are ever allowed out. The 12th and beyond get no connectivity outside the network.

If I flush the state table and reload the configuration, a different random 11 will work, but that's it.

This is fun! If not debilitating!


----------



## Sabrtooth (Jul 14, 2011)

I should state that every computer worked fine with ipnat, then IPFW, and then PF before I added load balancing.


----------



## Sabrtooth (Jul 14, 2011)

So, I am getting closer.

Whenever I run `pfctl -F state -f /etc/pf.conf`, a different set of computers gets access.

I'm wondering if there is a correlation between the fact that I have 10 external IPs and the apparent limit on how many machines get access. I can't confirm that with anything like tcpdump or ipchicken.com, though.


```
no nat on $int_if1 from x.x.x.0/24 to y.y.y.0/24
no nat on $int_if2 from y.y.y.0/24 to x.x.x.0/24
nat on $ext_if1 from { x.x.x.0/24, y.y.y.0/24 } to any -> x.x.x.x/29 source-hash
nat on $ext_if2 from { x.x.x.0/24, y.y.y.0/24 } to any -> y.y.y.y/29 source-hash
```

Nothing in any of the other code above has changed, but whenever I flush the states, different sets of computers have access. Weird, no?
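
Since the open question is whether the number of external addresses matters, it may help to see which addresses a `/29` pool written as a host address actually covers. The sketch below is only arithmetic with Python's `ipaddress` module. The prefixes `192.0.2` and `198.51.100` are stand-ins for the redacted `A.A.A` and `B.B.B`; the last octets (`.98/29`, `.35/29`, gateway `.33`) are taken from the pf.conf in the first post.

```python
import ipaddress

def pool_addresses(cidr):
    """List every address covered by a pool written as address/prefix."""
    # strict=False accepts a host address such as .98/29 and widens it
    # to the enclosing block, the same way applying the netmask would.
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(addr) for addr in network]

# Stand-in prefixes: 192.0.2 plays A.A.A and 198.51.100 plays B.B.B.
pool1 = pool_addresses("192.0.2.98/29")
pool2 = pool_addresses("198.51.100.35/29")
gateway2 = "198.51.100.33"  # stands in for ext_gw2 = B.B.B.33

print(pool1[0], "-", pool1[-1], len(pool1))   # 192.0.2.96 - 192.0.2.103 8
print(pool2[0], "-", pool2[-1], len(pool2))   # 198.51.100.32 - 198.51.100.39 8
print("gateway inside pool 2:", gateway2 in pool2)  # True
```

If pf draws from the whole block, each pool spans eight addresses including the network and broadcast addresses, and the second pool also contains the ISP2 gateway address. The rdr rules only mention `.98` to `.101` and `.34` to `.37`, and the thread does not say whether the remaining addresses are assigned to the firewall, so treat this as something to check rather than a diagnosis.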

Any help?


----------



## Sabrtooth (Jul 14, 2011)

Thanks Dutch! 

No ideas?


----------



## Sabrtooth (Jul 15, 2011)

Wow, O.K.

I got it, but I'm not exactly sure why it works. I think it may be a problem with my ISPs. 


```
nat on $ext_if1 from { x.x.x.0/24, y.y.y.0/24 } to any -> A.A.A.98/32 source-hash
nat on $ext_if2 from { x.x.x.0/24, y.y.y.0/24 } to any -> B.B.B.34/32 source-hash
```

This works. All incoming connections seem to behave correctly as well. My rules will favor one ISP or the other for outgoing connections.
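
One possible explanation, offered as a guess rather than a confirmed cause: as far as I understand pf.conf(5), `source-hash` picks the translation address from a hash of the source address, and pfctl generates a new random key each time the ruleset is loaded unless a key is given. The toy model below is not pf's real hash (it uses `hashlib.blake2b`, and the LAN addresses, key strings, and function name are invented), but it shows the general behaviour: a `/29` pool spreads hosts across several addresses, a new key reshuffles them, and a `/32` pool always yields the same single address.

```python
import hashlib
import ipaddress

def pick_address(source_ip, pool_cidr, key):
    """Map a source IP onto one pool address using a keyed hash (toy model)."""
    pool = list(ipaddress.ip_network(pool_cidr, strict=False))
    digest = hashlib.blake2b(ipaddress.ip_address(source_ip).packed,
                             key=key, digest_size=8).digest()
    # The same source and key always land on the same pool entry.
    return str(pool[int.from_bytes(digest, "big") % len(pool)])

lan_hosts = [f"10.0.0.{n}" for n in range(10, 30)]  # hypothetical LAN hosts

for cidr in ("192.0.2.98/29", "192.0.2.98/32"):
    for key in (b"key-1", b"key-2"):  # a reload would change the key
        used = {pick_address(h, cidr, key) for h in lan_hosts}
        print(cidr, key, "->", sorted(used))
```

If some addresses in the `/29` are not actually routed to the firewall, any host hashed onto one of them would lose connectivity until the next reload moved it elsewhere, which would match the symptoms described earlier. With `/32` there is only one candidate address, so that failure mode disappears.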

Thoughts?


----------

