# FreeBSD Router Stops Routing



## mtrower (Nov 1, 2014)

I use a FreeBSD box as a primary router/gateway (henceforth referred to as "the router").  It directly receives my WAN IP from the cable modem.  When I originally set this up several months ago, everything worked fine (and had for a much longer time before that running Linux instead).  Lately, however, the system has been problematic.

I'm receiving a lot of this when I try to access remote hosts on my client machines:

```
PING blackshard.net (96.126.121.106) 56(84) bytes of data.
From 10.0.0.1 icmp_seq=1 Destination Host Unreachable
From 10.0.0.1 icmp_seq=2 Destination Host Unreachable
From 10.0.0.1 icmp_seq=3 Destination Host Unreachable
From 10.0.0.1 icmp_seq=4 Destination Host Unreachable
```


It's not just that one host, but any that I try to connect to.  The router itself is able to connect without issues, however.  It can always initiate outbound connections and receive inbound connections on both the WAN and LAN.  I have reason to suspect routing trouble in the opposite direction (WAN to LAN) as well, but I cannot yet say so with certainty.  In any event, I can ssh into the router from a client machine and then proceed to ping out - for whatever reason, it appears to just stop routing packets.

The problem is intermittent, with small windows of availability (one connection, or possibly several).  In all cases, existing connections continue to work fine - only new connections fail.  If I reboot the router, it clears up and works fine for a short period.  After 10 minutes or so, it begins failing again.  This problem may have started small and grown worse over time, but there have been multiple factors involved so I cannot say this with certainty either.

Checking the usual logs, I see nothing of interest.  /var/log/auth.log revealed a high degree of attempted sshd break-in activity, but dropping that port didn't help.  Since I don't know what might prove relevant, and don't want to dump an entire machine worth of logs and terminal output in the thread right off the bat, I'll begin with some generic information.  Please ask if you wish to see anything else.



```
% uname -a
FreeBSD gateway 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014     root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
```

/etc/rc.conf

```
hostname="gateway"
ifconfig_dc0="DHCP"
ifconfig_re0="inet 10.0.0.1 netmask 255.255.0.0"

gateway_enable="YES"
ipnat_enable="YES"

dhcpd_enable="YES"
ddclient_enable="YES"

sshd_enable="YES"
ntpd_enable="YES"
powerd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

cupsd_enable="YES"
devfs_system_ruleset="system"
```


----------



## woodsb02 (Nov 2, 2014)

Your clients are resolving the IP address of blackshard.net as its external IP. I see you are using DHCP for the external IP and using ddclient to update the DNS records. When I ping blackshard.net it also goes to 96.126.121.106 and works, so that seems to be working OK.

But your gateway is sending back the message that 96.126.121.106 is not reachable. That says to me the gateway does not know that IT is 96.126.121.106. Seems like the routing table needs updating with a line for that IP being itself. You can refer to https://www.freebsd.org/doc/handbook/network-routing.html for help.

As a workaround, you could map blackshard.net to its internal IP address in your clients' hosts files (or do this automatically using your DHCP server).


----------



## woodsb02 (Nov 2, 2014)

What is the output of `netstat -r` on your gateway?


----------



## mtrower (Nov 2, 2014)

```
mtrower@gateway:~ % netstat -r
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            75-134-20-1.dhcp.m UGS         0   422151    dc0
10.0.0.0           link#2             U           0   568647    re0
10.0.0.1           link#2             UHS         0        0    lo0
75.134.20.0/22     link#1             U           0        0    dc0
75-134-20-183.dhcp link#1             UHS         0        0    lo0
localhost          link#3             UH          0        0    lo0

Internet6:
Destination        Gateway            Flags      Netif Expire
::                 localhost          UGRS        lo0
localhost          link#3             UH          lo0
::ffff:0.0.0.0     localhost          UGRS        lo0
fe80::             localhost          UGRS        lo0
fe80::%re0         link#2             U           re0
fe80::76d4:35ff:fe link#2             UHS         lo0
fe80::%lo0         link#3             U           lo0
fe80::1%lo0        link#3             UHS         lo0
ff01::%re0         fe80::76d4:35ff:fe U           re0
ff01::%lo0         localhost          U           lo0
ff02::             localhost          UGRS        lo0
ff02::%re0         fe80::76d4:35ff:fe U           re0
ff02::%lo0         localhost          U           lo0
```

The thing is, I can ping blackshard.net just fine from my gateway.  When I am experiencing trouble, no WAN hosts are pingable - including such hosts as www.google.com and 8.8.8.8.  They are always accessible from the gateway itself, however.

On the other hand, sometimes the system works just fine.  It worked fine for many hours today, but now it is malfunctioning again.  I don't understand the erraticism.


----------



## woodsb02 (Nov 2, 2014)

Oh OK, sorry, I initially misunderstood and thought blackshard.net was your router. I now understand it is a remote site on the internet, which your router can reach directly but to which it is intermittently unable to forward packets.

Analysing the specific message output from your ping, it is a message received back from your router (10.0.0.1) saying that it believed itself to be the router that was directly connected to the network on which the blackshard.net destination is configured. More on the specific "host unreachable" error here:
http://www.wildpackets.com/resources/compendium/tcp_ip/unreachable#host_unreachable

This is obviously an error on your router's behalf, as the destination 96.126.121.106 is on a totally different subnet from all of the IP addresses on your router. Instead of trying to communicate with the destination directly, it should be passing the message on to the next router (in this case your default route, 75.134.20.1).
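To make the subnet argument concrete, here is a minimal sketch in POSIX shell that checks an address against the router's two connected networks (10.0.0.0/16 on re0, from the rc.conf netmask, and 75.134.20.0/22 on dc0, from the routing table). The helper names `ip_to_int`, `in_subnet`, and `check` are made up for this illustration and are not part of any FreeBSD tool.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    _old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on '.' into $1..$4
    IFS=$_old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# usage: in_subnet ADDRESS NETWORK PREFIXLEN
# Succeeds when ADDRESS falls inside NETWORK/PREFIXLEN.
in_subnet() {
    _addr=$(ip_to_int "$1")
    _net=$(ip_to_int "$2")
    _mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( _addr & _mask )) -eq $(( _net & _mask )) ]
}

# Print a one-line verdict for ADDRESS against NETWORK/PREFIXLEN.
check() {
    if in_subnet "$1" "$2" "$3"; then
        echo "$1 is on-link in $2/$3"
    else
        echo "$1 is not in $2/$3"
    fi
}

check 96.126.121.106 10.0.0.0 16     # LAN network on re0
check 96.126.121.106 75.134.20.0 22  # WAN network on dc0
check 10.0.40.4 10.0.0.0 16          # the client used for the ping test
```

The destination matches neither connected network, so the only route that applies to it is the default route, while the client address 10.0.40.4 does sit inside the LAN network.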

A quick internet search of this issue suggests it might be good to confirm your IP address and MAC address are not duplicated on your network. Also, it could be that your traffic is blacklisted for a period of time (for example, by a blocking tool such as fail2ban).
http://unix.stackexchange.com/questions/35313/host-unreachable-and-i-do-not-get-why

I think a good way to go would be to run tcpdump(1) on your router while you try the ping again. It would be interesting to see if your router is putting out an ARP request on either of its interfaces to try and talk to the destination host directly (which would obviously fail). This would also show if the router was passing the message on to its default router at your ISP.
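A capture along these lines might look like the sketch below. It only builds the filter expression; the actual tcpdump invocations are shown as comments because they need root and one terminal per interface. The function name `capture_filter` is hypothetical, and the interface names re0 (LAN) and dc0 (WAN) are taken from the rc.conf posted earlier.

```shell
#!/bin/sh
# Build a pcap filter: any ARP traffic, plus ICMP to or from the test target.
capture_filter() {
    printf 'arp or (icmp and host %s)' "$1"
}

# Run one capture per interface (as root), each in its own terminal:
#   tcpdump -n -i re0 "$(capture_filter 96.126.121.106)"   # LAN side
#   tcpdump -n -i dc0 "$(capture_filter 96.126.121.106)"   # WAN side
# -i selects the interface; -n keeps addresses numeric, so the output
# can be searched for the IP address instead of a resolved hostname.
echo "filter: $(capture_filter 96.126.121.106)"
```

If the echo request shows up on re0 but never on dc0, the packet is being dropped inside the router rather than upstream.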


----------



## mtrower (Nov 3, 2014)

http://unix.stackexchange.com/questions/35313/host-unreachable-and-i-do-not-get-why
First, I must point out that the problem that person is having is almost completely different from mine.

Him: The problem is isolated to his workstation.  Other clients on his network do not have his issue.
Me: All clients have the issue

Him: The problem is with one specific host.  Other hosts work fine.
Me: The problem is with all hosts.  blackshard.net is only an example.

Him: Receives a "Destination Host Unreachable" from his workstation.
Me: Receiving a "Destination Host Unreachable" from my router.

With all of that in mind...
**********************
fail2ban: This cannot be the problem for the following reasons:

- blackshard.net is owned and operated by me, personally (it's remote from the client, though).  I don't even run fail2ban on that server.  I didn't mention this in the original post, because:
- The problem is not solely with blackshard.net, but with all hosts - including such hosts as www.google.com and www.freebsd.org.
- Most importantly, all hosts are pingable from the router itself - fail2ban would block all connections from my WAN IP address, which includes my router.


My IP address is not being duplicated on the network.  This was not a bad thought - if 10.0.0.1 was duplicated, it might indeed cause this issue.  However, this is not the case:

```
mtrower@gateway:/usr/ports/net/arping % sudo arping -D -I re0 -c 2 10.0.0.1
..      100% packet loss (0 extra)
```
(This is a positive response, meaning the IP is unique.  The several clients I tested yielded similar results.)


Here is the ping command (from a FreeBSD client this time, thus the differing format):

```
[media@THOR ~]$ ping blackshard.net
PING blackshard.net (96.126.121.106): 56 data bytes
36 bytes from 10.0.0.1: Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 0054 1867   0 0000  40  01 5656 10.0.40.4  96.126.121.106

36 bytes from 10.0.0.1: Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
4  5  00 0054 18a5   0 0000  40  01 5618 10.0.40.4  96.126.121.106
```

Here is the associated output of `sudo tcpdump > tcpdump.log` on my router: http://pastebin.com/cmt4RZgD

That's rather long; you're probably interested in the following lines:

```
mtrower@gateway:~ % grep blackshard tcpdump.log
22:40:13.143913 IP 75-134-20-183.dhcp.mdsn.wi.charter.com.43563 > google-public-dns-a.google.com.domain: 23390+ A? blackshard.net. (32)
mtrower@gateway:~ %
mtrower@gateway:~ % grep 96.126.121.106 tcpdump.log
22:40:13.187518 IP google-public-dns-a.google.com.domain > 75-134-20-183.dhcp.mdsn.wi.charter.com.43563: 23390 1/0/0 A 96.126.121.106 (48)
```

I see no ARP request, nor a pass query for that IP.


Just for emphasis:

```
mtrower@gateway:~ % ping -c 4 blackshard.net
PING blackshard.net (96.126.121.106): 56 data bytes
64 bytes from 96.126.121.106: icmp_seq=0 ttl=46 time=52.585 ms
64 bytes from 96.126.121.106: icmp_seq=1 ttl=46 time=79.644 ms
64 bytes from 96.126.121.106: icmp_seq=2 ttl=46 time=54.988 ms
64 bytes from 96.126.121.106: icmp_seq=3 ttl=46 time=53.251 ms

--- blackshard.net ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 52.585/60.117/79.644/11.308 ms
mtrower@gateway:~ %
```
The router knows darn well how to get to blackshard.net.  It just stops sending anyone else there (or to anywhere else on the WAN).


----------



## mamalos (Nov 7, 2014)

Are you running any firewalls on your machine?

What is the output of `sysctl net.inet.ip.forwarding`?

On your `netstat -r` command I don't see a subnet mask for subnet 10.0.0.0, which -to me- seems weird. What is your subnet mask (`ifconfig re0`) and what is the subnet mask you're sending to your clients through DHCP (run `ifconfig` on the client)?


----------



## woodsb02 (Nov 7, 2014)

mamalos said:


> On your `netstat -r` command I don't see a subnet mask for subnet 10.0.0.0, which -to me- seems weird. What is your subnet mask (`ifconfig re0`) and what is the subnet mask you're sending to your clients through DHCP (run `ifconfig` on the client)?



I believe this is normal output from `netstat -r`.
mtrower: Can you please post output of `netstat -rn` to confirm?


----------



## mtrower (Nov 8, 2014)

No firewalls are involved.

`net.inet.ip.forwarding` is 1 (and if it wasn't, packet forwarding would never work at all to begin with).

The subnet mask is 255.255.0.0 (10.0.0.0/16).  `netstat -r` does not list subnet masks; you're thinking of `netstat -rn`.  While I have a dhcp server enabled for various WiFi devices, all of the clients in question are statically configured, and on the appropriate subnet.

I currently have the router booted into OpenIndiana.  I do not experience any problems while running that.  Therefore, we should be able to rule out external factors and the possibility (or at least likelihood) of faulty hardware.  I believe either a bug or faulty configuration is to blame; I'm not so sure about the latter due to the erratic nature of the situation, but it's still a possibility.

Later, when I can afford to be without a reliable internet connection for a while, I'll boot into FreeBSD and confirm what I have posted above.  However, I doubt that any of it has changed since the last time I checked it.  Is there anything else you would like me to report on while I am there?


----------



## gkontos (Nov 8, 2014)

mtrower said:


> Here is the ping command (from a FreeBSD client this time, thus the differing format):
> 
> ```
> [media@THOR ~]$ ping blackshard.net
> ...



For some reason I believe that NAT stops working the moment you experience the problems.

Can you run `# ipnat -s` and `# ipnat -l` when the problem begins again?

Also, can you post your /etc/ipnat.rules ?


----------



## mtrower (Nov 22, 2014)

Apologies for my lack of reply on this matter. I'd been leaving the situation alone, as OpenIndiana was working fine and I couldn't afford the luxury of toying around with a working system. That said, I really would prefer to run FreeBSD on the router, and so I decided to revisit the issue.

It appears to be working fine now.  I don't know why; I didn't change a thing (well okay, I removed a disk from the zpool mirror set to install OI, but come now - that shouldn't be related). Still, for several days the trouble has not reappeared.

I can't really consider this situation solved, because I didn't do anything demonstrable to resolve it. Still, if the problem does not return, then I guess there isn't really a problem to solve. My thanks to everyone who tried to help.

PS: gkontos, I tend to agree with you. It certainly did behave as though ipnat would just give up from time to time.  I've no idea why that would be, though. I'll come back with the info you asked for if I start having problems again.


----------



## mtrower (Dec 8, 2014)

Meh.  That didn't last long.

Quality of service has steadily degraded again, starting with small problems right before Thanksgiving and getting slowly worse.  It is now just as bad as when I originally posted.

`# ipnat -s` http://pastebin.com/UfkEWRT7
`# ipnat -l` http://pastebin.com/jA5RP1zy
/etc/ipnat.rules http://pastebin.com/zjL7xSXg

of particular interest:

```
# ipnat -s | grep fail
0   proxy create fail in
0   proxy fail in
0   decap fail in
0   icmp rebuild failures in
0   IFP address fetch failures in
0   NAT insert failures in
0   new ifpaddr failed in
0   memory requests failed in
2662   finalised failed in
0   proxy create fail out
0   proxy fail out
0   decap fail out
0   icmp rebuild failures out
0   IFP address fetch failures out
0   NAT insert failures out
53   new ifpaddr failed out
0   memory requests failed out
27852   finalised failed out
0   log failures
0   hostmap fails
0   log fail
```
Obviously, ipnat is failing periodically - but why?
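As a sketch for tracking these counters over time, the filter below keeps only the non-zero failure lines from `ipnat -s` style output read on stdin. The function name `nonzero_failures` is hypothetical, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# Print only the failure counters whose leading count is greater than zero.
nonzero_failures() {
    awk '/fail/ && $1 + 0 > 0 { print }'
}

# Demonstration using a few lines from the output above.
nonzero_failures <<'EOF'
0   NAT insert failures in
2662   finalised failed in
53   new ifpaddr failed out
27852   finalised failed out
0   log fail
EOF
```

On the router this would be used as `ipnat -s | nonzero_failures`. Running it together with `date` at intervals would show whether a counter jumps at the moment forwarding stops.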


----------



## John Ballesteros (Dec 10, 2014)

Hello there,

I am having the same problem on my firewall. Running FreeBSD 10.1 64 bits with kernel modifications to support ALTQ (I know it is not necessary with IPFilter, but it is already modified). I need to restart the whole system every day at night, so the people at my office don't start complaining.

I am not sure whether it is only a problem related to NAT. I have tried restarting IPNAT but the problem persists. The firewall only works fine again once I restart the machine.

Any help would be appreciated.


----------



## swills@ (Dec 10, 2014)

John Ballesteros said:


> Hello there,
> 
> I am having the same problem on my firewall. Running FreeBSD 10.1 64 bits with kernel modifications to support ALTQ (I know it is not necessary with IPFilter, but it is already modified). I need to restart the whole system every day at night, so the people at my office don't start complaining.
> 
> ...



You should probably start a new thread and share some of the same debugging info there. It may or may not be the exact same issue.


----------



## gkontos (Dec 10, 2014)

mtrower said:


> Meh.  That didn't last long.
> Obviously, ipnat is failing periodically - but why?



Hi, I just saw it. My guess is that you are reaching the limit! I did a bit of googling and found this. So, maybe something like

```
ipfilter_flags="-D -T ipf_nattable_sz=10009,ipf_nattable_max=300000 -E"
```
could do the trick!

EDIT: Unfortunately, the default connection limits in many of FreeBSD's firewall implementations are pretty low. For example, in PF I always need to raise the state limit.


----------



## John Ballesteros (Dec 10, 2014)

I will try what gkontos posted; if it doesn't work, I will do what swills@ suggests. I will let you know how it goes in my case.

Anyway, where is the documentation about the ipfilter_flags? I looked on the FreeBSD docs page and found nothing.


----------



## MeesterWood (Dec 10, 2014)

Something working one minute and then not the next always triggers the 'hardware failure' alarm in my brain. Run a long memtest. Maybe swap out NICs?


----------



## John Ballesteros (Dec 10, 2014)

MeesterWood, that is something we are also considering. We will try the flag suggestion first. If the flag doesn't work, we will see if we can test on other hardware.


----------



## John Ballesteros (Dec 30, 2014)

I had no luck with the flags.   The system stops forwarding packets after a day of operation.  I also changed the hardware and the situation is the same.

I am seriously thinking about installing some other operating system.  I can't believe that this configuration is so unstable, especially since I installed FreeBSD because of its good reputation.

Any other ideas?

Thanks in advance.


----------



## wblock@ (Dec 30, 2014)

I'm very confused by this thread, which actually appears to be two different threads.  Why use ipnat(8) rather than the built-in NAT in pf(4) or ipfw(8)?


----------

