# FreeBSD 9.0 default gateway changes unexpectedly



## TretUliy2 (Dec 3, 2012)

Hello fellows.

I have a problem with my FreeBSD 9.0 router. It works fine until, one day, the default gateway changes to some random IP address. This happens roughly once every 14 days.
I tried to catch the application that might be causing this using:
`# route -n monitor`
but had no luck (no events were registered when the default gateway changed).
ICMP redirects are already being dropped:

```
net.inet.icmp.drop_redirect: 1
```
There are no routing daemons on this server.
I am wondering whether something in the kernel (DUMMYNET, pf) could be overwriting the memory that stores the default gateway.

Some information about my system:
`$ uname -a`

```
FreeBSD bras-2 9.0-RELEASE FreeBSD 9.0-RELEASE #1: Tue Feb 28 10:50:04 EET 2012     root@bras:/usr/obj/usr/src/sys/BRAS  amd64
```
There are two interfaces, lagg0 (LAN) and lagg1 (WAN), each built on an Intel Pro 1000 ET Dual Port Server Adapter.
Both lagg interfaces are parents of several vlan interfaces.

The main functions of this box are routing, NAT, and shaping client traffic. Traffic is about 600 Mbit/s through one interface (I mean one lagg).

- NAT: pf
- shaping: DUMMYNET

`# ipfw list`

```
00100 allow ip from any to any via lo0
00200 deny ip from 127.0.0.0/8 to any
00300 deny ip from any to 127.0.0.0/8
01000 allow udp from any 68 to any dst-port 67 in via vlan*
01100 deny log icmp from any to any icmptypes 5,9,10
09000 allow ip from any to 255.255.255.255 dst-port 67 in via vlan*
10000 allow ip from table(20) to table(10) in recv vlan*
10100 allow ip from table(10) to table(20) out xmit vlan*
10200 allow ip from table(20,0) to any in recv vlan*
10300 allow ip from any to table(20,0) out xmit vlan*
40000 pipe tablearg ip from any to table(20) out xmit vlan*
40100 pipe tablearg ip from table(21) to any in recv vlan*
40800 allow ip from table(20) to any out xmit comstar_w
40900 allow ip from any to table(20) in recv comstar_w
50000 allow ip from me to any
50005 allow tcp from any to me established
50010 allow tcp from any to me dst-port 125,53,83,84 setup
50020 allow udp from any to me dst-port 53,161
50030 allow icmp from any to me icmptypes 0,8
```
I found a dead thread on the mailing list, http://lists.freebsd.org/pipermail/freebsd-net/2012-March/031879.html, so it seems I am not the only one hitting this issue.

I am not familiar with kernel debugging; maybe someone can tell me how to trace memory writes to the area where the default route is stored?
Any help is appreciated.
Thanks.


----------



## mamalos (Dec 4, 2012)

Excuse me for asking this, but since you're not mentioning it at all, I have to ask: are you running DHCP on any of your interfaces?

If not, are you running any other services that may have been compromised? Have you checked the IPs that your default route has been set to? Have you tried rebuilding your machine from scratch to see whether the problem persists?

Generally, what are the contents of your /etc/rc.conf with respect to network configuration?


----------



## Beeblebrox (Dec 4, 2012)

Look for a duplicate DHCP source. Most likely there is either a second, interfering DHCP server, or the cause is a problematic subnet definition somewhere.


----------



## TretUliy2 (Dec 4, 2012)

Thanks for your reply. No, I am not using DHCP for interface configuration; however, the box runs a DHCP server and a few other services.
Here is my rc.conf:

```
cloned_interfaces="lagg0 lagg1 vlan3400 vlan18 vlan97 vlan84 vlan20 vlan21 vlan22 vlan23 vlan25 vlan26 vlan27 vlan28 vlan29 vlan30 vlan31 vlan32 vlan33 vlan34 vlan35 vlan36 vlan37 vlan38 vlan39 vlan40 vlan42 vlan43 vlan44 vlan49 vlan53 vlan54 vlan55 vlan56 vlan57 vlan58 vlan59 vlan60"

ifconfig_igb0="-tso -rxcsum -txcsum up"
ifconfig_igb1="-tso -rxcsum -txcsum up"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 up"
ifconfig_igb2="-tso -rxcsum -txcsum up"
ifconfig_igb3="-tso -rxcsum -txcsum up"
ifconfig_lagg1="laggproto lacp laggport igb2 laggport igb3 up"

ifconfig_vlan3400_name="comstar_w"
ifconfig_comstar_w="inet x.x.y.y/29 vlan 3400 vlandev lagg0"  # Primary address
ifconfig_comstar_w_alias0="inet x.x.x.x/29"  # Secondary address

defaultrouter="y.y.y.y"
gateway_enable="YES"
firewall_enable="YES"
firewall_script="/etc/ipfw.conf"
# Client interfaces
ifconfig_vlan97="inet x.x.x.x netmask 255.128.0.0 vlan 97 vlandev lagg1"
ifconfig_vlan84="inet x.x.x.x netmask 255.255.255.0 vlan 84 vlandev lagg1"
# ...and so on for each vlan

dhcpd_enable="YES"
dhcpd_ifaces="vlan97 vlan84 vlan20 vlan21 vlan22 vlan23 vlan25 vlan26 vlan27 vlan28 vlan29 vlan30 vlan31 vlan32 vlan33 vlan34 vlan35 vlan36 vlan37 vlan38 vlan39 vlan40 vlan42 vlan43 vlan44 vlan49 vlan53 vlan54 vlan55 vlan56 vlan57 vlan58 vlan59 vlan60"

ntpd_enable="YES"
ntpd_program="/usr/sbin/ntpd"
ntpd_flags="-p /var/run/ntpd.pid -f /var/db/ntpd.drift"
```
There is no routing software on this box, and no events show up in `$ route -n monitor` when the default router changes. I found two other people who have the same problem. It was discussed on the freebsd-net mailing list before, http://readlist.com/lists/freebsd.org/freebsd-net/5/27379.html, with no results.

Thanks.


----------



## deejay2 (Feb 21, 2013)

Hi,

I have the very same problem. I am not running a DHCP client or server at all.

I have observed this issue on all my servers (different hardware). From what I can conclude, the common points are these:

Simultaneous use of:

- PF for NAT
- IPFW dummynet (pipes) for traffic shaping

I have two different servers where it happens daily, sometimes as often as 10 times in 6 minutes. I wrote a script that checks the gateway every 5 seconds and resets it to the proper value. The changes seem to come in batches: many hours without a problem, then suddenly a burst of unexpected gateway replacements fighting against my script as it restores the gateway.
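The script itself isn't posted, so what follows is only a minimal sketch of the same workaround in C, not deejay2's code. It assumes FreeBSD's `route -n get default`, whose output includes a `gateway:` line, and root privileges for `route change`. `parse_gateway()` and `watchdog_run()` are hypothetical names. It only masks the symptom: packets can still be misrouted between polls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Copies the address on the "gateway:" line of `route -n get default`
 * output into out. Returns 0 on success, -1 if there is no such line
 * or the value does not fit in out.
 */
int parse_gateway(const char *text, char *out, size_t outlen)
{
    assert(text != NULL && out != NULL && outlen > 0);
    const char *p = strstr(text, "gateway:");
    if (p == NULL)
        return -1;
    p += strlen("gateway:");
    p += strspn(p, " \t");              /* skip padding after the colon */
    size_t n = strcspn(p, " \t\r\n");   /* length of the address token */
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/*
 * Hypothetical watchdog: every interval_ms milliseconds, compare the
 * current default gateway with expected_gw and restore it if it changed.
 * expected_gw must be a trusted dotted-quad, because it is passed to the
 * shell. Needs root. Never returns.
 */
void watchdog_run(const char *expected_gw, long interval_ms)
{
    struct timespec delay = { .tv_sec = interval_ms / 1000,
                              .tv_nsec = (interval_ms % 1000) * 1000000L };
    char buf[4096], gw[64], cmd[128];

    for (;;) {
        FILE *fp = popen("route -n get default", "r");
        if (fp != NULL) {
            size_t len = fread(buf, 1, sizeof(buf) - 1, fp);
            buf[len] = '\0';
            pclose(fp);
            if (parse_gateway(buf, gw, sizeof(gw)) == 0 &&
                strcmp(gw, expected_gw) != 0) {
                fprintf(stderr, "default gateway is %s, restoring %s\n",
                        gw, expected_gw);
                snprintf(cmd, sizeof(cmd), "route change default %s",
                         expected_gw);
                if (system(cmd) != 0)
                    fprintf(stderr, "route change failed\n");
            }
        }
        nanosleep(&delay, NULL);
    }
}
```

Calling `watchdog_run("y.y.y.y", 5000)` from `main()` would mirror the 5-second check, where `y.y.y.y` stands for the real `defaultrouter` value from rc.conf. A missing default route is simply ignored by this sketch.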


----------



## Courtland (Mar 2, 2013)

I'm experiencing the same symptoms as the other posters. The default route is replaced with a seemingly random IPv4 address. It happens on different machines running different hardware at different times of day. Traffic load seems to increase the likelihood of the issue.

I am running FreeBSD 9.1-RELEASE. Using PF+ALTQ. I am not using IPFW/dummynet. I am not using DHCP for the Internet/WAN interface.

This did not happen on the same systems when running 8.3-STABLE.

I am currently combating the problem with a routine that checks the default route every 500ms and changes it back if necessary.

Has anyone had any luck tracking this issue down and fixing it?

Thanks


----------



## TretUliy2 (Mar 3, 2013)

How much traffic goes through your routers?
I am trying to find a way to lock the memory area that holds the default gateway, but no luck so far.


----------



## wblock@ (Mar 3, 2013)

Has anyone filed a PR for this?  It is fairly serious, and sounds like it is easy to repeat.


----------



## TretUliy2 (Mar 4, 2013)

Yes, somebody filed a PR, kern/174749. It seems to me that a lot of throughput is needed to reproduce this bug, and so far nobody can tell how to debug this crazy thing (that is, trace exactly which routine in which kernel subsystem corrupts the routing table).


----------



## Courtland (Mar 4, 2013)

The system I'm seeing it on the most has anywhere from 30 to 100 Mbit/s of sustained Internet-bound traffic. It happens about once per day, but not at any specific time. It seems to happen during heavy usage, which makes sense, because I am guessing the problem is more likely to manifest itself at a higher packet rate.
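To translate these bit rates into packet rates, here is a rough estimate with bit rate $B$ and an average packet size $S$ in bytes, assuming $S = 500$ bytes purely for illustration (real traffic mixes vary), for this box's peak and for the 600 Mbit/s mentioned in the first post:

$$
R = \frac{B}{8S}, \qquad
R_{100\,\mathrm{Mbit/s}} = \frac{100 \times 10^{6}}{8 \times 500} = 25\,000\ \text{packets/s}, \qquad
R_{600\,\mathrm{Mbit/s}} = \frac{600 \times 10^{6}}{8 \times 500} = 150\,000\ \text{packets/s}
$$

Smaller average packets raise $R$ proportionally at the same bit rate.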


----------



## kpa (Mar 4, 2013)

Sounds like a concurrency/locking issue in code that deals with the routing table in the kernel that manifests itself only when there's heavy traffic. Have you asked on the mailing lists?


----------



## Courtland (Mar 5, 2013)

There have been a number of mailing list posts about the same problem over the last few years. I cannot tell whether this is specific to 9.x, as there is evidence of a similar problem with 7.x and 8.x in the past. For my part, though, I started noticing it after upgrading systems from 8.3 to 9.1.

I have attempted to revive the issue in freebsd-net:
http://freebsd.1045724.n5.nabble.com/Default-route-changes-unexpectedly-td5792887.html


----------



## Courtland (Mar 6, 2013)

The most recent occurrence of this problem on one of my routers was accompanied by a stream of these messages in /var/log/messages before the default gateway changed.


```
Mar  5 19:12:48 kernel: arpresolve: can't allocate llinfo for 50.142.201.101
Mar  5 19:12:48 last message repeated 107 times
```

50.142.201.101 is the IP the default route was changed to in this instance.

There were also some peculiar named/BIND errors (BIND is set up as a forwarding resolver for the network); however, these seem to occur frequently, whereas the default route changes only once every day or two.


```
Mar  5 21:12:48  named[10906]: /usr/src/lib/bind/isc/../../../contrib/bind9/lib/isc/unix/socket.c:1890: unexpected error:
Mar  5 21:12:48  named[10906]: internal_send: 23.67.244.68#53: Invalid argument
```

The default route changed around Mar 5 21:12.

I am not sure whether either of these errors is related; perhaps someone else knows?


----------



## Courtland (Mar 6, 2013)

Here's another example of this issue on the mailing lists, without a resolution.

http://freebsd.1045724.n5.nabble.co...-for-65-59-233-102-td5742320i20.html#a5793139


----------



## TretUliy2 (Jul 4, 2013)

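The problem was solved in stable/9 by the commit below, an MFC of r249848 (PRs 174749 and 157796), which moves the `dst` assignment in `ip_output()` below the `again:` label: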

```
Log:
MFC of r249848

PR:	174749, 157796

Modified:
stable/9/sys/netinet/ip_output.c
Directory Properties:
stable/9/sys/ (props changed)

Modified: stable/9/sys/netinet/ip_output.c
==============================================================================
--- stable/9/sys/netinet/ip_output.c	Thu Apr 25 11:24:40 2013	(r249891)
+++ stable/9/sys/netinet/ip_output.c	Thu Apr 25 11:25:24 2013	(r249892)
@@ -194,8 +194,8 @@ ip_output(struct mbuf *m, struct mbuf *o
 		hlen = ip->ip_hl << 2;
 	}
 
-	dst = (struct sockaddr_in *)&ro->ro_dst;
 again:
+	dst = (struct sockaddr_in *)&ro->ro_dst;
 	ia = NULL;
 	/*
 	 * If there is a cached route,
```
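Why that one-line move matters can be illustrated with a minimal userland model of one plausible reading of the diff. This is a sketch, not kernel code: `struct route_sim`, `struct rtentry_sim`, `output_sim()` and the `fixed` flag are hypothetical simplifications of `struct route`, `struct rtentry` and `ip_output()`. In the 9.x code, as far as the source can be read, `dst` is redirected to the route's `rt_gateway` when the route has a gateway, and if a packet filter (pf or ipfw) rewrites the destination address, the function jumps back to `again:` to redo the route lookup. With `dst` assigned only before the label, the second pass still points at the default route's gateway and writes the new packet destination into it. That would fit a "random" gateway, the `arpresolve: can't allocate llinfo` messages, and the silence of `route -n monitor`, since no routing message is ever generated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for struct sockaddr_in,
 * struct rtentry and struct route; not the real kernel definitions. */
struct sa_in { uint32_t addr; };
struct rtentry_sim { int is_gateway; struct sa_in gateway; };
struct route_sim { struct rtentry_sim *rt; struct sa_in dst; };

/*
 * Models one packet through a simplified ip_output(). The route lookup
 * always returns default_rt. A "packet filter" rewrites the destination
 * to new_dst once, forcing a second pass through again:.
 * fixed == 0 models the old code, fixed != 0 the patched code.
 */
void output_sim(struct route_sim *ro, uint32_t ip_dst, uint32_t new_dst,
                struct rtentry_sim *default_rt, int fixed)
{
    struct sa_in *dst = NULL;
    int filtered = 0;

    if (!fixed)
        dst = &ro->dst;             /* old: assigned once, before the label */
again:
    if (fixed)
        dst = &ro->dst;             /* patched: reset on every pass */

    /* Drop the cached route if it is for a different destination. */
    if (ro->rt != NULL && dst->addr != ip_dst)
        ro->rt = NULL;

    /* No cached route: record the destination and look it up again. */
    if (ro->rt == NULL) {
        dst->addr = ip_dst;         /* in the old code this can hit the gateway */
        ro->rt = default_rt;
    }
    assert(ro->rt != NULL);

    /* Off-link destination: the next hop is the route's gateway. */
    if (ro->rt->is_gateway)
        dst = &ro->rt->gateway;

    /* The packet filter changed the destination: redo the lookup. */
    if (!filtered && new_dst != ip_dst) {
        ip_dst = new_dst;
        filtered = 1;
        goto again;
    }
}
```

With a default gateway of 192.0.2.1 and a packet for 198.51.100.5 that the filter rewrites to 203.0.113.9 (arbitrary example addresses), `fixed = 0` leaves the model's default gateway at 203.0.113.9, while `fixed = 1` leaves it at 192.0.2.1 and only updates the per-call `ro->dst`.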


----------

