# BBB closes TCP connections immediately



## mb2015 (Jun 14, 2017)

I have FreeBSD 11.0-STABLE installed and booting from a 64GB microSD card in a BeagleBone Black. The eMMC is dedicated to swap.

Normally it works well, sometimes for months without any problems. But other times it gets into a strange state, where it remains pingable, but it closes any TCP connections as soon as they are opened. To get it working again, I have to have someone power-cycle it, since it is running as a headless server at a remote location. 

Nothing is ever in the logs to indicate what has gone wrong.

I can't reliably trigger the problem, but it seems to mainly happen at night when a backup script runs `mysqldump`. I have also had it happen at random when running `portsnap fetch update`, or when building something big in the ports collection, or during a buildworld.

Strangely, it happens much less often if I plug something powered into the USB port. And it _never_ happens when I have the BBB at home with the FTDI cable attached to my PC. So I am thinking there is some kind of static buildup in the hardware.

Any ideas how to diagnose further?


----------



## Phishfry (Jun 14, 2017)

Are you using a dedicated power source? I traced some of my own problems to a power supply that could not deliver enough current.
The reason I ask is that in both of your cases the board looks to be under load, and that is when mine acted up until I got a better supply.

From what I have read, the BBB only needs 3.3V to run, unless you want to use USB devices, in which case you need 5V.

Swap on eMMC is not how I would do it. I feel there is a limited write cycle on the eMMC and something like swap would put unneeded wear on a soldered-on device. Have you tried some other swap arrangement, in case swap on eMMC is acting up under the heavy write load from backups and portsnap?


----------



## mb2015 (Jun 15, 2017)

Thanks for taking a stab at it.

Yes, it's always using a dedicated power source via an AC adapter I bought with the board.

I had the same problems when using a swap file on the microSD card. I thought switching swap to the Linux partition on the eMMC would help. It did help performance when swap was needed, but did not solve the problem I'm having.

I don't think I need to be worried about wear; swap isn't really used that often, and not very much of it is ever needed. The microSD card sees way more action. Maybe the problem is there? (It's a Samsung class 10, U1, 48 MB/s.) But the server is lightly loaded and the problem has been there from day one.


----------



## ralphbsz (Jun 16, 2017)

mb2015 said:


> But other times it gets into a strange state, where it remains pingable, but it closes any TCP connections as soon as they are opened.


That is typically a symptom of the kernel being half-wedged, the system being completely out of memory, or the root file system having died. But there are also a million other possible causes, so this alone doesn't help us debug.



> ... a headless server at a remote location.
> Nothing is ever in the logs to indicate what has gone wrong.


Is there any way to get a console attached, and then have a person report what they see on the console?  Another option would be running a script that does `vmstat 1` and saves the output in a file in /tmp, to see whether the system is slowly running out of something it needs.
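
A minimal sketch of that logging idea, assuming a POSIX `sh`. The helper name `stamp_lines` and the log path are hypothetical; the timestamps are only there so that individual samples can be lined up with cron jobs afterwards.

```sh
#!/bin/sh
# stamp_lines: hypothetical helper that prefixes each line read from
# stdin with the current wall-clock time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example use (the log path is an assumption):
#   vmstat 1 | stamp_lines >> /tmp/vmstat.log
```

If /tmp does not survive a reboot on the machine in question, /var/tmp may be a safer place for the log, since the file is only useful after the power cycle.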



> ... it seems to mainly happen at night when a backup script runs `mysqldump`. I have also had it happen at random when running `portsnap fetch update`, or when building something big in the ports collection, or during a buildworld.


That indicates that it depends on load.  Could be running out of memory, could be running out of space in /tmp, or could be simply overheating.  Any way to monitor the CPU temperature while running, and then record the output in another file in /tmp?



> Strangely, it happens much less often if I plug something powered into the USB port. And it _never_ happens when I have the BBB at home with the FTDI cable attached to my PC. So I am thinking there is some kind of static buildup in the hardware.


Seems to be related to some environmental condition.  Suggestion: Plug something into USB, attach the FTDI cable (but not to a PC), and also put a cooling fan on it.

This is going to be hard to debug.


----------



## aragats (Jun 16, 2017)

mb2015, does `dmesg` show anything related to the _*smsc*_ driver? That NIC (smsc95xx) is connected via USB, and there is a known (but hard to trigger) issue with its Linux driver where the network drops intermittently. I guess the principal part of the driver code is shared with FreeBSD.


----------



## mb2015 (Jun 16, 2017)

`dmesg | grep smsc` shows:

```
smscphy0: <SMC LAN8710A 10/100 interface> PHY 0 on miibus0
smscphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
```
Last time the problem happened, I had my friend attach a monitor via an HDMI-to-microHDMI cable. He said the BBB apparently was not putting out any video until after he power cycled it.

In /etc/syslog.conf I have

```
console.info                                   /var/log/console.log
```
...but nothing of note ever shows up in console.log, just startup messages and innocuous things like when I run `su`.

The board normally has the FTDI cable plugged in, other end not attached to anything. I also do normally keep a USB cable attached to an AC adapter since it seems to help. There's no cooling fan, but I would be surprised if it is overheating. When I first got the board, before installing FreeBSD, I installed something on the Debian partition to peg the CPU at 100% (distributed.net client), and a script to monitor the temperature. I see now that maybe the temperature readings from the CPU are not to be trusted, but it never misbehaved. It was reading consistently about 75°C inside the case in a 20°C room, and the CPU is rated to 90°C. eMMC/SD card and network activity wouldn't raise the temp much above that, would it? Regardless, I don't think there is a FreeBSD driver for the BBB's CPU temperature sensor.

I went ahead and ran `set nohup` and `vmstat 1 > /var/tmp/vmstat.out &` ... we'll see if anything happens tonight. Update: no problems yet. Now doing vmstat via cron so it only runs during the backup hour. Will post again next time the problem happens.


----------



## aragats (Jun 16, 2017)

I had official email correspondence with Microchip. They suggested disabling Turbo mode (in Linux). It helped somewhat, but under really heavy network load it failed anyway. I'm not sure how to deal with Turbo mode in FreeBSD.


----------



## mb2015 (Jun 23, 2017)

OK, it happened again. As before, there was no HDMI output until after a power cycle.

Here's the vmstat output: https://pastebin.com/cKyeiBg7

It starts at 04:00, so the first ~60 lines are the minute before the MySQL backup script starts at 04:01. This script runs `mysqldump`, piping the output to `bzip2`, writing to a file in the /usr hierarchy. Normally this takes about 7 minutes. Then it runs `mysqlcheck` twice (once to check & repair, once to optimize), which takes 2 or 3 minutes. As it turns out, since the last crash, there was a damaged table which wasn't getting repaired and was causing the dump to abort, so the dump file was almost entirely empty, and the script was probably done running after only a few minutes.
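
A sketch of how a backup script could notice an aborted dump like that one, assuming a POSIX `sh`. In a plain pipeline the shell reports only the last command's exit status, so a failing dump can hide behind a successful `bzip2`. The function name, the `COMPRESSOR` variable, the output path and the `mysqldump` option shown in the usage comment are all assumptions for illustration, not the actual script from this post.

```sh
#!/bin/sh
# run_dump_compressed OUTFILE CMD [ARGS...]
# Pipes CMD's stdout through a compressor into OUTFILE and returns CMD's
# own exit status rather than the compressor's.
# COMPRESSOR is a hypothetical override; it defaults to bzip2.
run_dump_compressed() {
    outfile=$1
    shift
    statusfile=$(mktemp "${TMPDIR:-/tmp}/dumpstatus.XXXXXX") || return 1
    # The left side records CMD's status in a file before the pipe closes.
    { "$@"; echo "$?" > "$statusfile"; } | ${COMPRESSOR:-bzip2} > "$outfile"
    read -r dump_status < "$statusfile"
    rm -f "$statusfile"
    return "${dump_status:-1}"
}

# Example use (output path and mysqldump options are assumptions):
#   run_dump_compressed /usr/backup/all.sql.bz2 mysqldump --all-databases \
#       || echo "mysqldump failed; the dump is probably incomplete" >&2
```

With a check like this, a nearly empty dump file would be reported on the night it happens instead of being discovered after the next crash.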

So this time around, things did not go haywire until about 04:15, corresponding to the end of the `vmstat` output, lines 1020–1030. 04:15 is when /etc/crontab tells cron to run `periodic daily`. `periodic daily` runs a lot of scripts, so it is hard to know exactly what it was doing when things locked up.

One particular hog that I know about is /usr/local/etc/periodic/daily/sa-utils, which I installed via the mail/sa-utils port. It automates the downloading and compiling of SpamAssassin rules. It relies on `sa-compile` which, like anything related to SpamAssassin, is quite slow and RAM-hungry. Still, you would think having 3.5 GB of swap would be enough.

Based on the presence of an empty tarball timestamped 04:17, it looks like my mail-and-configs backup script did try to run at its usual time, but failed to work properly, as the system was already being brought to its knees by then.

After the hard reboot the next day, it didn't fully come up; network connections were being actively rejected. Another power cycle "fixed" it.

Things I've done tonight:

- Ran a `repair table <tablename> use_frm` command in the MySQL client so the backup script will be able to produce a full dump again.
- Modified my backup scripts to send `vmstat 1` output to files in /var/tmp so we can see the effect as they run.
- Tested my backup scripts; they ran without any problems.
- Added `daily_output="/var/log/daily.log"` to /etc/periodic.conf so I can hopefully be assured of getting a `periodic daily` log, even if it is incomplete.

I'm not adept at reading `vmstat` output, so let me know what you think and what I should try next. Thanks!
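
For anyone wanting to do the same, here is a rough sketch of running `vmstat 1` alongside a backup job, assuming a POSIX `sh`. The function name `with_vmstat_log` and the paths in the usage comment are hypothetical; this is not the actual modification described above.

```sh
#!/bin/sh
# with_vmstat_log LOGFILE CMD [ARGS...]
# Samples `vmstat 1` into LOGFILE for as long as CMD runs, then stops the
# sampler and returns CMD's exit status.
with_vmstat_log() {
    vmstat_log=$1
    shift
    vmstat 1 > "$vmstat_log" 2>&1 &
    vmstat_pid=$!
    cmd_status=0
    "$@" || cmd_status=$?
    # Stop the background sampler and reap it quietly.
    kill "$vmstat_pid" 2>/dev/null || true
    wait "$vmstat_pid" 2>/dev/null || true
    return "$cmd_status"
}

# Example use (both paths are assumptions):
#   with_vmstat_log /var/tmp/vmstat-mysql.out /usr/local/bin/mysql-backup.sh
```

Each job gets its own log file this way, so the samples from the MySQL backup and from `periodic daily` do not end up interleaved.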

*Update:* After a couple more system crashes, I updated MySQL and am now trying some tuning (settings in my.cnf) to get its memory footprint reduced.


----------

