# UFS on NVMe is not 100% supported?



## aragats (May 8, 2020)

The SSD is Micron 1TB NVMe on PCIe bus, FreeBSD 12.1.
This becomes very consistent: improper shutdowns cause bad damage to UFS on NVMe. I get corrupted files when powering up after such events.
Those issues mostly are just annoying like missing sync info for mbsync or certain pidgin settings. However, I reported a serious filesystem error in this thread a while ago.

I used this SSD in 2 different boxes and experienced the same symptoms.
What am I missing? Is it just me or a known issue?


----------



## ralphbsz (May 8, 2020)

I find it very hard to believe that Micron SSDs are fundamentally defective, and frequently corrupt data. If that were true, then their big customers would be killing them. Look at it this way: You probably have had handfuls or dozens of "improper shutdowns", meaning cutting power or crashing the OS. You claim to see consistent damage, meaning statistically meaningfully measurable over dozens of experiments. That means the probability of something going wrong must be on the order of 10% (perhaps 5% or 20%). Now extrapolate that to a large computer user with millions of computers using those Micron SSDs. They have unplanned power outages too (although much less often than you do), and they would be seeing thousands of cases, and they would be beating Micron up. Can you imagine what Jeff Bezos would do if this endangered his web business? There would be no building left standing within a 10 mile radius of their headquarters in Boise, Idaho.

Have you contacted Micron? Installed the newest firmware on the drive? Tried an alternate drive? But really, I don't think the problem is the drive at all.

Next suspect: UFS. Again, same argument. Millions of computers use UFS, including in large server farms (think NetApp, Netflix, Juniper). Many of those machines crash. Some of them run on blazing-fast storage, and have for many years (storage as fast as NVMe SSDs has existed for at least two decades, using either RAM disk hardware or large RAM caches on disk arrays). It is incredibly hard to imagine that UFS has bugs of this sort, and you are the first to experience them.

Here's my theory. You are pointing to two very complex pieces of software, which probably use many files, and do lots of file IO. I'm going to bet that these two applications were written without paying attention to the fact that file systems can (validly!) lose data when the system crashes, and do so in an inconsistent fashion, if the data was either written recently or is still being written. For example, the applications might first write to file A, then write to file B, and assume that when they start, anything that's in A will also be present in B, because A was written before B. Well, that assumption is wrong in a crash, because it's quite possible that the updates to A never made it to disk, while B did. And given sufficiently strange writing patterns (for example, write just a little bit, less than 512 bytes, to A, then keep the file open, while writing a heck of a lot of stuff to B and closing the file), that outcome is even very likely.
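To make the A-before-B scenario concrete, here is a minimal Python sketch. The file names and both helper functions are hypothetical and are not taken from mbsync or pidgin. It forces A onto stable storage with `os.fsync` before B is touched, so (assuming the drive honors flush requests) a crash should not leave B updated while A's change is missing. It does not guarantee that B itself survives a crash.

```python
import os

def write_durably(path: str, data: bytes) -> None:
    """Append data to path and block until the kernel reports it as flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # ask the kernel to push this file's data to the device
    finally:
        os.close(fd)

def update_a_then_b(path_a: str, path_b: str, record: bytes) -> None:
    # A is flushed first; only then is B modified.
    write_durably(path_a, record)
    write_durably(path_b, record)
```

Without the `os.fsync` call the two writes would still happen in program order, but the order in which they reach the disk would be up to the file system.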

On traditional spinning disk based systems, the scenario that kills your applications may be very rare, because a lot fewer things happen, and in particular a lot fewer things happen at once.

My suggestion: Go into the source code of these applications (and any other programs that modify the same data), and add O_SYNC to all the open(2) statements. That might be a little difficult, as they might be hidden behind language-specific run time libraries (for example fopen in C, file in Python...). Given that your storage is blazingly fast, the performance hit is probably small and irrelevant. My educated guess is that a lot of these problems will go away.
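For a program that opens its files through `os.open`, the change might look like the sketch below. The helper names `open_sync` and `append_line` are made up for illustration; `os.O_SYNC` is the flag Python's `os` module exposes on Unix-like systems. With `O_SYNC` set, each `write` is meant to return only after the data has been handed to stable storage, so no separate `fsync` call is needed.

```python
import os

def open_sync(path: str, mode: int = 0o644) -> int:
    """Open path for appending with O_SYNC so every write is synchronous."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    return os.open(path, flags, mode)

def append_line(path: str, line: str) -> None:
    fd = open_sync(path)
    try:
        # Write through the raw descriptor: no user-space buffer to lose in a crash.
        os.write(fd, line.encode() + b"\n")
    finally:
        os.close(fd)
```

Code that uses the built-in `open()` instead goes through a user-space buffer, so it would also need to flush that buffer before the data even reaches the kernel.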

An alternative would be to contact the authors of the applications. I don't think that will go well, but you can try.


----------



## aragats (May 8, 2020)

ralphbsz said:


> I find it very hard to believe that Micron SSDs are fundamentally defective, and frequently corrupt data.


Exactly, I don't even think about defectiveness of the SSD.



ralphbsz said:


> add O_SYNC to all the open(2) statements


That's a good idea, thanks!

I run a pretty similar set of programs on several boxes with spinning HDDs and SATA SSDs, and have never seen such issues.


----------



## PMc (May 8, 2020)

I agree that it should be expected to have corrupted files after an unclean shutdown, except with applications like postgres that go a long way to avoid such things.
OTOH, in that linked thread the OP describes a situation where fsck didn't succeed and clri(8) was actually necessary. I haven't seen that in probably 20 years (and I am doing a lot of unsupported things), probably never at all since using UFS, and it should not happen. So this gives some credibility to the idea that something is indeed going wrong here.
But then, given the full stack of hw+os+ complex application, it will be very difficult to pinpoint.
To me the most practical approach would be to get rid of unclean shutdowns. 



ralphbsz said:


> My suggestion: Go into the source code of these applications (and any other programs that modify the same data), and add O_SYNC to all the open(2) statements. That might be a little difficult, as they might be hidden behind language-specific run time libraries (for example fopen in C, file in Python...).



Isn't there a mount option to force the filesystem into fully sync mode?
(I'm almost completely with ZFS, so I'm no longer concerned.)


----------



## aragats (May 9, 2020)

PMc said:


> Isn't there a mount option to force the filesystem into fully sync mode?


I bet it will be very slow overall.


PMc said:


> the most practical approach would be to get rid of unclean shutdowns


Sure, but, e.g. my other box with those really bad UFS errors is a Dell Precision with a buggy Thunderbolt: many times it crashed when I plugged in a few USB devices in a certain order. Then, of course, I learned the "correct" sequence (-;


----------



## mark_j (May 9, 2020)

Overall UFS is not great at functioning with abrupt losses of power, etc. That's just its design. Without journaling, this is always a problem.

You may like to look at syncer(4) and perhaps tune your system a bit so writes take place more often.


----------



## Alain De Vos (May 9, 2020)

Are the problems gone when you use ZFS?
On my desktop I have,
kern.metadelay=2                         # 28
kern.dirdelay=3                          # 29
kern.filedelay=5                         # 30
I don't know if they are "honored"


----------



## mark_j (May 9, 2020)

Absolutely, they do get honoured (so long as kern.filedelay > kern.dirdelay > kern.metadelay).
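A quick way to check that ordering is sketched below in Python. Both function names are hypothetical; `read_delay` assumes a FreeBSD system where `sysctl -n` prints just the value of the named variable.

```python
import subprocess

def delays_are_ordered(metadelay: int, dirdelay: int, filedelay: int) -> bool:
    """True when kern.filedelay > kern.dirdelay > kern.metadelay."""
    return filedelay > dirdelay > metadelay

def read_delay(name: str) -> int:
    """Read one integer sysctl through the sysctl(8) command (FreeBSD only)."""
    result = subprocess.run(
        ["sysctl", "-n", name], check=True, capture_output=True, text=True
    )
    return int(result.stdout.strip())
```

With the values quoted above, `delays_are_ordered(2, 3, 5)` holds, so those settings satisfy the condition.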


----------



## ralphbsz (May 9, 2020)

aragats said:


> I bet it will be very slow overall.


Actually, given how insanely fast your SSD is, it will probably feel pretty fast. Modern SSDs with RAM buffers and low-latency interfaces (such as PCIe) are challenging traditional file system design, as RAM caching is becoming less valuable. You might wear out your SSD faster though, although for consumer use, that is a non-problem.



mark_j said:


> Overall UFS is not great at functioning with abrupt losses of power, etc. That's just its design. Without journaling, this is always a problem.


I respectfully disagree. If you turn on metadata soft updates, it works very well. It doesn't do miracles (files that haven't been sync'ed will still be lost), but the file system should be consistent.


----------



## mark_j (May 9, 2020)

ralphbsz said:


> I respectfully disagree. If you turn on metadata soft updates, it works very well. It doesn't do miracles (files that haven't been sync'ed will still be lost), but the file system should be consistent.



Actually you don't because that's exactly what journaling is.  Perhaps I should have phrased it better before?


----------



## ralphbsz (May 9, 2020)

Oh sorry ... I didn't know you meant soft updates. Never mind, carry on.


----------



## gpw928 (May 9, 2020)

I live rurally, and the mains power drops suddenly without warning quite often.  I don't have a UPS.

I can't remember the last time a FreeBSD system had a problem re-booting after a power failure.  I have both UFS with soft updates and ZFS file systems.  OTOH, my Debian systems almost always require a manual fsck for ext4 file systems.

It seems that your Micron NVMe SSD ecosystem (your software and your hardware on FreeBSD 12.1) has a fault. 

I think the only way to drill down is to start changing things slowly.  I would start by re-seating (and where possible cleaning contacts for) all the modular components inside the case, so CPU, SSD, memory, cables,...

When doing this I use an anti static wrist strap connected to the grounded case with the power off.  I use a high quality pencil eraser (e.g. Staedtler) to clean the contacts for memory modules and PCIe bus cards.

In the end, it might be cheaper and easier to buy a UPS than to work your way through the rest of the hardware.

Forcing synchronous writes ("-o sync") may well help, but I would measure the cost first.  It's always appallingly slow on spinning disks.
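One rough way to measure that cost before remounting anything is sketched below in Python. The function `time_writes` and its default block count and size are illustrative assumptions. It uses per-file `O_SYNC` as a stand-in for the mount option, so it only approximates what a fully synchronous mount would do (metadata operations are not covered).

```python
import os
import tempfile
import time

def time_writes(directory: str, sync: bool, count: int = 200, size: int = 4096) -> float:
    """Return the seconds taken to write `count` blocks of `size` bytes to a scratch file."""
    flags = os.O_WRONLY | os.O_TRUNC
    if sync:
        flags |= os.O_SYNC  # every write waits for stable storage
    fd, path = tempfile.mkstemp(dir=directory)  # scratch file on the file system under test
    os.close(fd)
    block = b"\0" * size
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed
```

Calling `time_writes(".", sync=False)` and `time_writes(".", sync=True)` from a directory on the NVMe file system gives two numbers to compare; repeating the runs a few times helps smooth out noise.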


----------



## aragats (May 9, 2020)

gpw928 said:


> It seems that your Micron NVMe SSD ecosystem (your software and your hardware on FreeBSD 12.1) has a fault


That's what I thought until I moved it from the Dell to a Thinkpad and discovered the same issues...
Now the only common thing is the SSD itself, however, as ralphbsz mentioned above (and I 100% agree), the probability of its defectiveness is very low.


----------



## mark_j (May 9, 2020)

Have you tried looking at SMART? smartctl(8)

Also, is there anything useful in any of the commands available under nvmecontrol(8)?


----------



## ralphbsz (May 9, 2020)

I hate to suggest it, since it is going to cost money, but maybe that one particular drive is defective. For example (just a hypothetical): Most SSDs have a small RAM buffer, which holds writes for a while, and which is guarded by a capacitor; in case of power failure, that buffer either gets protected against being erased by the capacitor, or written to flash with the charge in the capacitor. One big difference between SSDs and spinning disks is that SSDs can work on multiple IOs in parallel (while disks only write in one place at a time), so it is possible that the SSD has multiple sectors in that buffer, which the OS thinks have been written, but in reality are still in the RAM buffer. What if in your SSD that capacitor is defective, but the SSD firmware doesn't remember that after the next reboot, and thinks it can leave written data in the RAM buffer for a few microseconds? Then the SSD violates the contract with the OS that written IOs have to be hardened to persistent media, and that IOs have to be written in order.

So the nasty suggestion is: buy another SSD, perhaps even of the same model. Or contact Micron customer support (if the SSD is still under warranty), see what they say.

Mark_j is correct: in theory, looking with smartctl should help diagnose problems like that. Alas, SSDs are very complicated, and their support for SMART tends to be very spotty: they tend to work so ridiculously well that keeping track of their problems is mostly pointless. Except when it isn't. But definitely worth trying.

P.S. Regarding what gpw928 said: I just looked in my sys admin logs. I've been running FreeBSD (with the root file systems on UFS, with soft updates) since 2012, and I pretty religiously record any system administration action. Most of the time with UPS, but the system has crashed lots of times: battery in UPS bad, UPS software doesn't work correctly to shut down, one time I grabbed the server without shutdown and tossed it into the car because there was a fire nearby and we had to evacuate our house, pilot error (like pulling the wrong power cable), and disk failure. Of the dozens of times that it crashed, it only twice (in 201707 and 202003) had to run fsck AT ALL. Neither time was there any data loss. The root file system is on an SSD (which is about 8 years old at this point), and has been power cycled 297 times during its life (probably half unintentional crashes; with the UPS and an automatic propane generator we don't have many real power outages). So UFS is AMAZINGLY resilient.


----------



## VladiBG (May 9, 2020)

https://www.micron.com/-/media/client/global/documents/products/white-paper/ssd_power_loss_protection_white_paper_lo.pdf

https://www.epfl.ch/labs/lap/wp-content/uploads/2018/05/JimenezMar15_LibraSoftwareControlledCellBitDensityToBalanceWearinNandFlash_TECS.pdf


----------

