# Breaking hard drives with compiling



## jozze (May 30, 2013)

Hello again, I've got a question regarding hard drive life expectancy under strain.

Rebuilding world and ports often is putting quite a strain on a computer. I am no computer scientist, but I read somewhere that USB sticks have a limited number of writes before they go bad. I guess the same must hold true for the hard drives.

If so, how many writes do you reckon an average hard drive has compared to a USB stick? Or rather, does the life expectancy of the hard drive significantly decrease because of compiling?

Thank you for your time.


----------



## Terry_Kennedy (May 30, 2013)

jozze said:

> Rebuilding world and ports often is putting quite a strain on a computer.


I'd call that a "load" and not a "strain", as it is anticipated by the manufacturer. Many years ago, some hardware was sold which was just plain defective. I was actually told by one company that a regression in their chipset was "no big deal" because "Windows [3.x] doesn't stay up that long anyway". However, I haven't run across any defective _designs_ in quite a few years. That isn't to say that they don't exist - there are well-known examples like the Cougar Point SATA ports. Obviously, there are defective _units_ every now and again due to manufacturing problems.


> I am no computer scientist, but I read somewhere that USB sticks have a limited number of writes before they go bad.


Yes. This is due to the limited number of program / erase cycles on flash chips. More advanced designs use wear leveling to scatter these cycles around the flash chip(s), rather than concentrating them on a few areas. 
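As a rough illustration of why wear leveling helps, here is a toy Python simulation. It is only a sketch under simplifying assumptions, not how any real flash controller works: every write costs one program/erase cycle on one block, 90% of writes hit a small "hot" set of blocks, and the leveled variant simply sends each write to the least-worn block. The name `simulate_writes` and all of its parameters are made up for this example.

```python
import random


def simulate_writes(num_blocks, num_writes, wear_leveling,
                    hot_fraction=0.1, seed=0):
    """Return per-block erase counts after num_writes writes (toy model)."""
    rng = random.Random(seed)
    erase_counts = [0] * num_blocks
    hot_blocks = max(1, int(num_blocks * hot_fraction))

    for _ in range(num_writes):
        # Assumption: 90% of writes go to a small "hot" region.
        if rng.random() < 0.9:
            logical = rng.randrange(hot_blocks)
        else:
            logical = rng.randrange(num_blocks)

        if wear_leveling:
            # Redirect the write to the least-worn physical block.
            physical = min(range(num_blocks), key=erase_counts.__getitem__)
        else:
            # No leveling: the logical block is the physical block.
            physical = logical

        erase_counts[physical] += 1

    return erase_counts


if __name__ == "__main__":
    for leveling in (False, True):
        counts = simulate_writes(100, 10_000, wear_leveling=leveling)
        print(f"wear_leveling={leveling}: "
              f"most-worn block has {max(counts)} erase cycles")
```

In this model the leveled erase counts never differ by more than one across blocks, while without leveling the hot blocks absorb most of the cycles and would hit their limit far sooner.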


> I guess the same must hold true for the hard drives.
> 
> If so, how many writes do you reckon an average hard drive has compared to a USB stick? Or rather, does the life expectancy of the hard drive significantly decrease because of compiling?


Traditional magnetic disks (as opposed to SSDs) are a completely different technology. There is no detectable "wear" from rewriting the same area over and over again. There are many different failure modes for hard drives. Some of them are:

- Failures of the electronics
- Catastrophic mechanical failure (like head crashes)
- "Old age" failures from mechanical wear

None of these have much correlation to the drive being idle or in use. Some drives are [mis-]configured by their manufacturer to "spin down" (stop rotating) when idle. That might be OK for a Windows box, but can lead to excessive start / stop cycling on FreeBSD (and many other non-Windows operating systems). This is often configurable via a manufacturer setup utility or something like the sysutils/ataidle port.

The mechanism that moves the heads back and forth (seeking) is designed for continuous operation, unlike the start / stop operation of the spindle motor. So seeking, which is the only mechanical difference between an idle drive and one that's being accessed [assuming the spindown issue is not present] doesn't cause wear on the drive.

Drive manufacturers generally don't disclose failure statistics, nor do they publish tables of what types of failures they're seeing.


----------



## kpa (May 30, 2013)

You could argue along the same lines that NAS-type usage puts more strain on the drives than typical desktop usage. The same line of thought can be applied to cars: if you drive 20000 km/year vs. 100000 km/year, in which case is your car more likely to break down sooner?

Evil compilers...


----------



## jozze (May 31, 2013)

@Terry_Kennedy, thank you very much, that was exactly the information I was looking for.
@kpa, this was my assumption, but I was curious if anyone actually did some quantitative measurements.

I'm all for compiling from source as much as possible (so I wouldn't really call compilers evil), but I was wondering whether or not to buy a new hard drive as a precaution, given how much I play around with the system -- the one I have currently is around 3 years old, and I originally suspected my actions had accelerated its aging.


----------



## zspider (May 31, 2013)

Thanks for answering my question too. I was thinking about this a couple of days ago, especially since I started building the world and kernel.


----------



## SirDice (May 31, 2013)

Terry_Kennedy said:

> Traditional magnetic disks (as opposed to SSDs) are a completely different technology. There is no detectable "wear" from rewriting the same area over and over again.


Actually, there is. It just takes a very long time. For the same reason old audio/video tapes or floppy disks will start wearing out, the magnetism on the platters also wears out over time. This is why you eventually get bad sectors. Normally this isn't a problem, modern (S)ATA drives "map" these bad bits of disk to a spare bit of disk. This all happens within the drive's firmware. If you, as a user of the disk, start noticing the bad blocks it means this "spare" bit of disk is filled up and it can't map out bad sectors any more. Then it's time to replace the disk.
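One way to watch the remapping described above before the spare area runs out is to read the drive's SMART attributes. The sketch below parses text in the column layout printed by `smartctl -A` (from the sysutils/smartmontools port, which is not mentioned in this thread). The sample text and its numbers are invented for illustration, attribute names vary between vendors, and the helper names `parse_smart_attributes` and `remap_warning` are hypothetical.

```python
def parse_smart_attributes(text):
    """Map attribute name -> integer raw value from `smartctl -A`-style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def remap_warning(attrs):
    """True if the drive reports remapped or pending sectors."""
    return (attrs.get("Reallocated_Sector_Ct", 0) > 0
            or attrs.get("Current_Pending_Sector", 0) > 0)


# Invented sample, NOT output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       212
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
"""

if __name__ == "__main__":
    attributes = parse_smart_attributes(SAMPLE)
    print(attributes)
    print("remapping in progress:", remap_warning(attributes))
```

On a real system the text would come from running `smartctl -A` against the disk device. A reallocated count that keeps growing is usually a better reason to replace a drive than its age alone.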



> The mechanism that moves the heads back and forth (seeking) is designed for continuous operation, unlike the start / stop operation of the spindle motor. So seeking, which is the only mechanical difference between an idle drive and one that's being accessed [assuming the spindown issue is not present] doesn't cause wear on the drive.


Excessive seeking, also known as thrashing, may cause damage though, if only because of the strain put on the arm. It can eventually lead to a head crash. That said, modern hard drives are built much sturdier (and smarter) than the old 20MB disks I was used to.



> Drive manufacturers generally don't disclose failure statistics, nor do they publish tables of what types of failures they're seeing.


If I'm not mistaken Google did publish some interesting statistics. They obviously go through a lot of disks.


----------



## jozze (May 31, 2013)

I was looking up some specs of the WD Velociraptor 500 GB disk. It states that the MTBF (Mean Time Between Failures) has been calculated at 1.4 million hours. Does anyone know what kind of failures they mean by that?


----------



## SirDice (May 31, 2013)

jozze said:

> I was looking up some specs of the WD Velociraptor 500 GB disk. It states that the MTBF (Mean Time Between Failures) has been calculated at 1.4 million hours. Does anyone know what kind of failures they mean by that?



Any kind of failure that would warrant the replacement of that drive. Keep in mind that that figure is a "mean time". If it says MTBF 10,000 hours, it means somebody has that drive working for 19,999 hours, but there's also a poor sod who got a broken drive after just one hour. It's an average, but bigger numbers do indicate a more reliable drive.
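One common way to make an MTBF figure more tangible is to convert it into an annualized failure rate (AFR). The sketch below assumes a constant failure rate (the exponential model) and a drive powered on 24 hours a day. That is a textbook simplification, not necessarily how the manufacturer derived the 1.4 million hour figure, and the function names are made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760, assuming the drive never powers down


def fraction_failed_by(hours, mtbf_hours):
    """Expected fraction of drives failed after `hours` (exponential model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def annualized_failure_rate(mtbf_hours):
    """Expected fraction of drives failing within one year of 24/7 use."""
    return fraction_failed_by(HOURS_PER_YEAR, mtbf_hours)


if __name__ == "__main__":
    afr = annualized_failure_rate(1_400_000)
    print(f"AFR for a 1.4 million hour MTBF: {afr:.2%}")  # about 0.62%
```

Read this way, the figure says little about how long one particular drive will last. It says that out of a large population of such drives, well under 1% would be expected to fail in a given year, if the model holds.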


----------



## jozze (May 31, 2013)

SirDice said:

> Any kind of failure that would warrant the replacement of that drive



OK, I thought it was for minor failures, not like this ...



			
SirDice said:

> Keep in mind that that figure is a "mean time".



Thanks for the extra explanation, I am well aware of what it means.

But since I started this discussion, might I also ask, what are the most common causes for disk failures? Could, say, spontaneous power loss be counted amongst them?

EDIT:
OK, I just found an answer to my question on http://en.wikipedia.org/wiki/Hard_disk_drive_failure. Thanks everyone!


----------



## Terry_Kennedy (Jun 1, 2013)

SirDice said:

> Actually, there is. It just takes a very long time. For the same reason old audio/video tapes or floppy disks will start wearing out, the magnetism on the platters also wears out over time. This is why you eventually get bad sectors.


I don't think any disk manufacturer has used oxide media for many years now, due to density limitations. Modern media is thin-film, usually sputtered or plated (though I don't see why ion deposition wouldn't work). These provide pure magnetic material, not oxide (rust). They should permit an effectively infinite number of writes (while I won't say "heat death of the Universe" interval, certainly past the warranty period).

Bad spots can occur for a number of reasons in addition to magnetic degradation. The usual reason is a "write splice error" where the writing of the user data (plus any applicable error correction) overflows (or misses entirely) the area allocated for that sector and writes on top of other data, such as headers for adjacent sectors. This can be caused by power drops during writing, mechanical vibration, poor tolerances in the drive, etc.



> Normally this isn't a problem, modern (S)ATA drives "map" these bad bits of disk to a spare bit of disk. This all happens within the drive's firmware. If you, as a user of the disk, start noticing the bad blocks it means this "spare" bit of disk is filled up and it can't map out bad sectors any more. Then it's time to replace the disk.


That's only true for errors detected on writes and _correctable_ errors on reads. If the drive (or RAID volume) has an uncorrectable read error, that error will bubble all the way up and should return EIO to the user application. Sometimes telling the drive to format itself will deal with these (but with the loss of all other data). Sometimes the drive can't handle them at all and they will remain. The old methods of writing 0's to the drive with dd(1) or trying to read the same sector over and over in the hope that the drive will "notice" and map it out won't work on modern drives. "Classic" SCSI drives (not SAS) usually support changing the AWRE and ARRE mode bits to control automatic reallocation of write and read errors, respectively.
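To show what "bubbling up as EIO" looks like from the application side, here is a minimal Python sketch. In Python the error arrives as an `OSError` whose `errno` is `errno.EIO`. The function name `read_region` and its `opener` parameter are hypothetical and exist only so the failure can be simulated without a failing disk.

```python
import errno


def read_region(path, offset=0, length=4096, opener=open):
    """Read `length` bytes at `offset`.

    Returns (data, None) on success, or (None, message) if the kernel
    reports EIO. Any other error is re-raised unchanged.
    """
    try:
        with opener(path, "rb") as handle:
            handle.seek(offset)
            return handle.read(length), None
    except OSError as exc:
        if exc.errno == errno.EIO:
            # The drive (or RAID layer) could not return the data.
            return None, f"I/O error reading {path} at offset {offset}"
        raise


if __name__ == "__main__":
    def failing_opener(path, mode):
        # Simulated failure; a real one would come from the kernel.
        raise OSError(errno.EIO, "Input/output error")

    print(read_region("/hypothetical/file", opener=failing_opener))
```

An application that sees this can report the affected file and offset, but it cannot repair the sector. That part is up to the drive or the storage layer underneath.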



> Excessive seeking, also known as thrashing, may cause damage though, if only because of the strain put on the arm. It can eventually lead to a head crash. That said, modern hard drives are built much sturdier (and smarter) than the old 20MB disks I was used to.


Yup. I participated in the redesign of a 40MB ST506-type drive to improve actuator reliability. I went to the drive manufacturer to "straighten things out" because my company was getting so many dud drives with actuator problems. We were the manufacturer's second largest customer (behind DEC).



> If I'm not mistaken Google did publish some interesting statistics. They obviously go through a lot of disks.


Yes, I remember that study. The two things that limited its usefulness were that it didn't name (shame) drive manufacturers, and no detailed failure analysis (opening the drive in a clean room and determining the actual fault) was done.


----------

