Posts

A deeper dive into disk drive survival time

Evaluating newer classes in the context of historical failure data: time-windowed KM survival curves. Background: A substantial proportion of online data and services rely on hard disk drives, which form a ubiquitous part of modern information infrastructure, so reliable statistical analysis of differences in failure over time between disk drive models is of particular interest to those responsible for maintaining storage integrity at home or at work. The Backblaze hard disk failure data represent an interesting "big data" analytic opportunity to compare enterprise and consumer hard disk drives over time under real-world operating conditions. In this article, some statistical issues are discussed and the results of some simple analyses are presented. The results provide insight that cannot be obtained from simple descriptive statistics, and the statistical tests show that many of the differences observed are important and unlikely to have arisen...

Update to Q1 2017: Seagate redeemed?

Update June 8 2017. After some delay, I finally got around to downloading another 9 months of data and rerunning the KM plots. Methods are documented in the first post http://bioinformare.blogspot.com/2016/02/survival-analysis-of-hard-disk-drive.html and won't be repeated here. Note that drive models with fewer than 500 units, and manufacturers with fewer than 200 units, are ignored to simplify the plots; you can change this in the code if you need to. The images below are available for the closer inspection they deserve at https://github.com/fubar2/backblazeKM because they really are too detailed to appear here. Sorry for the ugly layout, but you can download them or clone the repository if you want a closer look. Cutting straight to the chase, here's the drive model survival curve to date: The newer Seagate ST8000NM0055 is promising excellent longevity, although the observation period is still very short, so the initial curves may change with time.  Also, we ...
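As a rough illustration of the unit-count cut-off described above, here is a minimal Python sketch. It is not the code from the repository: the record layout (one dict per drive with `model` and `manufacturer` keys), the function name `drop_small_groups`, and the toy data are all assumptions made for this example. Only the two thresholds, 500 and 200, come from the post.

```python
from collections import Counter

# Thresholds quoted in the post; everything else here is hypothetical.
MIN_UNITS_PER_MODEL = 500
MIN_UNITS_PER_MANUFACTURER = 200


def drop_small_groups(drives, key, min_units):
    """Keep only drives whose group (by `key`) has at least `min_units` drives.

    `drives` is a list of dicts, one per physical drive (assumed layout).
    """
    counts = Counter(d[key] for d in drives)
    return [d for d in drives if counts[d[key]] >= min_units]


# Toy data: three drives of one model, one drive of another.
toy_drives = [
    {"serial": "s1", "manufacturer": "MakerA", "model": "A-100"},
    {"serial": "s2", "manufacturer": "MakerA", "model": "A-100"},
    {"serial": "s3", "manufacturer": "MakerA", "model": "A-100"},
    {"serial": "s4", "manufacturer": "MakerB", "model": "B-200"},
]

# A tiny threshold is used so the toy data shows the effect.
kept = drop_small_groups(toy_drives, key="model", min_units=2)
print([d["serial"] for d in kept])
```

With real data you would pass `MIN_UNITS_PER_MODEL` for the per-model plot and `MIN_UNITS_PER_MANUFACTURER` with `key="manufacturer"` for the per-manufacturer plot; changing those constants is the kind of adjustment the note above refers to.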

Backblaze hard disk drive failure data: Update to Q2 2016

Ross Lazarus, September 2016. This is a Kaplan-Meier analysis of the Backblaze hard drive reliability data, using all available data to the end of the second quarter of 2016 from https://www.backblaze.com/b2/hard-drive-test-data.html . Previous posts are at http://bioinformare.blogspot.com.au/2016/05/survival-analysis-of-hard-disk-drive.html and http://bioinformare.blogspot.com.au/2016/02/survival-analysis-of-hard-disk-drive.html . I reran my scripts and got the plots shown below. Reading all the data now takes a while, as there are a very large number of drives spinning: a total of 41740623 rows were processed in about 35 minutes on my home desktop by the Python script in the GitHub repository. The new 8TB drives are performing the best of all, even better than the HGST and Hitachi drives, and far better than any of the earlier Seagates. This is hard to miss here but not so obvious in the report at Backblaze https://www.backblaz...
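For readers who have not met the Kaplan-Meier estimator before, here is a minimal pure-Python sketch of the product-limit calculation behind curves like these. It is an illustration only, not the script from the repository: the function name `kaplan_meier`, the input layout (days observed per drive plus a failed/censored flag), and the toy numbers are assumptions for this example.

```python
from collections import Counter


def kaplan_meier(durations, failed):
    """Return [(t, S(t))] at each distinct failure time.

    durations: days each drive was observed (assumed input layout).
    failed:    True if the drive failed at that time, False if it was
               still running when observation ended (censored).
    """
    deaths = Counter(t for t, f in zip(durations, failed) if f)
    exits = Counter(durations)  # failures plus censored drives leaving at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone who failed or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: six drives, three failures, three censored.
days = [5, 8, 8, 12, 15, 20]
status = [True, False, True, True, False, False]
curve = kaplan_meier(days, status)
for t, s in curve:
    print(f"day {t}: S = {s:.4f}")
```

Censored drives still count in the risk set until they leave observation, which is why drives that are still spinning contribute information rather than being discarded.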
