As of March 31, 2020, Backblaze had 132,339 spinning hard drives in their cloud storage ecosystem, spread across four data centers. Of those, 2,380 were boot drives and 129,959 were data drives. This review looks at the Q1 2020 and lifetime hard drive failure rates of the data drive models currently in operation in their data centers and offers a handful of insights and observations along the way.
Hard Drive Failure Stats for Q1 2020 - Seagate Suffered The Worst Losses
At the end of Q1 2020, Backblaze was using 129,959 hard drives to store customer data. For this evaluation, they excluded drives used for testing purposes and drive models with fewer than 60 drives (see why below), which leaves 129,764 hard drives. The table below covers what happened in Q1 2020. The Annualized Failure Rate (AFR) for Q1 2020 was 1.07%, the lowest AFR for any quarter since they started keeping track in 2013 and significantly lower than the Q1 2019 AFR of 1.56%.
During the quarter, four drive models from three manufacturers had zero drive failures. None of the Toshiba 4TB or Seagate 16TB drives failed in Q1, but both models logged fewer than 10,000 drive days during the quarter. As a consequence, a single failure would swing their AFR widely: if just one Seagate 16TB drive had failed, its AFR for the quarter would have been 7.25%, and one failure would have put the Toshiba 4TB AFR at 4.05%. By contrast, both HGST models with zero failures accumulated enough drive days that their AFR is far less volatile. One failure would have produced an AFR of only 0.40% for the 8TB model and just 0.26% for the 12TB model. In both HGST cases, the 0% AFR for the quarter is impressive.
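The sensitivity described above follows directly from the AFR formula, AFR = failures ÷ (drive days ÷ 365) × 100. The sketch below is a minimal Python illustration under that assumption. The text gives only the one-failure AFRs, so the drive-day counts here are back-calculated from those rounded percentages; they are approximations, not reported figures.

```python
DAYS_PER_YEAR = 365


def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / DAYS_PER_YEAR
    return failures / drive_years * 100


def implied_drive_days(one_failure_afr_pct: float) -> float:
    """Invert the AFR formula for a single failure: drive days = 365 * 100 / AFR."""
    return DAYS_PER_YEAR * 100 / one_failure_afr_pct


# One-failure AFRs quoted in the text; drive days are back-calculated estimates.
quoted = {
    "Seagate 16TB": 7.25,
    "Toshiba 4TB": 4.05,
    "HGST 8TB": 0.40,
    "HGST 12TB": 0.26,
}

for model, afr in quoted.items():
    days = implied_drive_days(afr)
    print(f"{model}: ~{days:,.0f} drive days; "
          f"1 failure -> {annualized_failure_rate(1, days):.2f}% AFR")
```

The back-calculation shows why the two low-volume models are volatile: the Seagate 16TB and Toshiba 4TB work out to roughly 5,000 and 9,000 drive days, while the HGST 8TB and 12TB models sit near 91,000 and 140,000, so one failure moves their AFR only slightly.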
There were 195 drives (129,959 minus 129,764) that were not included in the list above because they were used as testing drives or belonged to models with fewer than 60 drives. For example, they have 20 Toshiba 16TB drives (model: MG08ACA16TA), 20 HGST 10TB drives (model: HUH721010ALE600), and 20 Toshiba 8TB drives (model: HDWF180). When they report quarterly, yearly, or lifetime drive statistics, models with fewer than 60 drives are not included in the calculations or graphs. They use 60 drives as the minimum because every newly deployed Storage Pod holds 60 drives.
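The exclusion rule can be expressed as a simple filter. This is a minimal sketch, not Backblaze's code: the `ModelCount` record, its `testing` flag, and the two `HYPOTHETICAL-*` entries are invented for illustration; only the three 20-drive models come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_DRIVES_PER_MODEL = 60  # one newly deployed Storage Pod holds 60 drives


@dataclass
class ModelCount:
    model: str
    drives: int
    testing: bool = False  # hypothetical flag marking drives used only for testing


def reportable(models: list[ModelCount],
               minimum: int = MIN_DRIVES_PER_MODEL) -> list[ModelCount]:
    """Keep only non-test models with at least `minimum` drives."""
    return [m for m in models if not m.testing and m.drives >= minimum]


fleet = [
    ModelCount("MG08ACA16TA", 20),                   # Toshiba 16TB, from the text
    ModelCount("HUH721010ALE600", 20),               # HGST 10TB, from the text
    ModelCount("HDWF180", 20),                       # Toshiba 8TB, from the text
    ModelCount("HYPOTHETICAL-A", 1200),              # hypothetical production model
    ModelCount("HYPOTHETICAL-B", 90, testing=True),  # hypothetical test drives
]

kept = reportable(fleet)
print([m.model for m in kept])  # ['HYPOTHETICAL-A']
print(sum(m.drives for m in fleet) - sum(m.drives for m in kept))  # 150
```

Filtering on both conditions mirrors the two reasons the text gives for the 195 excluded drives: test usage and a model population below one Storage Pod's worth.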
How Backblaze Produces Accurate Failure Rates As New Drives Are Added
In Backblaze's example, drive model BB007 had a failure rate of 0.93% using the drive count method. That figure differs from the drive days result because Backblaze is constantly adding and removing drives. New Backblaze Vaults come online every month; new features like S3 compatibility rapidly increase demand; migrations replace old, low-capacity drives with new, higher-capacity drives; and sometimes cloned and temporary drives are in the mix. The environment is very dynamic, so the drive count on any given day during the period of observation varies. With the drive count method, the failure rate is based on the number of drives on the day they were counted, in this case the last day of the period of observation. With the drive days method, the failure rate is based on the entire period of observation. In their example, the following table shows the drive count as they added drives over the six-month period of observation:
When you total up the number of drive days, you get 878,400, but the drive count at the end of the period of observation is 6,000. The drive days formula accounts for the changing number of drives throughout the period, while the drive count formula responds only to the count at the end. Had all 6,000 drives been in service for the full six months, they would have logged more than a million drive days, so dividing by the end count overstates how much operating time the drives actually accumulated and understates the failure rate. The 0.93% from the drive count formula is significantly lower, which is nice if you are a drive manufacturer, but it does not reflect how drives are actually integrated and used in the environment. That's why Backblaze uses the drive days method: it better fits the reality of how their business operates.
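To make the difference between the two methods concrete, here is a minimal Python sketch. The drive days rate uses the annualized form failures ÷ (drive days ÷ 365) × 100; the drive count rate is assumed to be failures divided by the end-of-period count, not annualized, which the text does not spell out. The ramp-up schedule is hypothetical and is not Backblaze's table, and the failure count of 56 is also hypothetical, chosen only because it makes the assumed drive count formula round to the quoted 0.93%.

```python
from __future__ import annotations

DAYS_PER_YEAR = 365


def drive_days(schedule: list[tuple[int, int]]) -> int:
    """Total drive days from (days_in_period, drives_in_service) pairs."""
    return sum(days * drives for days, drives in schedule)


def afr_drive_days(failures: int, total_drive_days: float) -> float:
    """Annualized failure rate (%): failures per drive-year of operation."""
    return failures / (total_drive_days / DAYS_PER_YEAR) * 100


def rate_drive_count(failures: int, end_count: int) -> float:
    """Assumed drive count rate (%): failures over end-of-period count."""
    return failures / end_count * 100


# 1) Hypothetical six-month ramp-up (NOT Backblaze's table).
ramp = [(31, 1000), (28, 2000), (31, 3000), (30, 4000), (31, 5000), (30, 6000)]
period_days = sum(days for days, _ in ramp)   # 181
actual = drive_days(ramp)                     # 635,000 drive days
if_all_present = ramp[-1][1] * period_days    # 1,086,000 drive days
print(actual, if_all_present)

# 2) Totals from the text, with a hypothetical failure count.
FAILURES = 56  # hypothetical: the source does not state the failure count
print(f"drive count: {rate_drive_count(FAILURES, 6_000):.2f}%")  # 0.93%
print(f"drive days:  {afr_drive_days(FAILURES, 878_400):.2f}%")  # 2.33%
```

In the hypothetical ramp-up, the fleet accumulates 635,000 drive days, far fewer than the 1,086,000 implied by treating every drive as present from day one, which is exactly the assumption hidden in dividing by the end count. Under these assumptions the same 56 failures yield about 2.33% by drive days versus 0.93% by drive count.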
Backblaze's drive reports are consistently thorough, and this one is no different. The most significant problem this quarter was that Seagate drives failed more often than drives from any other manufacturer, which is disappointing.