What You Need to Know to Build an Enterprise-Class Object Storage Hardware Layer

Hardware Matters
When people talk about implementing object storage, they usually focus on the software pieces: consistent hashing, partitioning, replication, horizontal scaling and so on. But another critical component of a complete object storage solution is the hardware. The operative word for object storage is still data, and data lives on hard disk drives, so the hardware layer is just as important for data availability.
There are two choices for the object store hardware layer: commodity hardware, or storage arrays with built-in reliability, availability and serviceability (RAS). Commodity hardware is attractive for economic reasons, but storage-array-based solutions still matter.

Hard Disk Failures
Hard disk drive failure is a good example of the challenges we face in the hardware layer. A typical object storage deployment may hold hundreds of petabytes (PB) of object data on hundreds of thousands of hard disk drives. Monitoring the health of that many drives is a big challenge, especially since the hard disk drive is among the most failure-prone components in modern computers. According to Google’s analysis of drive failures in its own data centers, the annualized failure rate (AFR) ranges from 2% to 10% depending on drive age (see figure below). For more information, see Google’s research paper “Failure Trends in a Large Disk Drive Population.”

Figure 1. Google Research: Annualized failure rates broken down by disk age groups

Suppose you are building a 100 PB object store that keeps three replicas of each object. That means 300 PB of raw capacity, or at least 300,000 1 TB SATA drives. At a 2% AFR, you will see about 6,000 failed drives per year, or roughly 16 per day. Once the drives are more than two years old, that number grows four- to five-fold!
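The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes decimal units (1 PB = 1,000 TB), drives filled to capacity and a single constant AFR across the whole fleet; the function names are hypothetical.

```python
import math


def drives_needed(usable_pb: float, replicas: int, drive_tb: float) -> int:
    """Minimum drive count to hold `replicas` full copies of `usable_pb` of data.

    Assumes decimal units (1 PB = 1,000 TB) and drives filled to 100% capacity.
    """
    raw_tb = usable_pb * 1000 * replicas
    return math.ceil(raw_tb / drive_tb)


def expected_failures(drive_count: int, afr: float) -> tuple[float, float]:
    """Expected failed drives per year and per day for a constant annualized failure rate."""
    per_year = drive_count * afr
    return per_year, per_year / 365


if __name__ == "__main__":
    n = drives_needed(usable_pb=100, replicas=3, drive_tb=1)
    # 2% is the young-drive AFR from the text; 8% and 10% approximate the
    # four- to five-fold increase for drives older than two years.
    for afr in (0.02, 0.08, 0.10):
        per_year, per_day = expected_failures(n, afr)
        print(f"{n:,} drives, AFR {afr:.0%}: ~{per_year:,.0f} failures/year, ~{per_day:.1f}/day")
```

At a 10% AFR the same fleet would lose roughly 30,000 drives a year, which is why replacement becomes a routine daily task rather than an exception.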

Furthermore, hard disk drive vendors are increasing the density of hard drive sectors to grow the capacity of a single drive. You can expect higher failure rates in the future.

As a result, disk health monitoring and failed-disk replacement will be a daily operation in any cloud service provider’s data center, creating demand for a centralized management GUI and for notifications and alerts on failed disks.

Opportunity for Traditional Storage Arrays
If you build the hardware layer from commodity hardware, you need to manage hard disk failures with your own tools or a third-party solution. The other choice for object store hardware is a traditional storage array like EMC VNX, which has built-in disk health monitoring based on S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology). It is a mature technology, proven over decades by many customers. Besides failed-disk reporting and notification, an attractive feature of an EMC array is that it proactively prevents disk failures by monitoring S.M.A.R.T. data. When disk read/write errors reach a designated threshold and EMC’s proprietary algorithm predicts that the disk will fail within the next few days, the array proactively copies the disk to a hot spare. EMC arrays have a mature disk failure prediction algorithm whose accuracy can be around 70%.
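The threshold-then-copy behavior described above can be sketched as follows. EMC’s prediction algorithm is proprietary, so this is not it: it is a minimal illustration assuming a simple rule that flags a drive when any of three S.M.A.R.T. counters (reallocated sectors, attribute 5; current pending sectors, attribute 197; offline uncorrectable sectors, attribute 198) reaches a fixed limit. The thresholds, the `Drive` class and the function names are all hypothetical, and the "copy" is a list copy standing in for a block-level rebuild.

```python
from dataclasses import dataclass, field

# Hypothetical limits chosen for illustration only; they are not EMC's values.
THRESHOLDS = {
    "reallocated_sectors": 50,    # S.M.A.R.T. attribute 5
    "pending_sectors": 10,        # S.M.A.R.T. attribute 197
    "offline_uncorrectable": 5,   # S.M.A.R.T. attribute 198
}


@dataclass
class Drive:
    drive_id: str
    smart: dict[str, int]                          # latest S.M.A.R.T. counter readings
    data: list[str] = field(default_factory=list)  # stand-in for on-disk contents


def at_risk(drive: Drive, thresholds: dict[str, int] = THRESHOLDS) -> bool:
    """Flag a drive if any monitored counter has reached its threshold."""
    return any(drive.smart.get(attr, 0) >= limit for attr, limit in thresholds.items())


def proactive_spare(drives: list[Drive], spares: list[Drive]) -> list[tuple[str, str]]:
    """Copy each at-risk drive to a free hot spare while it is still readable.

    Consumes spares from the front of `spares` (the caller's list is mutated)
    and returns (at_risk_drive_id, spare_id) pairs for the copies made.
    """
    actions = []
    for drive in drives:
        if at_risk(drive) and spares:
            spare = spares.pop(0)
            spare.data = list(drive.data)
            actions.append((drive.drive_id, spare.drive_id))
    return actions
```

Copying before the drive fails is the key design point: a rebuild from a still-readable source avoids reconstructing the data from other replicas under pressure.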

Disk failure prediction is very important for data availability in an enterprise-class object store. Without it, you might place all three replicas of an object on three drives that are all going to fail in the next two days. If that happens over a weekend when your data center maintenance engineer cannot get on site, you are putting yourself at risk of data loss.
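One way a failure predictor can feed into replica placement is sketched below: drives flagged as likely to fail are simply excluded when choosing where the three copies go. This is a minimal illustration, not ViPR’s placement logic; the `place_replicas` function and the drive-ID-to-flag mapping are hypothetical, and a real object store would also spread replicas across servers and racks, which this sketch does not model.

```python
import random
from typing import Optional


def place_replicas(
    predicted_to_fail: dict[str, bool],
    replicas: int = 3,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Pick `replicas` distinct drives, skipping any drive flagged as likely to fail.

    `predicted_to_fail` maps a drive ID to the output of a failure predictor.
    """
    if rng is None:
        rng = random.Random()
    healthy = sorted(d for d, failing in predicted_to_fail.items() if not failing)
    if len(healthy) < replicas:
        raise RuntimeError(f"only {len(healthy)} healthy drives for {replicas} replicas")
    # sample() draws without replacement, so no drive holds two copies.
    return rng.sample(healthy, replicas)


if __name__ == "__main__":
    fleet = {"d0": True, "d1": False, "d2": True, "d3": False, "d4": False, "d5": False}
    print(place_replicas(fleet, rng=random.Random(42)))
```

Raising an error instead of silently falling back to at-risk drives keeps the trade-off visible to the operator.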

Mature storage arrays like EMC VNX also offer other value-add features such as FAST (Fully Automated Storage Tiering), which can place hot data on solid-state drives (SSDs) to improve I/O throughput.
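To make the idea of placing hot data on SSD concrete, here is a minimal frequency-based tiering sketch. It is not EMC FAST’s actual relocation policy; it assumes data is divided into extents identified by hypothetical string IDs, that the SSD tier holds a fixed number of extents and that "hot" simply means most frequently accessed in a recent access log.

```python
from collections import Counter
from typing import Iterable


def assign_tiers(access_log: Iterable[str], ssd_slots: int) -> tuple[set[str], set[str]]:
    """Split extents into (ssd, hdd) sets by access frequency.

    The `ssd_slots` most frequently accessed extents go to the SSD tier;
    everything else stays on HDD. Ties keep first-seen order (Counter.most_common).
    """
    counts = Counter(access_log)
    ranked = [extent for extent, _ in counts.most_common()]
    return set(ranked[:ssd_slots]), set(ranked[ssd_slots:])


if __name__ == "__main__":
    log = ["e1", "e7", "e1", "e3", "e1", "e7", "e9"]
    ssd, hdd = assign_tiers(log, ssd_slots=2)
    print("SSD:", sorted(ssd), "HDD:", sorted(hdd))
```

A production tiering engine would typically age its counts over time windows and relocate data in the background rather than re-ranking on every request; the sketch leaves that out.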

Conclusion
EMC ViPR object storage is designed as an enterprise-class object store and runs on commodity hardware, on traditional storage arrays like EMC VNX/Isilon, or on a mixture of both.
We leave the hardware choice to our customers.

If you prefer a more cost-effective solution, you can run ViPR object on commodity hardware. If you prioritize serviceability and availability, combining ViPR with an EMC VNX could be the best choice for your business. Cloud service providers that want to compete on more than just price understand that the market will demand enterprise-class object storage solutions that offer variable data durability along with governance and compliance features.

About the Author: Hui Liu