Under the hood of FlexProtect (distributed erasure coding!)

As the world continues to demand more capacity for everything from high performance computing and Big Data to Virtualization and Enterprise Archive, traditional storage protection just doesn’t cut it any longer. There is a much better way to efficiently protect and recover your data – and at Isilon we call it FlexProtect. You may be familiar with terms such as parity, erasure coding, forward error correction codecs – or for the real geeks out there, Reed Solomon encodings. Let me introduce you to the next evolution in data protection – distributed erasure coding.

At Isilon, we’ve been big believers in distributed protection from the get-go. It’s at the heart of not only how we protect your data, but also how OneFS scales and delivers efficient performance and capacity utilization.

So what is distributed erasure coding and why should you care? I’ll start with the fact that in the past few years, hard drive capacity has grown exponentially, while the speed of the same drives has remained constant. As a result, the amount of time it takes to write that extra capacity has expanded at an exponential rate as well. The challenge for you? A traditional RAID rebuild writes the recovered data onto a single replacement disk, so it can finish no faster than that one disk can be written. The longer it takes to fill a disk, the longer your data is exposed to a second failure. In other words, RAID is no longer safe in a world of 2, 3 and 4 TB disk drives.
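To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 100 MB/s sustained write rate is an assumption chosen for illustration, not a measured figure for any particular drive, and `hours_to_fill` is a hypothetical helper name.

```python
def hours_to_fill(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to write an entire drive sequentially at a fixed rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal terabytes to megabytes
    return capacity_mb / throughput_mb_s / 3600


# Assumed sustained write rate, for illustration only.
ASSUMED_MB_S = 100.0

if __name__ == "__main__":
    for tb in (1, 2, 3, 4):
        print(f"{tb} TB drive: {hours_to_fill(tb, ASSUMED_MB_S):.1f} hours to fill")
```

Under that assumption a 1 TB drive takes about 2.8 hours to fill and a 4 TB drive about 11.1 hours. If the write rate stays flat, the exposure window grows in lockstep with capacity.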

On the other hand, with OneFS we have built the industry’s first distributed, file-level erasure-encoding system. Specifically, we use Reed Solomon forward error correction codes as part of our data striping, ensuring that we can safely protect against the loss of both nodes and disks – and we do this at the individual file level, giving us (and you) amazing flexibility.
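If you have never seen an erasure code in action, here is a toy sketch of the idea applied to a single file. To be clear, this is not Reed Solomon and it is not how OneFS is implemented: it uses plain XOR parity, the simplest erasure code, which tolerates the loss of exactly one unit. The function names and the stripe layout are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_stripe(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal units (zero-padded) and append one XOR parity unit."""
    unit = -(-len(data) // k)  # ceiling division
    padded = data.ljust(unit * k, b"\0")
    units = [padded[i * unit:(i + 1) * unit] for i in range(k)]
    parity = reduce(xor_bytes, units)
    return units + [parity]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing unit (marked None) by XOR-ing the survivors."""
    missing = [i for i, u in enumerate(stripe) if u is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing unit")
    survivors = [u for u in stripe if u is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt


if __name__ == "__main__":
    stripe = encode_stripe(b"hello distributed world", k=4)
    damaged = list(stripe)
    damaged[2] = None  # pretend the disk or node holding unit 2 failed
    assert recover(damaged) == stripe
```

Imagine each unit living on a different node: losing any one node still leaves enough information to rebuild the file. Reed Solomon generalizes this so that more than one unit can be lost at a time.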

What makes Isilon’s use of Reed Solomon particularly interesting is that we’ve solved the RAID scalability problem. When a failure occurs, instead of requiring a bit-for-bit rebuild of the lost media, OneFS will reconstruct data into any available space. This allows the filesystem to leverage not only many spindles to read during reconstruction, but also many spindles to write during reconstruction – and since this is all occurring in our highly scalable Scale-Out NAS architecture, you are also leveraging many CPUs! This dramatically decreases the time it takes to reprotect data, narrowing the window of risk and ensuring data protection scales with larger media.
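Here is a deliberately simplified model of why spreading the rebuild across many spindles matters. It assumes ideal linear scaling and ignores read bandwidth, network, and CPU limits, so treat the numbers as illustration only. The 100 MB/s per-drive rate, the 40-drive count, and the `rebuild_hours` helper are all assumptions, not OneFS measurements.

```python
def rebuild_hours(lost_tb: float, per_drive_mb_s: float, writers: int = 1) -> float:
    """Hours to rewrite lost_tb of data when `writers` drives share the write load.

    Idealized: assumes the write load divides evenly and nothing else is a bottleneck.
    """
    lost_mb = lost_tb * 1_000_000  # decimal terabytes to megabytes
    return lost_mb / (per_drive_mb_s * writers) / 3600


if __name__ == "__main__":
    # Traditional rebuild: everything funnels into one replacement disk.
    print(f"1 writer:   {rebuild_hours(4, 100.0, writers=1):.1f} hours")
    # Distributed rebuild: 40 drives (an assumed count) each absorb a share.
    print(f"40 writers: {rebuild_hours(4, 100.0, writers=40):.2f} hours")
```

In this idealized model, rebuilding 4 TB drops from about 11.1 hours with a single target disk to about 0.28 hours when 40 drives share the writes. Real clusters will not scale this perfectly, but the direction is the point.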

I’m proud to say that this was the first patent that Isilon applied for, back in 2002. We called it the “Virtual Hot Spare”. A quick look around will show just how unique and powerful this capability is.

By leveraging erasure encoding in a per-file, distributed fashion, OneFS and FlexProtect not only ensure that your data is safe regardless of size, but also that we’re using your raw storage efficiently. Our typical data to parity ratio is 16 to 2, resulting in an extremely efficient overhead of roughly 12% – even in multi-petabyte systems! Of course, we can do that even on smaller clusters – happy to tell you those deep secrets if you reach out to me on Twitter @Isilon_Nick. That’s FlexProtect.
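For anyone who wants to check the math on a 16 to 2 layout, here is a small sketch. The `parity_overhead` helper is hypothetical, and it reports the overhead two ways because both conventions are in common use.

```python
from typing import Tuple


def parity_overhead(data_units: int, parity_units: int) -> Tuple[float, float]:
    """Return (parity relative to data, parity as a share of raw capacity)."""
    relative_to_data = parity_units / data_units
    share_of_raw = parity_units / (data_units + parity_units)
    return relative_to_data, share_of_raw


if __name__ == "__main__":
    vs_data, of_raw = parity_overhead(16, 2)
    print(f"parity relative to data: {vs_data:.1%}")  # 12.5%
    print(f"parity share of raw:     {of_raw:.1%}")   # 11.1%
```

Measured against the data it protects, 2 parity units for every 16 data units is 12.5%; measured against total raw capacity it is about 11.1%.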

It’s amazing – and now that you know what I know, you won’t sleep well at night without it.

About the Author: Nick Kirsch