There are two types of storage sprawl. The first results from the never-ending creation of new content that must be retained. This type of sprawl is typically capacity-driven, and while storage must be purchased to keep pace, inexpensive, large-capacity SATA arrays are often more than sufficient for the job.
The second type of storage sprawl results from performance-oriented applications. It may not lead to as many terabytes deployed in the data center, so it isn’t typically thought of as a capacity problem, but this type of sprawl represents the most expensive portion of the storage environment because it relies on enterprise-class disk drives in high-performance arrays, which are much more expensive (and smaller in capacity) than SATA drives. In these environments, several factors lead to storage sprawl, but administrators often don’t even recognize the problem because it has existed for so long that it’s just accepted as “the way things are.”
On XtremIO customer calls, we frequently hear things along the lines of “we don’t have a capacity problem” or “we don’t have a performance problem” because the customer has an environment that is currently meeting their needs. It’s interesting, as you dig deeper, to look at what it took to “not have problems.” What we find is very consistent and paints a crystal-clear picture of how inefficient current storage environments are. I’d encourage you to think about your own storage environment in this context.
Inefficient Storage Practices
- Using RAID 10 – RAID 10 is commonly used in performance-oriented environments because it provides excellent read performance (there are two distinct copies of the data residing on separate groups of drives which can be read in parallel) with low overhead for writes (each host write results in two array writes, one to each side of the mirror). Of all the available RAID levels, RAID 10 delivers the best performance, but at the high cost of requiring twice the number of drives for data protection. Losing 50% of your raw capacity to data protection overhead right from the start is a stiff price to pay for performance, especially when using expensive 15K RPM enterprise drives.
- Gaining performance by stranding capacity – This often happens in database environments. Imagine a company with an SAP environment running on Oracle. The SAP database is 3TB in size but needs 30,000 IOPS. While the database will physically fit on just a couple of large capacity drives, it will not deliver the required performance if provisioned that way. So instead, it is striped across dozens of spindles (easily 100+ in this example). Assuming a common 15K RPM drive size of 300GB, this database is now consuming 30TB of storage (before factoring in data protection overhead), even though it is only 3TB in size. Unfortunately, the extra capacity is often useless, as allocating it to any other applications would subtract from the performance required in the SAP/Oracle environment.
- Ineffective multipathing – Multipathing, the technique of allowing hosts to access storage over parallel connections to improve performance (and reliability in the case of a path failure), is essential to driving the best I/O performance from a storage array. In many array designs, all paths are not created equal. There is a fast path that goes through the storage controller that owns the LUN, and a slow path that goes through an alternate controller without LUN ownership. This hinders the ability of the application to drive I/O into and out of the array and leads to over-provisioning to compensate.
- Spare drives – Storage arrays maintain spare drives to allow RAID rebuilds to begin the moment a drive fails. At all other times spare drives are just added cost and capacity that is unavailable to use.
- Inability to use snapshots – Snapshots are powerful tools for data protection, but also can be used for making space-efficient volume clones. This is helpful in development and test environments to make quick copies of the production volumes. However, when actively used, snapshots deliver inferior performance, especially over time as their contents diverge from that of their parent volume. When snapshots can’t be used effectively, full volume clones must be created, consuming additional spindles.
- Inability to fill the array – Storage arrays degrade in performance as they fill up. Even high-end enterprise arrays start to see performance trail off with as little as 40% of the array filled. This forces administrators to run their arrays with large amounts of free space, which is stranded capacity. It’s there, but you can’t touch it.
- Inability to use data reduction – Data reduction techniques can substantially lower the amount of capacity required, especially in certain environments where duplicate data is common, such as virtualization. However, deduplication techniques are computationally intensive and inherently lead to volume fragmentation. Both of these issues substantially reduce performance and prevent data reduction from being utilized with performance sensitive applications.
- Mismatched thin provisioning – Thin provisioning is a powerful tool to more efficiently allocate capacity within an array and defer capacity purchases. However, thin provisioning schemes typically use allocation sizes far larger than the I/O sizes used by applications and operating systems. This mismatch causes over-allocation of capacity from the thin provisioning pool, leading to stranded, unusable capacity and the need for the array to run performance-sapping reclamation operations.
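The stranded-capacity bullet above can be sketched as a quick sizing calculation. The helper below is hypothetical, and the 300-IOPS-per-drive default is back-derived from the example’s own figures (100 drives delivering 30,000 IOPS); real-world 15K RPM drives often deliver less, which only makes the sprawl worse.

```python
# Hypothetical sizing sketch: drives needed for capacity vs. performance.
# Assumes 300GB 15K RPM drives and ~300 IOPS per drive (implied by the
# example's numbers; an optimistic figure for a real 15K drive).

def spindles_needed(db_tb, required_iops, drive_gb=300, iops_per_drive=300):
    """Return (drives for capacity, drives for performance), before RAID."""
    for_capacity = -(-db_tb * 1000 // drive_gb)            # ceiling division
    for_performance = -(-required_iops // iops_per_drive)  # ceiling division
    return int(for_capacity), int(for_performance)

cap, perf = spindles_needed(db_tb=3, required_iops=30000)
print(cap)                      # 10 drives just to hold 3TB
print(perf)                     # 100 drives to deliver 30,000 IOPS
print(perf * 300 / 1000)        # 30.0 TB consumed for a 3TB database
```

The gap between the two numbers is the stranded capacity: 90 of the 100 spindles exist purely for IOPS, not data.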
It’s easy to see how inefficient storage provisioning becomes. Think back to our example of a 3TB SAP/Oracle database needing 30,000 IOPS. Imagine if you bought 36TB of performance disk (120 x 300GB 15K drives). 50% remains off-limits in order to preserve array performance. You have 18TB left. Thin provisioning inefficiency reduces this another 10% to 16.2TB. You lose another 600GB for a couple of hot spares and are down to 15.6TB. You have to use RAID 10, so now you’re down to 7.8TB. That’s more than enough for your 3TB database, but the remaining 4.8TB of space is useless to you since all 30K IOPS are consumed by the SAP/Oracle application. Your storage efficiency ratio? A dismal 8.3% (3TB/36TB).
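The efficiency waterfall above is simple enough to check step by step. A minimal sketch, using the article’s stated loss factors:

```python
# Efficiency waterfall for 120 x 300GB 15K drives, using the loss
# factors stated in the example (free-space reserve, thin provisioning
# overhead, hot spares, RAID 10 mirroring).

raw_tb = 120 * 300 / 1000             # 36.0 TB raw
after_free_space = raw_tb * 0.50      # 50% must stay free -> 18.0 TB
after_thin = after_free_space * 0.90  # 10% thin provisioning loss -> 16.2 TB
after_spares = after_thin - 0.6       # two 300GB hot spares -> 15.6 TB
usable = after_spares / 2             # RAID 10 mirroring -> 7.8 TB

efficiency = 3 / raw_tb               # 3TB of actual data vs. raw capacity
print(round(usable, 1))               # 7.8
print(round(efficiency, 3))           # 0.083 -> ~8.3%
```

Note that each step compounds with the others, which is why the end result is so much worse than any single overhead looks in isolation.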
Even worse, you have to purchase additional storage to support the SAP/Oracle development and test environment. This may be a less expensive SATA array, but it still costs money, takes up space, burns power, and requires administration.
Ahhh, but you say, disk is cheap. Certainly less expensive than the SSD arrays XtremIO sells. And you are correct. On a $/GB (raw) basis flash is substantially more expensive than disk. But let’s look at what it took to get the job done. The SAP/Oracle production array in our example above would cost roughly $5/GB or $180,000 (36,000GB x $5/GB). An array to handle the development/test environment will run another $2/GB, which works out to roughly $100,000 to get enough space for a handful of database copies. $280K total street price for support of this one application. I feel very comfortable saying that for this price you could have a shiny new XtremIO flash array that would easily support both the production and development/test environment we just described with vastly more performance, simplicity of operation, future expandability, and operational efficiency.
How can that be? It’s because XtremIO’s all-SSD design doesn’t just give you faster storage, it gives you better, more efficient storage. Here are five specific ways XtremIO helps better manage capacity and storage sprawl, in contrast to the picture painted above.
Five ways XtremIO helps with Capacity Management
- Efficient data protection – You don’t need to run RAID 10 mirroring in order to get amazing performance from an XtremIO array. This isn’t just because we use flash and rely on speed to make up for the inefficiency difference. XtremIO arrays use a patent-pending flash-specific data protection algorithm that requires very little capacity overhead yet outperforms all existing RAID algorithms. In an XtremIO array, more of the capacity you purchase is available for your data.
- Easily accessible performance – With XtremIO you don’t have to think about how many spindles (or SSDs in our case) are needed to get to a desired performance level. When you configure a volume it automatically gets the full performance potential of all the SSDs in the array. And if you want more performance, you can always scale out the system. The problem of stranded capacity simply vanishes.
- Higher array capacity utilization – XtremIO arrays can be run to capacity without degrading in performance. You don’t have to leave huge percentages of free space in order to get the best results. Again, you get to use more of what you paid for.
- Efficient data reduction – XtremIO storage automatically deduplicates data in real-time without impacting the array’s performance. You’ll store less and make the most effective use of SSD capacity, and what needs to be stored will be kept with minimal overhead.
- No other wasted space – We’ll provide more details about this in the future, but suffice it to say here that XtremIO storage doesn’t have the limitations with respect to snapshots, thin provisioning, spare drives, or multipathing that exist in other systems.
Next time you walk through your data center, take a good look at the refrigerators full of drives and think about how much of it is just wasted space causing sprawl and needless expense. Flash may cost more on a $/GB basis, but if you need far fewer GB to get the job done, you might find it surprisingly cost effective.