Hypothetical question: if your storage array is 50% allocated, is it half empty or half full? The pessimist would say it is half empty. The optimist would say it is half full. But most of the storage folks I talk to would say, “who cares?”
In their minds the key challenge is not usually running out of storage capacity. With space efficiency technologies like thin provisioning, reclaim, compression and dedupe, snapshots, tiering, etc., there are lots of proven ways to optimize capacity utilization.
The issue keeping many storage folks awake at night is not knowing if you have the capacity AND performance to meet an app’s needs. Sure, there’s plenty of capacity available to add 2TB for the new database. But are there enough array resources (compute, memory, bandwidth, IOPS, etc.) to deliver the needed performance? How will adding this new app impact the others already running on the array? How do I know if the apps are running in compliance with their desired performance service levels? And how do I know when I’m running out and need to add more resources?
These are some pretty meaty technical issues. Without the right storage skills, performance tools, and understanding of different app workload types, it’s not easy to know if or when they might become a problem. The notification usually comes when the phone rings and there’s a concerned user on the other end wanting to know why their apps are running slow. I’ve heard storage folks describe this situation many times over many years. They all have the same simple request: fix it.
Really fixing it takes a new approach to how the system is designed, packaged, configured, and managed. It’s a big part of the innovation that has gone into the VMAX3 architecture, as well as a key strategic focus across all EMC storage platforms. To fix it means turning the traditional provisioning process upside down. Today we often build the storage bucket and start pouring in apps until it overflows. Knowing how many more apps you can pour in, or when you are about to hit that overflow point, is, to use a technical term, hard.
Provisioning by service level (or in EMC lingo, “SLO Provisioning”) lets the app guys request a certain level of service, say “Gold,” to get a 5ms average response time. The storage system then automatically sizes the request and determines whether the storage infrastructure has enough resources (ports, engines, cache, disks, etc.) available to deliver it.
By running this admissibility check, the system knows whether it can safely commit to the provisioning request. Not only will it tell you if you can deliver the 5ms response time, but also whether you can do it without impacting the other apps already running on the array. The key point is that the request is validated before the storage is provisioned. It’s sort of like checking your bank balance before writing a check, so you know you have enough funds to cover it.
If it can’t deliver Gold, the system can tell you what’s actually available and what performance you can deliver. So while you may not have enough left to deliver Gold at 5ms, you may have enough to deliver Silver at 10ms. If you really do need more Gold, the system can advise you on what to add to get it (e.g., more front-end, back-end, or disk resources). Again, like your bank account: if you only have $50 and you need to write a check for $100, the system tells you to deposit another $50.
Just like overdraft protection, the system won’t stop you from asking for Gold and provisioning it, even if you don’t have enough. It will tell you that you shouldn’t. If you do it anyway, the system will try to borrow the resources from other apps if they are available, but it can’t guarantee that Gold will be delivered for this app.
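The admissibility check described in the last few paragraphs can be sketched in a few lines of Python. To be clear, everything below is a hypothetical illustration of the idea, not VMAX3’s actual implementation: the service-level table, the IOPS-per-TB numbers, and the `Array` class are all made up for this example.

```python
# Hypothetical sketch of an SLO admissibility check -- NOT VMAX3's actual
# implementation. Service levels, numbers, and class names are illustrative.

SERVICE_LEVELS = {
    "Gold":   {"target_ms": 5,  "iops_per_tb": 5000},
    "Silver": {"target_ms": 10, "iops_per_tb": 2500},
}

class Array:
    def __init__(self, free_capacity_tb, free_iops):
        self.free_capacity_tb = free_capacity_tb
        self.free_iops = free_iops

    def check_request(self, size_tb, slo):
        """Validate a provisioning request BEFORE provisioning it.

        Returns (admissible, message)."""
        if size_tb > self.free_capacity_tb:
            return False, "not enough capacity"
        needed_iops = SERVICE_LEVELS[slo]["iops_per_tb"] * size_tb
        if needed_iops > self.free_iops:
            # Can't cover the "check" -- suggest the best tier that still fits.
            for name, sl in sorted(SERVICE_LEVELS.items(),
                                   key=lambda kv: kv[1]["target_ms"]):
                if sl["iops_per_tb"] * size_tb <= self.free_iops:
                    return False, f"cannot deliver {slo}; {name} is available"
            return False, f"cannot deliver {slo}; add back-end resources"
        return True, f"{slo} can be delivered"

# 2TB at Gold needs 10,000 IOPS, but only 8,000 are free -- Silver still fits.
array = Array(free_capacity_tb=100, free_iops=8000)
ok, msg = array.check_request(size_tb=2, slo="Gold")
print(ok, msg)  # False cannot deliver Gold; Silver is available
```

A real array would of course weigh many more dimensions (cache, ports, engine CPU, workload skew), but the shape of the logic is the same: check the balance first, and if the request bounces, say what tier would clear or what deposit is needed.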
Again, the part that’s changed is that now the storage folks know when they are running out. There’s no magic bullet to prevent the bucket from getting full or to make sure all your storage “checks” will always be covered. But at least now the storage folks know before it happens, and when it does, how to fix it.
Is SLO provisioning a game changer? I guess it depends on your perspective. Here’s a good analogy (apologies to the millennials for going old school). Remember the gas gauge in your first car? It told you when you had half a tank of gas. While you knew that when it hit “E” you were out of gas, you didn’t really know how far you could go before you needed to gas up.
If you remember those days, you’ve probably seen a car or two on the side of the road with its gas cap open and the driver a few miles up the road looking for a place to get some emergency gas. These were often the folks who knew they were close to “E” but decided to see how far they could push the needle to the left.
So why doesn’t that happen very often today? I’d argue most drivers don’t really care whether their tank is half empty or half full, nor are they interested in seeing how far below “E” they can go. Today’s cars don’t just tell you how much gas is in the tank but, more importantly, how many miles you can go until you run out. I have no idea how full my gas tank is. What I do know is that I can get back and forth to work for three more days until I need to add more gas.
So if a car can calculate how many miles you can go before running out of gas, wouldn’t it be cool, and even useful, if a modern storage system could do the same? That’s why SLO-based provisioning is so important and, dare I say, “game changing” to storage folks.
Whether you’re a half-empty or half-full person, you now have better visibility into what’s really available before you start to push the limits. As the storage folks know, whether it’s full, half full, or empty, your users don’t care; they just expect to consume it.