Hard or Soft Storage? That is the Question and the Answer

There’s lots of press these days on Software-Defined Storage (SDS), Software-Defined Data Centers (SDDC), Server SANs, software-only virtual SANs, hyper-converged storage servers, storage appliances and the like. We’ve all been inundated with these new technology and architectural terms by bloggers, marketing mavens, PR, tradeshow signage, consultants, analysts, technology pundits and CEOs of new start-ups. As a blogger and marketing guy, I plead doubly guilty. But the emergence of SDS systems and SDDCs is real and timely. Definitions and differences, however, can be a tiny bit murky and confusing.

This enabling technology is coming to market just in time, as today’s modern data centers, servers, storage arrays and even network/comm fabrics are getting more and more overtaxed and saturated with mega-scale data I/O transfers and operations of all types with all kinds of data formats (i.e., file, object, HDFS, block, S3, etc.). When you add in the line-of-business commitments for SLA adherence, data security/integrity, compliance, TCO, upgrades, migrations, control/management, provisioning and the raw growth in data volume (growing by at least 50% a year), IT directors and administrators are getting prolonged headaches.

Against this backdrop, it’s no wonder that lately I’m getting asked a lot to clarify the difference between converged storage appliances, hyper-converged/hyper scale-out storage server clusters, and pure software-defined storage systems. So I wanted to make an attempt to provide a high-level distinction between a storage hardware appliance and a pure software-defined (i.e., shrink-wrapped software) storage system, while also providing some considerations for choosing one over the other. In fact, the architectural and functional differences are somewhat blurred. So it’s mostly about packaging…but not entirely.

Basically, we all know that everything runs on software – whether it comes pre-packaged in a hardware box (i.e., appliance) or decoupled as a pure software install. There are also distinctions being made between convergence (i.e., converged infrastructure) and hyper-convergence. Convergence refers to the extent to which compute, storage and networking resources have been “converged” into one virtual layer or appliance box.

Regardless of whether we’re talking about a converged or hyper-converged storage appliance-based system using proprietary or COTS hardware, advanced/intelligent software is required in any box to run, monitor, control and optimize the resulting storage system. Some storage appliance vendors – including EMC – offer their “secret sauce” software unbundled in a pure, software-only version, like ScaleIO and ViPR 2.0, Red Hat’s ICE (Inktank Ceph Enterprise) or VMware’s Virtual SAN. The main difference between hardware storage appliances and a pure software-defined storage system is chiefly how each is packaged or bundled (or not) with hardware. Some appliances may have proprietary hardware, but not all do, and likewise not all appliances are commodity-hardware based.

A hyper-converged box is typically a commodity-based hardware appliance that has all three computing resource functions rolled up in one box or single layer. Traditional arrays consist of three separate layers or distinct functional components. Some commodity-server-based pure software-defined storage systems are also hyper-converged in that they are installed on application server hardware. Other converged systems (typically appliances) may consist of storage and networking in a commodity or proprietary box – for two layers. Converged and hyper-converged appliances and SDS systems, however, all typically aggregate pooled storage into one shared clustered system with distributed data/protection spread across appliance boxes or host servers/nodes. They tend to be storage device/hardware agnostic as well, supporting PCIe I/O and SSD flash media as well as traditional HDDs.

Appliance-based solutions offer plug-n-play boxes with predictable scalability which can be quickly added for scale-out (more nodes) or scale-up (more storage capacity). Local area clusters can be created with data protection spread across multiple shared appliance boxes. Flash caching or storage pooling/tiering performance features enhance the overall user experience. Adding additional storage and compute resources is predictable in terms of incremental CAPEX outlays. There may be some constraints on scalability, performance, and elasticity capabilities, but these restrictions may not be deal breakers for some use cases. ROBO, retail store outlets, SMBs and smaller datacenters, for example, come to mind, where smaller-capacity, defined hardware storage server appliances provide adequate converged resources. “Datacenter in a box” is often used by some appliance vendors to position these smaller, geographically distributed deployments. It’s an apt sound bite. Other larger customers might simply add another box for more symmetrical storage, I/O performance, or compute. Either way, they get it all in a single unit.

In other cases, a pure software-defined storage solution or software-defined data center can be better. Why? Well again, use cases are a big driver. Optimal use cases for these commodity-hardware-based SDS systems include database/OLTP, Test/Dev, Virtualization and Big Data/Cloud computing, and environments where existing commodity server resources are readily available for lower-cost scalability. SDS systems like ScaleIO can be installed on commodity application servers and pool server DAS together to form a converged, aggregated shared-storage asymmetric cluster pool with distributed data protection across participating servers. They do this while delivering huge performance (IOPS and bandwidth) and scale-out synergies realized from parallel I/O processing. In essence, a peer-to-peer grid or fabric is created from this software. Blogademically speaking, an SDS is analogous to an unconstrained system versus a contained appliance-based solution. It goes without saying that both have their strong points.
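To make the pooling idea concrete, here is a toy Python sketch of what “pool server DAS into one shared cluster with distributed protection” means conceptually. All class and node names are hypothetical, and the chunk-placement scheme is a deliberately simplified stand-in; it is not how ScaleIO (or any specific product) actually places data.

```python
import hashlib

class SDSCluster:
    """Toy model: each server contributes local DAS to one shared pool,
    and volume chunks are spread across servers with replica copies for
    protection. Conceptual sketch only -- not a real SDS implementation."""

    def __init__(self, replicas=2):
        self.nodes = {}          # node name -> contributed DAS capacity (GB)
        self.replicas = replicas

    def add_node(self, name, das_gb):
        # A server joins the cluster and pools its direct-attached storage.
        self.nodes[name] = das_gb

    def pooled_capacity(self):
        # Raw capacity is the sum of all contributions; usable capacity
        # shrinks by the replica count that provides the data protection.
        raw = sum(self.nodes.values())
        return raw, raw // self.replicas

    def place_chunk(self, chunk_id):
        # Deterministically spread each chunk and its replica across
        # distinct nodes, so losing one server loses no data.
        names = sorted(self.nodes)
        h = int(hashlib.md5(chunk_id.encode()).hexdigest(), 16)
        return [names[(h + i) % len(names)] for i in range(self.replicas)]

cluster = SDSCluster(replicas=2)
for srv in ("app-server-1", "app-server-2", "app-server-3"):
    cluster.add_node(srv, 1000)   # each app server contributes 1 TB of DAS

raw, usable = cluster.pooled_capacity()
print(raw, usable)                         # 3000 raw GB, 1500 usable GB
print(cluster.place_chunk("vol1/chunk42")) # two distinct host servers
```

The point of the sketch is the shape of the system: capacity scales out by adding servers, and protection comes from spreading copies across peers rather than from a dedicated array.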

Another aspect that comes into play is your tolerance for installation, integration, and deployment activities. Both hardware appliances and SDS systems have their strong and weak points in terms of the degree of expertise needed to get up and running. SDS systems can make that task set easier with their lightweight software modules/installs, intuitive monitoring dashboard GUIs and/or English-language-based command interfaces. Thin provisioning, available on appliances and some SDS systems, offers more efficiency where the amount of resource actually used is much less than what is provisioned. This enables greater on-the-fly elasticity for adding and removing storage resources, creating snapshots, as well as managing storage and resource costs in a virtual environment.
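Thin provisioning is easiest to see with a little arithmetic. The toy Python sketch below (hypothetical class names, no relation to any vendor’s implementation) shows the core idea: volumes advertise their full provisioned size to hosts, the pool can be over-subscribed, and physical space is consumed only as data is actually written.

```python
class ThinVolume:
    """A volume that advertises its provisioned size to the host but
    consumes physical space only as data is written."""

    def __init__(self, provisioned_gb):
        self.provisioned_gb = provisioned_gb   # what the host sees
        self.used_gb = 0                       # what the pool actually gives up

    def write(self, gb):
        if self.used_gb + gb > self.provisioned_gb:
            raise ValueError("volume full")
        self.used_gb += gb   # physical space allocated just-in-time

class StoragePool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.volumes = []

    def create_thin_volume(self, provisioned_gb):
        # Total provisioned capacity may exceed physical capacity
        # (over-subscription) -- the efficiency win of thin provisioning.
        vol = ThinVolume(provisioned_gb)
        self.volumes.append(vol)
        return vol

    def physically_used(self):
        return sum(v.used_gb for v in self.volumes)

pool = StoragePool(physical_gb=1000)
v1 = pool.create_thin_volume(800)
v2 = pool.create_thin_volume(800)   # 1600 GB provisioned on a 1000 GB pool
v1.write(100)
v2.write(50)
print(pool.physically_used())       # 150 -- only what was actually written
```

This “just-enough, just-in-time” allocation is exactly what makes the elasticity described above possible: capacity promises can be made up front while physical resources are added only as real usage grows.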

For some users, a commodity-based converged or hyper-converged hardware appliance with internal hybrid storage (i.e., SSD flash and HDDs) is the way to go. For others, it’s a pure SDS solution. Some customers and data center administrators favor a bundled, pre-packaged hardware appliance solution from vendors. These offer predictable resource expansion, performance and scalability as well as quick set-up and integration. A growing segment, however, prefers the ease of install, greater on-the-fly elasticity, lower TCO, hyper scale-out capacity/performance (by simply adding more servers and DAS devices) and reduced management overhead of an SDS solution. Thin provisioning allows space to be easily allocated to servers on a just-enough, just-in-time basis for ease of scalability and elasticity.

In the end, the question of going hard or soft for converged or hyper-converged storage systems depends on your data I/O requirements, use cases, goals/objectives, existing resources and environment(s) and plans for future expansion, performance and flexibility. Both have utility.

About the Author: Rodger Burkley