I’m fresh from Supercomputing 2009 and speed is on my brain. A topic that I haven’t explored yet (in this forum) is solid state disks (SSDs). We’ve seen some customer demand for this – we do not currently offer SSDs in our nodes – and some competitors have integrated SSDs into their products as well. The mainstream list (that I know of) includes EMC, NetApp and Sun.
So far there are several approaches that I’m aware of:
SSD as part of the caching layer: This is how NetApp and Sun have integrated SSDs into their products. NetApp includes SSDs as part of their PAM II module (recently announced and available soon), which acts as a cache in any particular filer head. Sun also offers SSDs in their systems, where they are used as part of the file system's journaling.
NetApp uses it as a read cache, while Sun uses it as both a read and a write cache. I’ve been told that SSDs are not much faster than SAS disks from a write perspective (or at least not on a $/IOPS basis), so using them as a read cache definitely makes sense. Perhaps the write cache will make even more sense once SSD technology improves.
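The read-cache idea boils down to a fast tier sitting in front of slower disks, populated on misses and consulted on every read. Here is a minimal sketch of that pattern; the class and function names (`SSDReadCache`, `backing_read`) are invented for illustration and are not any vendor's actual implementation:

```python
# Minimal sketch of an SSD-backed read cache in front of slower disks.
# All names here are hypothetical -- this illustrates the pattern, not
# NetApp's or Sun's actual design.
from collections import OrderedDict

class SSDReadCache:
    def __init__(self, capacity_blocks, backing_read):
        self.capacity = capacity_blocks
        self.backing_read = backing_read   # function: block_id -> data
        self.cache = OrderedDict()         # stands in for the SSD tier
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)   # LRU: mark recently used
            return self.cache[block_id]
        # Miss: fetch from spinning disk, then populate the SSD cache.
        self.misses += 1
        data = self.backing_read(block_id)
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return data
```

The win comes from re-reads of hot blocks being served at SSD latency instead of disk latency, which is exactly why a read cache pays off even when SSD write performance is unimpressive.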
Obviously there are still big challenges in the use of SSDs as read/write caches in the traditional (NetApp and Sun) architectures, as the NAS unit itself can quickly become the bottleneck – but hopefully if you’ve made it this far you already understand this principle.
SSDs as file storage: Another use I’ve seen for SSDs has been as file storage itself. I believe EMC is shipping pure SSD units (although they call them enterprise flash drives – marketing at its finest). Deploying all SSDs for a volume or LUN is obviously extremely expensive. I believe that EMC’s vMAX storage system will soon be able to dynamically move blocks to SSDs depending on their activity heuristics – which is a pretty neat application of the technology and a hybrid of using SSDs for ephemeral caching (exploiting temporal and spatial locality) vs. direct data placement. Clearly the redistribution of blocks will only be as effective as the heuristics that understand the access patterns; you don’t want to spend all your time moving data that won’t be accessed again.
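To make the heuristics point concrete, here is one trivial version of such a policy: watch a window of block accesses and promote the most frequently touched blocks to a fixed-size SSD tier. This is purely an illustration of the idea (the function name and the counting heuristic are my own), not EMC's actual algorithm:

```python
# Hypothetical sketch of activity-based block tiering: promote the most
# frequently accessed blocks to a fixed-size SSD tier for the next
# window, demote everything else to spinning disk.
from collections import Counter

def retier(access_log, ssd_capacity):
    """Given a window of block accesses, return the set of block ids
    that should live on SSD for the next window."""
    heat = Counter(access_log)                 # per-block access counts
    hottest = heat.most_common(ssd_capacity)   # top-N by activity
    return {block_id for block_id, _ in hottest}

# Example: blocks 7 and 3 dominate this window, so they win the SSD slots.
window = [7, 3, 7, 9, 3, 7, 1, 3, 7]
hot_set = retier(window, ssd_capacity=2)       # {3, 7}
```

The failure mode the post describes is visible even here: if next window's accesses look nothing like this window's, the promotions were wasted data movement – the heuristic is only as good as the access pattern's stability.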
The other big category (which doesn’t appear to be implemented by any vendor today) is the ability to put specific data structures or files on SSDs in a static fashion, while coexisting with other types of storage – for example, placing all file-system metadata directly on SSDs (for amazing namespace performance) or allowing select files to be placed on SSDs. The latter capability would be used by customers who understand how their applications access data and want to maximize performance without having to segregate the data into different volumes/LUNs. I can imagine placing the start of each file on SSDs to minimize first-access latency while leaving the rest of the file on disk.
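That last idea – pinning the start of each file to SSD – is easy to sketch as a static placement rule. The function name and the 4-block prefix threshold below are invented for illustration; no shipping product that I know of does this:

```python
# Hypothetical sketch of static placement: the first N blocks of each
# file land on SSD (cutting first-access latency), the remainder on
# spinning disk. The threshold is an arbitrary illustrative choice.
SSD_PREFIX_BLOCKS = 4

def place_blocks(file_size_blocks):
    """Return a per-block tier assignment for one file as
    (block_number, tier) pairs."""
    placement = []
    for block_no in range(file_size_blocks):
        tier = "ssd" if block_no < SSD_PREFIX_BLOCKS else "hdd"
        placement.append((block_no, tier))
    return placement

# A 6-block file: the first 4 blocks go to SSD, the tail stays on disk,
# so the first read returns at SSD latency while the disk spins up
# sequential reads for the rest.
layout = place_blocks(6)
```

Unlike the caching approaches, this placement is static – it needs no heat tracking or migration, just awareness of which bytes are on the latency-critical path.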
All of these approaches are beneficial for different workloads and applications. Despite the recent setback with STEC (which is arguably more a reflection of the broader economy), I expect the application of SSDs to become more widespread and to offer many opportunities for innovation.
Of course, SSDs will make the most sense in a Scale-Out platform – where you can continually scale out performance and capacity in a single namespace/single volume – but you already know that.
I highly encourage comments – what unique/key storage applications for SSDs am I missing?
[Minor update above – I meant to indicate that Sun’s use of SSDs was for both read and write caching – HT Andy and Andrey.]