The other day I was jogging with my engineering pal, Zach Loafman (of FreeBSD fame), alongside beautiful Elliott Bay, and the subject of deduplication came up.
He remarked: “I’m really curious to see where deduplication is when the economy recovers. Is this something people really want or is it just a fad?”
Great question, Zach.
I think that’s a topic every IT administrator and storage vendor should be thinking clearly about before they invest precious dollars…
Deduplication (or more generally, data reduction) for backup and archive is a no-brainer. A backup or archive workflow built around a tape mindset and traditional storage generates a lot of redundant and compressible data in the form of incrementals and organization-wide duplication. Since you’re making a secondary (or tertiary) copy of that data with the expectation that it will be infrequently accessed (ideally, never) and aggregating it from many sources, going the extra step to reduce duplication can easily be justified.
I think data reduction for nearline and backup is here to stay, and it will get more interesting as we move away from tape-based disaster recovery techniques and traditional storage. What I’m less sure of is whether it will remain as prevalent in its search-and-detect form, as opposed to single-instancing and content-aware techniques, for example.
What about primary storage?
Is there really a high enough level of redundant data that IT administrators will pay the extra costs to search and detect? These costs (as I see them) will come in the form of higher capital expenses (more powerful storage arrays), application impact (increased latency) or increased complexity (3rd-party software). Is that redundancy going to exist over time on primary storage?
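If you want to put a rough number on that question for your own data, here is a minimal sketch of the search-and-detect idea: carve files into fixed-size chunks, fingerprint each chunk, and count how many bytes are actually unique. Everything in it is my own assumption for illustration (the 4 KB chunk size, SHA-256 as the fingerprint, the function names `dedup_ratio` and `scan_tree`); it is not how any particular product does it, and real implementations often use variable-size chunking and persistent indexes.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 4096  # assumed chunk size for illustration; real systems vary


def dedup_ratio(paths, chunk_size=CHUNK_SIZE):
    """Return (total_bytes, unique_bytes) across the given files,
    treating chunks with the same SHA-256 digest as duplicates."""
    seen = set()          # fingerprints of chunks already encountered
    total = unique = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
                digest = hashlib.sha256(chunk).digest()
                if digest not in seen:
                    seen.add(digest)
                    unique += len(chunk)
    return total, unique


def scan_tree(root):
    """Walk a directory tree and measure chunk-level redundancy in it."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return dedup_ratio(files)
```

Calling `scan_tree()` on a directory returns the total and unique byte counts; dividing the first by the second gives a crude reduction ratio. Note that the in-memory set of fingerprints and the extra pass over every byte are a small-scale version of the costs I listed above.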
I’d really love to hear some feedback on this topic – anyone out there with a perspective to share?