November #SANchat Transcript – All about compression

 Posted on behalf of Alison Krause, who works in Dell's Storage Product Group, Social Media & Communications.

With all of the excitement surrounding our acquisition of Ocarina last year, we decided to host a #SANchat today devoted entirely to compression. This was part 1 of the chat; part 2, devoted to dedupe, is coming up December 7th. As a reminder, #SANchat is a monthly chat hosted by Dell Storage. For more background on #SANchat see this blog post over on Dell Compellent’s Around the Block blog.

I’d like to extend a huge thanks to Mike Davis (@mike_davis) and Mohammed Farhat (@mfarhat) for joining us as our experts. We had a great conversation with a few customers. We discussed whether we can look forward to native block compression/dedupe on EqualLogic in the future, why compression is important, when compression is implemented, and more. Mike also provided a link to a very helpful book titled “Data Compression Explained.”

You can find the full transcript below. Be sure to follow us on Twitter so that you stay up to date on upcoming SANchats, and tweet us if you have any follow-up questions or comments! Don’t forget to join us on Wednesday, December 7th to talk about deduplication!

LiemNguyen Have you followed @mike_davis and @mfarhat? They'll be talking about data #compression in 45 minutes on #SANchat
iSCSIKing 10 minutes until our chat on Compression - Join the discussion. #SANChat
LiemNguyen @iSCSIKing Quick, how many puns can you think of in 10 minutes. I'll start: I'm aware I'm compressed for time. #SANchat
DellTechDE RT @iscsiking: 10 minutes until our chat on Compression - Join the discussion. #SANChat
RafaelKnuth RT @iscsiking: 10 minutes until our chat on Compression - Join the discussion. #SANChat
iSCSIKing @LiemNguyen That is a very compressed time line to think of puns #SANChat
LiemNguyen @iSCSIKing Oops, looks like we've reduced our time to nothing #SANchat
LiemNguyen OK, ready for SANchat to begin now...@mike_davis and @mfarhat are you out there? #SANchat
iSCSIKing @LiemNguyen Looks like it ... #SANChat
iSCSIKing RT @LiemNguyen: OK, ready for SANchat to begin now...@mike_davis and @mfarhat are you out there? #SANChat
mattjamesdavies @LiemNguyen @iSCSIKing You need to decompress some time guys! that will help! #SANchat
AlisonatDell lets talk compression!! what questions do you have for our experts? #SANchat
iSCSIKing RT @AlisonatDell: lets talk compression!! what questions do you have for our experts? #SANChat
Mike_Davis I'm online. Just having a conversation with a customer about h264 compression. #SANchat
iSCSIKing RT @Mike_Davis: Im online. Just having a conversation with a customer about h264 compression. <Awesome! #SANChat
LiemNguyen Hey @mike_davis thanks for joining us! Why don't you start with telling everyone what you do for a living. #SANchat
LiemNguyen @mike_davis I'll ask you about the customer in a second :) #SANchat
Mike_Davis I managed Marketing and product planning for Ocarina Networks, acquired 15 months ago by Dell. The ole team is still working hard. #SANchat
DellCompellent #SANchat with @mike_davis & @mfarhat on compression is starting now! Follow the conversation here: http://t.co/SEBG7RUA
dell_storage RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression is starting now! Follow the conversation here: http://t.co/SEBG7RUA
JeffHengesbach Can we look fwd to native block comp/dedupe on Equallogic in the future? #sanchat
iSCSIKing RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression is starting now! Follow us here: http://t.co/vD9lHIcC #SANChat
LiemNguyen @mike_davis Glad to hear that, and full disclosure, I came back to Dell via the #Compellent acquisition. @mfarhat what's your role #SANchat
Mike_Davis @JeffHengesbach I think we're on record that all Dell platforms will include data reduction. Too early to specify dedupe vs compr. #SANchat
Mike_Davis @JeffHengesbach Eql data reduct will first arrive with NAS, then at the block level later. #SANchat
mfarhat Hi Liem, glad to be here, I am a Product Manager on our DX Object Storage Platform, including our DX6000G Storage Compression Node #SANchat
iSCSIKing RT @Mike_Davis: @JeffHengesbach Eql data reduct will first arrive with NAS, then at the block level later. #SANChat
LiemNguyen @mfarhat @mike_davis First time we've discussed compression in detail here. Let's start w/ basics: Why is compression important? #SANchat
iSCSIKing Welcome @mfarhat glad to have you with us today #SANChat
DennisMSmith RT @iSCSIKing: RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression starting now! http://t.co/or2hgRRD #SANChat
Mike_Davis Compression importance: not all data is easily deduped, not all data sets have redundancy to take advantage of.... #SANchat
InformaZen RT @DellCompellent: #SANchat with @mike_davis & @mfarhat on compression starting now! http://t.co/Dpzl9vM6 #SANchat
Mike_Davis ...so the data reduction solution needs to be tailored to the workflow. Vertical apps tend to benefit more from compression #SANchat
Mike_Davis Dedupe is fantastic for backup workflows (full-full-full...) but does nothing against precompressed files (video, images, MSoffice) #SANchat
bdwill RT @JeffHengesbach: Can we look fwd to native block comp/dedupe on Equallogic in the future? #sanchat | PLEASE!
Mike_Davis dedupe; not an ounce of benefit for a video archive. Applying generic (eg LZ) compression won't work either. #SANchat
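To illustrate Mike's point that generic LZ-style compression gains little on data that is already compressed, here is a minimal Python sketch. It is not from the chat: the word list, sizes, and the `savings` helper are made up for illustration, and zlib's DEFLATE stands in for "generic (eg LZ) compression". The second pass over already-compressed output is a rough stand-in for a precompressed video or image file.

```python
import random
import zlib


def savings(data: bytes) -> float:
    """Fraction of space saved by one pass of DEFLATE (LZ77 + Huffman)."""
    return 1 - len(zlib.compress(data, 9)) / len(data)


# Redundant input: random picks from a tiny vocabulary (illustrative only).
rng = random.Random(0)
words = [b"block", b"file", b"array", b"video", b"backup", b"dedupe", b"node", b"cache"]
text = b" ".join(rng.choice(words) for _ in range(50_000))

# "Precompressed" input: the output of a first compression pass.
once = zlib.compress(text, 9)

print(f"first pass saves  {savings(text):.1%}")
print(f"second pass saves {savings(once):.1%}")
```

The first pass should save most of the space, while the second pass should save close to nothing (it may even grow the data slightly), which is why format-aware compressors are needed for media files.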
mfarhat Compression is an important element of storage management, allowing significant efficiencies in physical storage utilization #SANchat
Mike_Davis To compress video further we developed a specialized set of algorithms that understand the formats (eg EXR, raw/DV, AVI, etc). #SANchat
mfarhat The DX6000G Storage Compression Node allows the tiering of data through multiple compression options.. #SANchat
Mike_Davis @mfarhat What are the attributes a customer can use for compression policies? #SANchat
mfarhat @Mike_Davis customers using DX Object Storage with compression can set policies based on file type or age (life point).. #SANchat
Mike_Davis Are there any customers online who have deployed SW compressors on server/host (external to app) to shrink data? #SANchat
mfarhat These policies enable tiering of data through multiple, optimized, compressors. For example, customers may choose.. #SANchat
LiemNguyen RT @Mike_Davis: Are there any customers online who have deployed SW compressors on server/host (external to app) to shrink data? #SANchat
mfarhat ..to leave certain file types uncompressed for a period of time, and then apply a compressor optimized for speed of access.. #SANchat
Mike_Davis h.264 (MPEG4 variants) are a tough one. We found ways to shrink, but at a high cost to CPU, but small savings can have huge payoff #SANchat
mfarhat ..and at a later time still, apply a compressor that delivers maximum space savings for dormant data. #SANchat
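The policy idea mfarhat describes (choose a compressor by file type and age) can be sketched roughly as follows. This is a hypothetical illustration, not the DX6000G's actual configuration format: the `Tier` class, the `POLICIES` table, the age thresholds, and the compressor names `"fast"` and `"max"` are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    min_age_days: int          # tier applies once a file is at least this old
    compressor: Optional[str]  # None means "leave uncompressed"


# Hypothetical policy table keyed by file extension.
POLICIES = {
    ".log": [Tier(0, None), Tier(30, "fast"), Tier(365, "max")],
    ".jpg": [Tier(0, None), Tier(90, "max")],
}


def pick_compressor(extension: str, age_days: int) -> Optional[str]:
    """Return the compressor of the oldest tier the file has reached."""
    chosen = None
    tiers = POLICIES.get(extension.lower(), [])
    for tier in sorted(tiers, key=lambda t: t.min_age_days):
        if age_days >= tier.min_age_days:
            chosen = tier.compressor
    return chosen


print(pick_compressor(".log", 5))    # young file: left uncompressed
print(pick_compressor(".log", 45))   # older file: speed-optimized compressor
print(pick_compressor(".log", 400))  # dormant file: maximum space savings
```

File types with no policy fall through to `None`, so anything not explicitly listed stays uncompressed in this sketch.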
Mike_Davis Re block/array-based data-reduction, it's harder to apply interesting art...data is generally opaque. So we run inline naive algor. #SANchat
Mike_Davis ...but choice of algor also constrained by CPU/RAM resource available in the array. We don't want to cause DOS attack on IO! #SANchat
the_saltworks thanks for the heads up @LiemNguyen I had no idea I was missing a compression discussion on #SANchat
the_saltworks @Mike_Davis there are a cpl ways to further reduce storage consumed by multimedia 1) More lossy compression & 2) single instancing #SANchat
mfarhat welcome @the_saltworks, what interests you in compression? #SANchat
Mike_Davis System resources are an interesting variable on design...dedupe is ram IO heavy, compr is CPU heavy. Optimizing for both=hard. #SANchat
the_saltworks @mfarhat what interests me most w/compression is the misinformation about it. I'm quite familiar w/ it in all its forms #SANchat
Mike_Davis @the_saltworks yep, SIS can help, but most video repositories don't store files redundantly...maybe in home-shares. #SANchat
Mike_Davis @the_saltworks ...lossy is interesting. Have some work in that area. lots of non-visual info that can be optimized in these files.. #SANchat
the_saltworks @mfarhat I prefer to speak of data reduction in terms of inter- and intra-file techniques. #SANchat
Mike_Davis @the_saltworks block dedupe is both inter and intra, so we say dedupe eliminates redundancy, compr uses math to predict patterns. #SANchat
mfarhat thanks @the_saltworks, can we dispel some of the misinformation about compression today? what are some common myths you hear? #SANchat
the_saltworks @Mike_Davis It turns out quite a lot of MM content can be (and is) distributed in lesser quality/format than the original content #SANChat
Mike_Davis @the_saltworks the video workflow is complex, and transcoding is part of day-to-day life using special tools at workflow level #SANchat
the_saltworks @Mike_Davis yes, in fact one of the beauties of modern dedupe is that it successfully blended the two #SANchat
the_saltworks @mfarhat one myth is that you can't further compress already compressed formats (e.g. JPG). Of course you can via lower quality #SANchat
Mike_Davis DV/raw capture formats are easy to compress in stor, h264/VP7 distribution formats are hard to compress further...very efficient. #SANchat
Mike_Davis @the_saltworks We have JPG lossless compr that will deliver 30-60% savings, partly because it knows the file format. #SANchat
the_saltworks @mfarhat and for those who cannot (or simply don't want to) compromise quality…well, I was blindsided this year with another way #SANchat
LiemNguyen @mike_davis So when would you not want to compress data? #SANchat
the_saltworks By far one of the most interesting and promising methods of intrafile compression I've seen in recent yrs comes from @balesio #SANchat
Mike_Davis @LiemNguyen Compression has overhead, so anything transactionally sensitive can feel pain. DB for example. #SANchat
the_saltworks That a biz has found a smarter more compact way to write existing file formats threw me a curveball. I didn't think it was possible #SANchat
Mike_Davis @the_saltworks There's no magic here; applying lossy compression/resize to MSoffice docs. #SANchat
LiemNguyen @mike_davis @mfarhat Speaking of docs, any good resources you can point to for more info? #SANchat
the_saltworks @Mike_Davis yep, if only all formats were highly efficient. Sadly most SW devs create crappy inefficient file formats #SANchat
mfarhat @the_saltworks, there is always room for higher efficiency :) #SANchat
the_saltworks @Mike_Davis you meant lossless right? I'd hate to see lossy office document compression. ;-) #SANchat
Mike_Davis @the_saltworks The main problem here is users indiscriminately pasting 2MB JPEG images into their PPT. This is where Balesio wins. #SANchat
rootwyrm What about ways of detecting whether data is/should be compressible at time of write prior to disk commit? #SANchat
Mike_Davis @the_saltworks Re Balesio, it is lossy. the images are being resized. "visually lossless"=lossy. #SANchat
mfarhat @rootwyrm compression is almost always implemented as a post-process operation, as is the case with the DX Storage Compression Node #SANchat
rootwyrm @mfarhat Exactly. I suppose the question is: why? Isn't it faster to do a boolean operation on the data while it's in write cache? #SANchat
Mike_Davis Not much research in compression last 20yrs. Our Chief Scientist drafted a new book here: http://t.co/Dj9ODneN #SANchat
mfarhat @rootwyrm, write speeds are not impacted -- files are compressed (or not) based on user defined policies and timing #SANchat
iSCSIKing Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANChat
rootwyrm @mfarhat Sure - I'm wondering why folks aren't doing avoidance earlier in the line though. Is there significant CPU impact there? #SANchat
DellTechCenter RT @iSCSIKing: Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANChat
DennisMSmith RT @iSCSIKing: Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANChat
LiemNguyen RT @iSCSIKing: Great conversations today about compression. Thanks @mfarhat and @mike_davis for hosting today. #SANchat
Mike_Davis @rootwyrm avoidance? #SANchat
LiemNguyen And join @mike_davis and @mfarhat next month, Dec. 7, for a followup #SANChat on #deduplication! #SANchat
the_saltworks @rootwyrm would be an interesting solution. Would be better if SW devs simply learned to write more efficient formats #SANchat
rootwyrm @mike_davis Yup; tag data early while in write cache and then avoid the post-write check for compressibility. #SANchat
mfarhat @rootwyrm, the cost of write I/O is generally higher than the cost of the compressible space, left uncompressed for a short period #SANchat
AlisonatDell huge thank you to @mike_davis and @mfarhat for the compression chat! looking forward to 12/7 when we talk dedupe! #SANchat
rootwyrm @the_saltworks Write.. more.. efficient formats? But, how can you not love XML with embedded binary? ;) #SANchat
iSCSIKing RT @LiemNguyen: And join @mike_davis and @mfarhat next month, Dec. 7, for a followup #SANChat on #deduplication! #SANChat
the_saltworks btw, on a briefing at the moment. I will take a look at this stream again shortly and comment #SANchat
rootwyrm @mfarhat Sure, but from an op standpoint, at post-proc you're doing disk read, cache load, CPU test, metadata write, correct? #SANchat
LiemNguyen @the_saltworks Thanks for joining us on #SANchat today!
mfarhat @rootwyrm, the operations needed to determine which data to compress and when are the same whether they are done at write or later #SANchat
mfarhat @rootwyrm, post-process compression allows those operations to be done at an optimal time, without impacting system performance #SANchat
rootwyrm @mfarhat Ah, so to an extent it does come down to compressibility testing one way or the other, and not always headers and such. #SANchat
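rootwyrm's idea of tagging data as compressible while it is still in write cache could look something like the sketch below. This is only an assumption about how such a check might work, not how DX Object Storage or any Dell array does it: the function names, the 4 KB sample size, and the 7.0 bits-per-byte cutoff are all hypothetical. It uses the Shannon entropy of a byte histogram, a cheap heuristic that flags high-entropy (already compressed or encrypted) data without running a compressor.

```python
import math
import os
from collections import Counter


def byte_entropy(sample: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not sample:
        return 0.0
    n = len(sample)
    return -sum(c / n * math.log2(c / n) for c in Counter(sample).values())


def tag_compressible(buf: bytes, sample_size: int = 4096, max_bits: float = 7.0) -> bool:
    """Tag a buffer as worth compressing if a leading sample has low entropy."""
    sample = buf[:sample_size]
    return bool(sample) and byte_entropy(sample) < max_bits


print(tag_compressible(b"plain text repeats itself a lot " * 1000))  # low entropy
print(tag_compressible(os.urandom(65536)))  # random bytes look precompressed
```

A byte histogram misses redundancy that only shows up as repeated longer sequences, so a real implementation would likely combine a check like this with file headers or a trial compression of the sample.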
mfarhat @rootwyrm DX compression is very flexible, users select which file types to compress and when, and with which compressors to do so #SANchat
mfarhat @liemNguyen @the_saltworks @rootwyrm thanks for participating today, looking forward to continuing the discussion #SANchat

About the Author: Gina Rosenthal