March #SANchat Transcript – All about Big Data

Hot topic for 2012?? Two words come to mind…Big. Data.

Anyone who is anyone in the technology industry knows that big data is one of the biggest trends of the year. Of course we could not resist jumpin’ on Twitter and talking about it.

We want to extend a huge thanks to our Big Data experts Craig Warthen, Logan McLeod, Barton George, and Gina Rosenthal for joining our discussion today. I hope everyone who participated found it helpful and insightful.

We were joined by many well-known Big Data tweeps such as Stu Miniman, Mike Hoffa, Shel Israel, and Mike Fishman. The official questions this month were:

  • How would you define big data?
  • What is not big data?
  • Do you have to be a big company to have big data problems?
  • Any examples of how small companies can use big data?

These questions generated lots of discussion, and some really interesting side questions popped up. For example: How does Hadoop play in to #bigdata? Are there companies using Hadoop in VMware environments that are protected? If so, what methods? Check out the transcript for the discussion and lots of links, and leave a comment if you have an answer or a related question.

The best tweet of the chat has to go to Barton George:

Of course in Texas, we dont call it “Big Data” we just call it “Data” 🙂

You can find the full transcript below. Be sure to follow us on Twitter so that you stay up to date on upcoming SANchats, and tweet us if you have any follow-up questions or comments! Join us in April as we talk about end-to-end technology solutions!

dell_storage We’ll be hosting a #SANchat all about Big Data today at 9am CST! Be sure to join us for the discussion: http://t.co/E8pW01OX
gminks RT @dell_storage: We’ll be hosting a #SANchat all about Big Data today at 9am CST! Be sure to join us for the discussion: http://t.co/E8pW01OX
AlisonatDell get ready to hear from our big data experts, @bigpapabigdata, @loganmcleod, @barton808, and @gminks! 8 minutes!! #SANchat
dell_storage RT @AlisonatDell: get ready to hear from our big data experts, @bigpapabigdata, @loganmcleod, @barton808, and @gminks! 8 minutes!! #SANchat
Meesh_Says RT @dell_storage: We’ll be hosting a #SANchat all about Big Data today at 9am CST! Be sure to join us for the discussion: http://t.co/E8pW01OX
barton808 All logged in and ready to go! 🙂 #SANchat
loganmcleod tap tap tap This thing on?  Yay Bigdata! #SANchat
barton808 Is it ok if i talk about DAS given this is #SANchat? 🙂 #SANchat
BigPapaBigData Yes, DAS is a big help particularly behind Hadoop #SANchat
gminks hey I’m here too! Y’all ready to talk abt Big Data? #sanchat
barton808 Join me for a #SANchat TweetChat at: http://t.co/RFbYBWZo #SANchat
shelisrael #SANChat for big data junkies is about to start with Dell’s @BartonGeorge.
gminks Join me for a #sanchat TweetChat at: http://t.co/YDKpl6F8 > use http://t.co/YDKpl6F8 to join the convo! #sanchat
stu @barton808 @gminks interested to hear Dell’s network view on #bigdata – it’s not SAN but OK for #SANChat < I wrote http://t.co/zdCcaX5j
BigPapaBigData Join me for a #SANchat TweetChat at: http://t.co/DGe9SPnV on #bigdata #SANchat
gminks this chat is for anyone who is interested the topic of #bigdata #sanchat
gminks hey guys could you introduce yourself? #sanchat
iSCSIKing RT @gminks: Join me for a #sanchat TweetChat at: http://t.co/YDKpl6F8 > use http://t.co/YDKpl6F8 to join the convo! #sanchat
barton808 Howdy, im the dir of mktng for Dell’s Web|tech vertical.  I focus on companies that use the internet as their platfrom #SANchat
loganmcleod @stu Reading the article real quick.  #SANchat
gminks hey @stu & @shelisrael ! #sanchat
gminks RT @stu: @barton808 @gminks interested to hear Dell’s network view on #bigdata – it’s not SAN but OK for #SANChat < I wrote http://t.co/zdCcaX5j
AlisonatDell i’m alison and i work in storage social media!! excited to learn more about #bigdata! #SANchat
BigDataClub RT @BigPapaBigData: Join me for a #SANchat TweetChat at: http://t.co/NYP50G3R on #bigdata #SANchat
loganmcleod Hi SANchatter’s..  Logan McLeod… I work in our CTO office and help plot our cloud technology strategy & new tech R&D. #SANchat
stu #SANchat hi I’m an analyst w @Wikibon – watching the intersection of #bigdata and infrastructure
barton808 This might be helpful, its part of a glossary we put together. It focuses on the data tier eg big data etc http://t.co/jLqgXumO  #SANchat
gminks @stu #sanchat is vendor neutral by design, so we’re more talking tech than Dell this am.
gminks @loganmcleod how are you reading Stu’s article! it is link packed! #sanchat
gminks @loganmcleod you are a #speedreader #sanchat
BigPapaBigData My Name is Craig and I’m in the Dell Solutions Group storage team.  I’ve been focused on helping customers address their #bigdata #SANchat
chriscastellani #SANChat Anyone read the new IDC report on #BigData? Any insights to share? http://t.co/ZNrbUdDt
gminks Since @barton808 is already going nuts with links – how would you define big data? #sanchat
loganmcleod @gminks #speedreader. #beentheredonethat #SANchat
barton808 More reference stuff: heres a summary from the last Hadoop summit w/links to a bunch of interviews http://t.co/eiNGKCdJ  #SANchat
gminks RT @chriscastellani: #SANChat Anyone read the new IDC report on #BigData? Any insights to share? http://t.co/g67W0ZOE #sanchat
BigPapaBigData The IDC report discussed a initial step of supporting #bigdata with archival type platforms that can scale. #SANchat
stu @chriscastellani #SANchat Wikibon also published a Big Data market study http://t.co/vkRQOOZi
BigPapaBigData @chriscastellani The IDC report was what I expected to see, but I was surprised by the low market penetration of the #bigdata SW #SANchat
loganmcleod Not surprised on low market penetration.  High complexity with implementation and Big data is a means to an end.  #SANchat
gminks Interesting: IDC defined big data as1:the system has to collect over 100TB of data, 2. data sets to be growing at a rate of 60% and #sanchat
BigPapaBigData @chriscastellani The report showed only about 2% of the market implementing analytics SW over the next few years. #SANchat
coolsport00 @gminks I would say – lots of data requiring much horsepower in resources & mgmt? #sanchat
gminks #. to be deployed on “scale-out architecture” –> do people agree? #sanchat
barton808 Forrester estimates that firms effectively utilize< 5%of available data since the rest is too expensive to deal w/. #SANchat
gminks So back to a definition – does anyone want to try to define big data? #sanchat
loganmcleod Architecturally, I’ve seen success in both SAN based implementation & DAS based architectures.  #SANchat
barton808 Forrester says Bigdata is new cause it lets firms affordably dip into that other 95%. #SANchat
VirtualHoffa @gminks Even non big data related applications need to start looking hard at scale out architectures #SANchat
BigPapaBigData @loganmcleod Yes, I guess they are not counting existing data warehouses in that number. #SANchat
BigPapaBigData @gminks I try to define #bigdata as the data, particularly machine-generated and then there is the ecosystem around that data. #SANchat
gminks RT @VirtualHoffa: @gminks Even non big data related applications need to start looking hard at scale out architectures  #sanchat
gminks RT @coolsport00: @gminks I would say – lots of data requiring much horsepower in resources & mgmt? #sanchat
loganmcleod Datasets larger than can be managed with traditional db mgmt tools, driving insight into trends & the previously unknown. #SANchat
loganmcleod @barton808 Agree. #SANchat
barton808 Of course in Texas, we dont call it “Big Data” we just call it “Data” 🙂 #SANchat
chriscastellani RT @barton808: Of course in Texas, we dont call it “Big Data” we just call it “Data” 🙂 #SANchat
BigPapaBigData Machine-generated data, event generated data, etc is #bigdata.  They are all used for more trending and require billions of records #SANchat
VirtualHoffa @gminks I think of #bigdata – data could be gathered/generated throughout normal business process, but was unanalyzed in the past. #SANchat
gminks rt @barton808 Of course in Texas, we dont call it “Big Data” we just call it “Data” 🙂 #SANchat #sanchat
barton808 Besides Variety, the other two axis of Big Data are Volume and Velocity.  #SANchat
VirtualHoffa But it’s #bigdata now because it has value. Analysis can be applied the data and business value can be derived from that analysis #SANchat
barton808 @VirtualHoffa yep thats the 95% of unanalyzed data that Forrester cites. #SANchat
loganmcleod @barton808 VVV #SANchat
chriscastellani Since this is #SANchat: which Big Data uses cases are SANs the best fit for? #SANchat
BigPapaBigData Don’t forget Volatility in #bigdata. #SANchat
mike_fishman #sanchat Big data and scale out are not synonymous.  Scale out is a solution –  Bigdata is a challenge ..um .. opportunity
jayfry3 RT @barton808: Of course in Texas, we dont call it “Big Data” we just call it “Data” 🙂 #SANchat [Of course.] 😉
gminks RT @chriscastellani: Since this is #SANchat: which Big Data uses cases are SANs the best fit for? #sanchat
BigPapaBigData @mike_fishman Good point #SANchat
Dome9 RT @jayfry3: RT @barton808: Of course in Texas, we dont call it “Big Data” we just call it “Data” 🙂 #SANchat [Of course.] 😉
gminks hi @mike_fishman welcome to #sanchat
barton808 @mike_fishman Id say scale-out is an architecture that is well suited to support the “opportunities” of big data #SANchat
ZertoCorp listening in to #SANchat interesting stuff…
mike_fishman #sanchat  Hi, I’m Mike and I am crashing the big data party. I design BD solutions for the other 800lb gorilla.
gminks @mike_fishman so can you explain the diff between big data and scale out plz in 140 chars #sanchat
BigPapaBigData I like all the new database technologies coming out to help manage #bigdata.#SANchat
coolsport00 RT @barton808: @mike_fishman Id say scale-out is an architecture that is well suited to support the “opportunities” of big data #SANchat <+1
gminks @BigPapaBigData is volatility a defining factor for big data? #sanchat
VirtualHoffa @chriscastellani TBH it depends on the architecture of the application containing / analyzing the #bigdata.  SAN isn’t always best #SANchat
coolsport00 How does Hadoop play in to #bigdata? #sanchat
mike_fishman #sanchat BD is a collection of structured or unstructured information “big” is relative but IMO it exceeds normal OLTP or OLAP capabilities
barton808 Data is the currency of the Net, its whats monetized &when aggregated, parsed &made accessible, where the value lies 4biz &individs #SANchat
KongYang Cool #SANChat on #BigData going on. Hi I’m Kong and I’m a tech-a-holic. What’s a vacation without some tech goodness 🙂
loganmcleod Big data technology itself is rapidly transforming.  Bunch of innovators, more every day.  Lots of change in the next couple years. #SANchat
mike_fishman #sanchat Scale out is a storage architecture that deploys parallel storage nodes and com;ute to deliver high tput and bandwidth -howdido?
iSCSIKing RT @KongYang: Cool #SANChat on #BigData going on. Hi Im Kong and Im a tech-a-holic. Whats a vacation without some tech goodness 🙂 #sanchat
gminks Hi @kongyang & @ZertoCorp welcome to  #sanchat
barton808 @coolsport00 Hadoops a great platform for aggregating & processing big data.  It can also analyze but thats not its core strength. #SANchat
stu @barton808 Data is the raw material – information/insight is what needs to be extracted using #bigdata tools #SANchat
BigPapaBigData @coolsport00 Hadoop is like a ETL tool that brings all the sources together and then sorts through it. Structured and unstructured #SANchat
VirtualHoffa @gminks Hadoop actually helps eliminate the reliance on SAN equipment for #bigdata – can use local storage on commodity servers #SANchat
coolsport00 @BigPapaBigData @VirtualHoffa Both good answers! #bigdata #SANchat
VirtualHoffa @stu @barton808 And that analysis of the raw data is where the real value comes from.  Just having the #bigdata means nothing 😀 #SANchat
coolsport00 @barton808 Thx #SANchat
mike_fishman @coolsport00  #bigdata #sanchat Hadoop is a DW technology that is designed to leverage parallel processing – it is good for big data tasks
BigPapaBigData @VirtualHoffa Hadoop is still not mature enough to be the primary storage location. #SANchat
gminks OK – not sure we have settled on a good definition of big data. It still feels cloudy 🙂 So…what is NOT big data? #sanchat
iSCSIKing Lots of great info about #BigData this morning on  #sanchat
barton808 @mike_fishman I see Hadoop and DW as separate.  Hadoop can integrate with a DW but it can also act in place of one. #SANchat
loganmcleod It’s sitting in your RDBMS? #notbigdata #SANchat
mike_fishman #sanchat @gminks  Great question – what is NOT big data?
edwsonoma RT @gminks: rt @barton808 Of course in Texas, we dont call it “Big Data” we just call it “Data” 🙂 #SANchat #sanchat
BigPapaBigData @gminks Defining BD is like defining cloud.  All depends on who you are talking to, and what they are trying to accomplish. #SANchat
stu RT @gminks: OK – not sure we have settled on a good definition of big data. It still feels cloudy 🙂 So…what is NOT big data? #sanchat
gminks RT @loganmcleod: Its sitting in your RDBMS?  #notbigdata <haha #sanchat
zertojjones #sanchat general question for #BigData..Are there companies using Hadoop in VMware environments that are protected? if so, what methods?
storagebod @mike_fishman @gminks Big Data is all your data, everything else is a subset of Big Data. Your Big Data is different to my Big Data #sanchat
VirtualHoffa @mike_fishman Personally, I think any dataset that is not gathered, or analyzed, with the intent of extracting additional value #SANchat
BigPapaBigData @gminks What Bigdata isn’t? Potentially all data and potentially nothing. #SANchat
coolsport00 @BigPapaBigData @gminks Does BD nec mean volume? Agreed it’s relative.. #SANchat
gminks RT @BigPapaBigData: @gminks What Bigdata isnt? Potentially all data and potentially nothing. < oh COME ON. #sanchat
BigPapaBigData @VirtualHoffa Agreed. Data that is not #bigdata is the data you are not interested in analyzing. #SANchat
mike_fishman #sanchat Yes,  Hadoop can run in a virtualized env. AND can run on SAN, DAS or hydrid – Virt is a good way to leverage un-used resources
BigPapaBigData @gminks Depends on where you think you can extract value.  If you think there is value in all your data, then it is all your data #SANchat
gminks RT @zertojjones: #sanchat general question for #BigData..Are there companies using Hadoop in VMware environments that are protected? if so, what methods?
iaflash #iaflash #. to be deployed on “scale-out architecture” –> do people agree? #sanchat http://t.co/pw53xmux
BigPapaBigData One of the challenges that defines #bigdata is how to handle some of the datasets that have exteme characteristics.   #SANchat
mike_fishman #sanchat So it’s not big data unless I specifically plan to mine it?  intent doesn’t define a noun.  it’s still bigdata – still make a sound
BigPapaBigData @mike_fishman That is why it is up to the user to define their #bigdata.  I don’t think anyone else really can. #SANchat
gminks @storagebod is that a def of what big data is or is not? #headspinning #sanchat
barton808 @mike_fishman I agree w/you.While Forrester implies it must be mined to be bigD i belive it can be defined by Vol,Velocity,Varitey #SANchat
gminks ok – last question then we’ll need to wrap up #sanchat
gminks Do you have to be a big company to have big data problems? #sanchat
mike_fishman @barton808 Agree – Mining and BI are ways to LEVERAGE big data to advantage #SANchat — lol go for it @gminks
mike_fishman @gminks ahahahahah   ..sorry,  funny question.  #sanchat
VirtualHoffa @gminks You know the answer to that is a resounding NO 😛 #SANchat
coolsport00 RT @VirtualHoffa: @gminks You know the answer to that is a resounding NO 😛 #SANchat <Absolutely not
BigPapaBigData @gminks Not at all.  I see really small companies with the same issues as the big guys. #bigdata #SANchat
mike_fishman RT @gminks: Do you have to be a big company to have big data problems? #sanchat <- nope, BIg is always relative
BigPapaBigData RT @VirtualHoffa: @gminks You know the answer to that is a resounding NO 😛 #SANchat
barton808 @gminks Size of co has nothing to do w/need to leverage BigD, 10 person web startups can have mountains o’Data #SANchat
gminks @mike_fishman I thought you would like it. You know I’m always abt the humor #SANchat
coolsport00 @gminks I would come close to saying it’s worse cuz of their size they don’t think they need to manage as well as Enterprise #sanchat
gminks OK so followup – any exps of how small companies can use big data? #SANchat
VirtualHoffa @coolsport00 As well as not having the knowledge of even how to gain value out of their existing data sitting there doing nothing #SANchat
BigPapaBigData @coolsport00 I see them having challenges getting analytics expertise. #SANchat
mike_fishman #sanchat hmm.  Small companies can and should still leverage big data – and who says it needs to be “their” data?
mattwbaker @BigPapaBigData @coolsport00 – Bingo! (MR)ETL; adding a few new steps 2 old proc 2 gather & make relevant previously untapped data #sanchat
coolsport00 @BigPapaBigData @VirtualHoffa  Definitely…concur #SANchat
BigPapaBigData I look at #Bigdata in a maturity model form.  Store it, optimize it, manage it, analyze it, make use of it. #SANchat
mike_fishman @gminks SaaS and Iaas are some alternatives available to small companies with #bigdata challenges.  #SANchat
gminks RT @BigPapaBigData: I look at #Bigdata in a maturity model form.  Store it, optimize it, manage it, analyze it, make use of it. #SANchat
mattwbaker That said, shouldn’t we be talking abt #BigInsights vs. #BigData – then we can talk abt things that are revolutionizing analytics #SANChat
gminks Ok guys, we have to officially wrap up, but keep the conversation going! Anyone want to share what you are working on now? #SANchat
barton808 Big Data is the new Cloud: It represents the next not-completely-understood got-to-have strategy http://t.co/CQXJt1ZJ  #SANchat
mike_fishman #sanchat @gminks  Thank you all for a wonderful, lively, insightful twitter discussion today.
gminks @mattwbaker maybe next month’s SANchat can be on #biginsights vs #bigdata — hmmmmmmmmm #SANchat
loganmcleod @mattwbaker Explain your BigInsights thoughts for the crowd.. 🙂 #SANchat
loganmcleod @barton808 Yay Shiny objects! #SANchat
BigPapaBigData I’m launching a Big Data Solution in the coming months.  I hope you like it #bigdata #SANchat
gminks @mike_fishman thank you for joining Mike! #SANchat
barton808 This was my first twitter chat, it was a lot of fun 🙂 #SANchat
gminks I’ll be at the #ATXVMUG tomorrow and at #Interop — hope to see some of you at one of those events #SANchat
loganmcleod Lots of fun at the … #SANchat
mattvogt Dang it, missed #SANchat again 🙁
coolsport00 @mattvogt And was good stuff too! #SANchat
JoeBugBuster My thought exactly: RT @mike_fishman SaaS and Iaas are some alternatives available to small companies with #bigdata challenges. #SANchat
gminks @barton808 thanks for joining, also big thanks to @loganmcleod & @BigPapaBigData #SANchat
BigPapaBigData SXSW in Austin is kicking off the Music festival today.  2000+ bands.  Bruce Springsteen is the keynote. #SANchat
gminks @mattvogt rats! We’ll have the transcript posted soon….#SANchat
gminks @BigPapaBigData wow you are working #SXSW? #SANchat
mattvogt @gminks thanks! Don’t think I’ll ever make a 7am #SANchat 🙂
barton808 To end w/: you’ve heard about elevator pitches well here is our 90 sec Big Data  _escalator_ pitch http://t.co/rWMyWjP1 🙂 #SANchat
AlisonatDell @mattvogt sorry, matt! i’ll post a few days early next time! #SANchat
gminks @mattvogt we will try better next month. right @alisonatdell #SANchat
BigPapaBigData I wouldn’t call it working! #SANchat
loganmcleod RT @barton808: …elevator pitches well here is our 90 sec Big Data  _escalator_ pitch http://t.co/wFr016oO < IS AWESOME. 🙂 #SANchat
mattwbaker @loganmcleod – It’s simple, folks r looking for new ways of gaining insights (value). Size is just part of it – maybe least import #SANChat
BigPapaBigData This was good.  Thanks everyone. #SANchat
gminks ok everyone, thanks for coming. If you have a suggestion for a SANchat topic plz let @dell_storage know! #SANchat
BigPapaBigData RT @mattwbaker: – Its simple, folks r looking for new ways of gaining insights (value). Size is just part of it -maybe least import #SANchat
loganmcleod RT @mattwbaker: Its simple, folks r looking for new ways of gaining insights (value). Size is just part of it – maybe least import #SANchat
mattwbaker @loganmcleod -Start by focusing on the desired outcomes, the inputs (data) & tools(cool stuff) come along for the ride #BigInsights #SANChat
coolsport00 @jaslanger @mattvogt Gina and/or Allison tweet it. It’s a tweet chat if you will with the #sanchat hash
ZertoEric Enroute to #atxvmug. Come see @ZertoCorp as were interested in discussing #virtualization and data protection. #SANchat
stu @gminks @barton808 thanks for the #SANchat – I’m presenting on bigdata at Interop, looking for customer successes to illustrate trends
ZertoCorp RT @ZertoEric: Enroute to #atxvmug. Come see @ZertoCorp as were interested in discussing #virtualization and data protection. #SANchat
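
Several participants described Hadoop as a way to aggregate and process data in parallel. As a rough illustration only, here is a small sketch of the map, shuffle, and reduce steps behind that style of processing, applied to counting hashtags. It is plain Python running on one machine, not Hadoop's actual API; the function names and the sample records are made up for this example.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (hashtag, 1) pair for every hashtag found in one record."""
    pairs = []
    for word in record.split():
        if word.startswith("#"):
            # Normalize case and trailing punctuation so "#bigdata." matches "#BigData".
            tag = word.strip(".,!?").lower()
            if len(tag) > 1:
                pairs.append((tag, 1))
    return pairs

def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse the values for one key into a single total."""
    return key, sum(values)

def count_hashtags(records):
    # In a real cluster the map calls would run in parallel on many nodes;
    # here they run one after another for clarity.
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical sample records in the style of the chat.
sample = [
    "Yay Bigdata! #SANchat",
    "Don't forget Volatility in #bigdata. #SANchat",
    "How does Hadoop play in to #bigdata? #sanchat",
]
counts = count_hashtags(sample)
print(counts)
```

Because each record is mapped independently, the map step can be spread across as many commodity servers as there is data, which is the scale-out property discussed in the chat.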

About the Author: Alison Krause