I had the opportunity to talk with Jon Rooney, Senior Director, IT Solutions Marketing, at Splunk a couple of weeks ago. It was a great chance to learn more about Splunk, and of course I had to ask him for his thoughts on Big Data. He was kind enough to allow our conversation to be part of my Big Data Conversation series.
A little background about Splunk through Jon’s voice… Splunk helps you make sense of machine data, and machine data is the largest and fastest-growing component of Big Data. Much of the most under-used data comes from the massive volumes generated by applications, devices, servers, and network endpoints, and it is often under-used because of how difficult it can be to capture, store, and analyze with outdated tools. Our Big Data story is about real-time machine data. We keep your systems up and running, and we keep you more secure.
EB: How does your company define Big Data?
JR: Splunk wouldn’t define it differently than anyone else. We believe the jumping-off point for Big Data is volume, velocity, and variety: all the data that is too unwieldy to put into traditional databases and is difficult to keep up with. The business press tends to discuss Big Data in terms of Hadoop and says it is all about pooling your e-commerce and company transactions and running sentiment analysis on what people write on Twitter and in product reviews. That is the human-generated part of Big Data, but the machine-generated part is actually the bigger portion and the harder one to manage at scale. If you look at what people do with that data, such as pattern recognition, you can do it in batch, but we focus on real-time data. Yes, we have the historical piece, but it is much more valuable to do the analysis while events are happening than to do it post-mortem, which is the traditional way.
EB: Do you feel the majority of organizations associate Big Data with Hadoop?
JR: I don’t think our customers do, but over the past 6–8 years the broader business and tech media have used examples like “how does Amazon know what to recommend to you?” and “how does the CDC know there are flu infections based on what it sees on Twitter?” Those are the examples that ground Big Data for most people, rather than “how do you look at millions of transactions through an API endpoint to see response time?” That is also a Big Data example, and it is what Splunk does.
EB: How do you see Big Data changing in the future?
JR: Over time, as it becomes normalized, people will see the scale of “big” change. The goalpost for what “big” means will move. People will drop the idea that data counts as Big Data simply because it can’t fit cleanly into a relational database. Right now, if you have to put it into a NoSQL database, it is considered Big Data, but that is not necessarily true. There is a tight coupling between NoSQL databases and Big Data today, and I think that will change as architectures change. You should choose a solution because it fits your architecture better, not just because it handles petabytes of data. It then becomes another storage strategy that isn’t driven solely by volume, velocity, and variety; there are other architectural considerations that can help you make the decision.
EB: What is the biggest myth about Big Data?
JR: There are a lot. One of them is that not many people have figured it out and that only a handful of businesses are driven by Big Data. Another is that people overestimate the sophistication of the analysis being done with Big Data: everyone thinks everyone else is doing what Amazon is doing, when most people are actually doing simple correlations.