Hadoop is Ready for Primetime: Recap of Strata + Hadoop World San Jose

Hadoop joins the ranks of Microsoft Windows and Apple iPhone as the next platform ready for applications.  The message from Strata + Hadoop World San Jose 2015 is clear: Hadoop is ready for primetime.  As we have seen with other successful platforms such as Windows and iPhone, it takes a well-constructed operating system and application development framework to prepare for success.  Windows 1.0 was a great glimpse into what would happen when the 2nd platform emerged, but it wasn’t successful until applications began to be created for it.  I remember playing with Windows 1.0 and thinking, “I wish it had the ability to do X, and I wish it had Y.”  Today, of course, it has almost any application you might need.  The same holds true for the mobile era: iPhone 1.0 launched with a handful of applications, but the platform wasn’t truly successful until the ecosystem began to build apps on top of it.

Enter the next-generation data platform, Hadoop.  Over the last three years, we’ve heard our customers say things like “we’re experimenting with Hadoop” or “we have it in a lab” or “we have a few killer apps we’ve custom designed”.  But in 2015, we’re discovering trends in data and in data use with the advanced toolsets the Hadoop framework brings to data.  In the financial industry, for example, we see fraud analytics and risk calculations as a common set of applications being built with the technology.  It’s now only a matter of time until an application emerges that solves those challenges with fewer customizations than Hadoop has usually been known to require.

You can see Doug Cutting (The Father of Hadoop) and me speaking about this topic on O’Reilly TV:
https://www.youtube.com/embed/1WnGyXRanHU

The industry is full of change, advancement, and growth.  You could see the growth in attendance compared with last year (people are starting to get it).  You could see the advancement in all of the new intellectual property brought out by the vendors (including some EMC competitors).  Good to see them joining the party.  As for change, that was the story of the week, with the announcement of the Open Data Platform (ODP).  Plenty has been said about the new Pivotal-led initiative by both supporters and adversaries.  Although I have heard a lot about the initiative this week, I am not qualified to comment on its merits.  I will instead state my opinions, which I’m known to do.  I believe Big Data will change the world.  I also believe Hadoop as a framework still needs an enterprise-quality uplift as we transition to the application-ready state I’ve just described.  I hope the ODP will be an organization that not only provides that uplift, but does so in a truly open way that gets all of the major Hadoop supporters on board.

At EMC, we support the industry and our customers, and we want to see the world truly made better through whichever vendor a customer chooses (we are Data Switzerland).  We hope we are delivering excellent products and solutions to that end, and believe customer choice is at the heart of those solutions.  With that in mind, we’ve augmented our Pivotal and Cloudera relationships to include Hortonworks.  After the 6,172 tests required to certify EMC Isilon against the Hortonworks distribution, I am happy to say Isilon has passed with just a handful of documented differences.  This should put customers at ease when they decide to use Hortonworks HDP with our Data Lakes.

Shaun Connolly, VP of Strategy for Hortonworks, and I discussed the certification on theCube:
https://www.youtube.com/embed/XLUKbXvBId8

We announced the HD400 node, which is fantastic!  I have found that not many companies have moved more than 20 PB into Hadoop.  Even the very large Web 2.0 companies run multiple clusters, none of which I have seen exceed 35 PB.  This is usually a result of maxing out the NameNode, and secondarily of not wanting such a large fault domain.  I believe EMC Isilon’s 50 PB will be PLENTY of capacity for 99.9999% of companies for many years to come.

See Sam Grocott discuss the Data Lake and our recent announcements related to the HD400 on theCube:
https://www.youtube.com/embed/Fe3DGHSo66g

Finally, a big shout-out to Raeanne Marks, who represented EMC at the Women of Big Data conference this week; to Bill Schmarzo (Dean of Big Data); and to the army of more than 50 EMC’ers who have joined the Hadoop revolution and made it to Strata + Hadoop World San Jose this year.

We are driving many innovations in the products, solutions, and choices we offer our customers.  Follow @SGrocott, @NKirsch, @KorbusKarl, @AshvinNa, @EMCBigData and @EMCIsilon for the latest stories from the trenches.

Thank you!


About the Author: Ryan Peterson