Introducing Greenplum’s Chorus

Silicon Valley breeds the next best million dollar idea. Many make millions by solving a simple problem or taking a very inefficient process and making it efficient. EMC Greenplum Chorus is the next best million dollar idea.

In my 15 years of working with data to solve business problems, I always knew that there was a better way, but continued with a siloed approach: spending time finding relevant data and figuring out what it means, running analysis based on my own domain expertise, and finally drawing conclusions based on my own interpretations. Chorus is a ground-breaking solution that removes this siloed process and replaces it with a collaborative one to ensure organizations gain maximum value from their data.

I sat down with Josh Klahr, Vice President of Product Management for EMC Greenplum, to talk about Chorus, the world’s first and only analytic productivity platform.

You have been working with data and driving analytics solutions for over 15 years. What were the compelling reasons for joining EMC Greenplum to lead their Data and Insight product team?

During the course of my career in data platforms and analytic applications I’ve become familiar with the key technical challenges that enterprises face when dealing with Big Data; I’ve also been on the business side of things, and have witnessed the huge financial and competitive rewards that can be achieved by effectively leveraging Big Data.

Most recently, at Yahoo, I led the development of one of the largest data analytics platforms in the world to deliver value-added services to Yahoo customers as well as extract value for Yahoo’s own internal business initiatives. From this experience, and from studying how other Internet companies such as Google, Facebook, and Twitter deploy Big Data, I learned what key capabilities were required for a winning Big Data formula. Of all the Big Data vendors competing in this space, I truly believe that EMC Greenplum uniquely understands this formula, and we are building a unified analytics platform that brings these capabilities to the enterprise and makes enterprises just as successful as Yahoo, Google, Facebook, Twitter, etc.

You have been front and center in this shift from traditional BI to Big Data. In my opinion, BI analysts have been performing analytics on terabytes of data for years to solve the same problems around customer retention, fraud, clinical outcomes, etc. What is different with this new shift to Big Data?

Yes, BI/DW has been around for a long time, but there is a difference. Not only is there a shift in technology, where we now have an economically viable way to build a Big Data infrastructure that better optimizes predictive models and opens up new business opportunities, but there is also a shift in mindset when it comes to approaching data analytics, one that involves people and processes.

Traditionally with BI, data was tightly controlled by IT and made available to a select few in the business due to the traditional concerns around security, management, performance, and cost of scaling. As a result, the value of BI was siloed and could not be leveraged across the organization. If someone in Marketing built a predictive model for product recommendations, this insight and knowledge most likely was not shared with, let’s say, Operations, which could have leveraged the same skills to build a predictive model to detect fraud on the network.

With a Big Data infrastructure, the people and process barriers are removed, since we now have the ability to securely manage and support any workload. And now with Greenplum Chorus, we have created the first self-service, collaborative data analytics platform to facilitate the sharing of knowledge, so that business problems can be solved quickly across the organization for the most ROI. We call this true business agility with Big Data.

In addition to agility, what additional value does Chorus bring to Big Data Analytics?

Efficiency. Chorus removes the process of having to copy data outside of the data warehouse and move the data into a specialized analytics database or desktop application so it can be analyzed in a siloed manner with inconsistent results. For example, one company I worked with had 5 different groups creating ‘Top 10 searches’. Each group had a different result set because of this siloed environment.

With Chorus, we eliminate this inefficient approach: instead of moving data to the analytics, Chorus brings the analytics directly to the data, where it is centralized and shared. For example, using Chorus, an analyst can quickly provision a sandbox, visualize the data, and create and save queries, all of which can then be shared and optimized through the collaborative features of Chorus.

You have been in the trenches with data and doing analysis for years to solve real business problems for customers. What are your favorite features of Chorus and why?

There are so many! But if I were to call out my “Top 3”, here they are:

1)  Self-documenting or “crowd-sourced” metadata. Collaboration around data sets in Chorus generates rich, organic metadata: user interactions around understanding data sources (what does a column mean, which values should be filtered, etc.) are automatically captured and shared with the data developer community. If it weren’t for Chorus, it would just get lost…

2)  Comprehensive Search.  Users of Chorus can search for a concept, for example “Churn”, and easily find all objects related to this concept, whether it’s a table containing a churn prediction score, a workspace on churn models, or a data developer with expertise in churn models.

3)  Rich set of APIs and integration to plug into any back-end data source or front-end BI or visualization tool. We have built Chorus as a platform that supports the data developer ecosystem. This means that we need to be able to support 3rd party analytics functions, external data sets, and API-level integration with other systems (e.g. workflow or task management). The opportunity for partners to participate in this ecosystem is huge, and I am really looking forward to building it out.

When it comes to analyzing data and building data models, many users prefer to code or use their favorite tools such as SAS, MicroStrategy, or R. Is Chorus a replacement for these tools? Or is it just a data preparation tool? Where exactly does Chorus fit in the Big Data Analytics process?

Great question! Let me first say that Chorus is NOT a BI tool, nor is it an analytics engine. So we are not replacing SAS, R, Tableau, etc. Chorus is really a platform that’s intended to make it easier for organizations to better leverage their investments in Big Data, in analytics, in BI – to drive better collaboration, faster “time-to-insights”, and ultimately to generate more value. So Chorus is essentially providing an ecosystem that wraps around these tools and greatly increases the ROI for the enterprise.

Chorus already has Beta customers. Can you provide some anecdotes or feedback from these customers to validate Chorus’ value proposition?

Chorus has received a very enthusiastic response from our Beta customers, and their feedback has both validated the strategy to date and also provided us with key guidance on where to invest going forward. A few things that stood out from Beta customers:

  • Positive response to the collaborative aspects of the application, specifically around the ability for a data science team to collaborate around code (SQL queries or analytics functions) in a shared workspace that supports code versioning and commenting.
  • Access to quick and easy data profiling and visualization in order to understand the “layout” of available data sources. This ability to quickly assess the value and contents of data sources allows the data development team to quickly discover and utilize the data elements that they need.
  • Search functionality – there has been a universally positive response to the Chorus search functionality. Whether it’s finding users, data sets, or workfiles, the search box makes it dead simple to find what you are looking for.
  • Organic data dictionary – in the more active environments there is a very rich set of metadata that is built up as the Chorus community consumes and collaborates around data sets. Without needing to depart from their core jobs – data analysis – the Chorus users are able to generate a very rich and informative metadata stream that is extremely valuable to the entire user community.

Thank you very much for this very thorough and insightful discussion of EMC Greenplum Chorus. Good luck in your new role, and we look forward to hearing more on future developments at EMC Greenplum.

About the Author: Mona Patel