The Analytics Journey Leading to the Business Data Lake

More than ever, businesses see their futures tied to their ability to harness the explosive growth in data. You may even be familiar with the Business Data Lake concept—a central repository of vast information which can be used across an enterprise to drive all business intelligence, advanced analytics and even, eventually, intelligent applications.

At EMC IT, we are in the process of creating a Business Data Lake, and I will be sharing insights about our efforts in this blog. To start, let’s trace the vision that’s leading EMC IT and other businesses to the shores of this new data landmark.

Path to Analytics Maturity

Let’s consider how businesses use analytics. In today’s competitive and ever-changing marketplace, businesses need analytics to answer several key questions in order to manage operations effectively and plan for future growth and competitiveness:

  • What happened? How is my business doing and what’s its performance level?
  • Why did it happen? Why did we experience high cost?
  • What will happen? Where is more potential for market share? Will my customers stay with me?
  • How can we make it happen? What can we offer by predicting a customer problem and solving it proactively, so we can ensure higher satisfaction and loyalty?

Based on these questions, Gartner, Inc. provides a maturity path for analytics. Essentially, the first two questions above are answered by traditional business intelligence, where the business looks back on operational data to describe and diagnose problems. These questions map to descriptive and diagnostic analytics. For example, a sales analytics solution provides a view of current and historical opportunities and bookings, end-of-quarter reports, and so on.

As data matures into information and provides optimization opportunities through advanced analytics, the business begins to look forward to predict and act on opportunities and change.

Thus, the “what will happen” and “how can we make it happen” questions are answered with advanced analytics, which predicts, explores, and helps the business act on new opportunities for growth. For example, there is a solution available that helps sales organizations predict the conversion likelihood and close risks of a potential customer deal. The solution collects a broad range of opportunity data and runs it through an analytical model to identify key win drivers. This is a great tool for helping the sales force focus on the right deals and close them to increase revenue. In this way, analytics serves to predict and manage market opportunities.

This is the transition from traditional business intelligence to advanced analytics.

Along the same lines, today’s applications are also moving toward more data-driven capabilities, with analytics at their core. This is a major architectural shift from traditional applications that use embedded analytics. For example, customer profile analytics can drive various customer-centric applications to provide more customized, profile-based interactions and outcomes.

Our Journey with Business Data Lake

At EMC, we have grown tremendously, progressing from business intelligence capabilities via our global data warehouse to the world of advanced analytics using Business Analytics as a Service (BAaaS) and EMC Connected Proactive Services (ECPS), two large-scale data stores. Along with high-performing end-of-quarter reporting and real-time sales analytics, we are exploring data-science-driven advanced analytics solutions to identify new product opportunities, standardize our product configurations in the most effective way, or develop models for optimizing sales deals. Yet we believe we are just scratching the surface of Big-Data-driven advanced analytics.

Key challenges we face in moving toward more innovation in advanced analytics areas include:

  • Siloed data assets serving pockets of analytics.
  • Current data assets on aging platforms that have limited technology support and cannot be expanded.
  • Limited consolidation of structured and unstructured data.
  • Limited data and knowledge sharing across analytics users.
  • Data gathering that slows innovation.
  • Insufficient governance and data quality management.

We continue to strive toward building a consolidated, more scalable data lake platform where we can empower innovation through agility and greater collaboration, providing access to all enterprise data, machine data, vendor data and social media data in one logical space. This will help realize our vision of a single Business Data Lake driving all business intelligence, advanced analytics and, eventually, intelligent applications.

Our data lake is one logical data platform with multiple tiers of performance and storage levels to optimally serve various data needs based on Service Level Agreements (SLAs). It will provide a vast amount of structured and unstructured data at the Hadoop and Greenplum layers to data scientists for advanced analytics innovation. The higher performance levels, powered by Greenplum and in-memory caching databases, will serve mission-critical and real-time analytics and application solutions.
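
To make the tiering idea concrete, here is a rough Python sketch of how a data request might be matched to a tier based on its SLA. This is an illustration only, not a description of our actual implementation: the tier names follow the layers mentioned above, but the `Tier` class, the `choose_tier` function and every latency figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    typical_latency_ms: int  # hypothetical figure, not a measured value


# Ordered from slowest to fastest; all numbers are illustrative assumptions.
TIERS = [
    Tier("hadoop", 60_000),
    Tier("greenplum", 1_000),
    Tier("in_memory_cache", 10),
]


def choose_tier(sla_latency_ms: int) -> Tier:
    """Return the slowest tier whose typical latency still meets the SLA."""
    for tier in TIERS:
        if tier.typical_latency_ms <= sla_latency_ms:
            return tier
    # No tier meets the SLA; fall back to the fastest one available.
    return TIERS[-1]
```

Under these assumed numbers, a data science workload that tolerates minutes of latency would land on the Hadoop tier, while a real-time application with a tight SLA would be served from the in-memory tier.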

With more robust data governance and data quality management, we can ensure that authoritative, high-quality data drives all of EMC’s business insights and analytics-driven applications through data services from the lake. Thus we will move toward the next level of analytics maturity with:

  • A more scalable Big Data platform with various performance levels, serving needs that range from deep data science innovation to real-time analytics.
  • Tools to share data and analytics quickly among groups.
  • Faster innovation through easy and self-service data availability.
  • Workflow-driven robust data governance.

Our Business Data Lake implementation is a multi-phase journey. Our first-round implementation in July consolidated major large-scale structured and unstructured data for heavy-duty self-service analytics and enabled the business to turn around innovative solutions quickly. With this foundational Data Lake setup, we have experimented with various new technologies and architecture concepts and have made great strides with creative solutions. Along the way, we have also faced the challenges and limitations of evolving Big Data technologies.

In my next blog post, I will talk about our lessons learned from Business Data Lake (BDL) program planning, technology and implementation challenges.

About the Author: Shahidul Mannan