Data Lake: Platform for Business Transformation

When we engage with clients to help them identify where and how to leverage big data for business value, we frequently use the Big Data Business Model Maturity Index (BDBM). This helps organizations understand how effective they are at leveraging data and analytics to power their value creation processes.

Big Data Business Model Maturity Index

Applying the BDBM can help an organization identify how it should enact changes to people, processes, and technologies to enable the creation of analytic insight that drives its top-level strategic initiatives.  Organizations that adopt this approach can utilize advanced analytics to couple new sources of customer, product and operational data, optimizing key business processes and uncovering new monetization opportunities.

However, from an IT perspective, what does this look like? The traditional data warehouse just can’t support these new data and analytic capabilities.

Well, the time is right for organizations to embrace a data lake as the data management platform for advanced analytics and predictive insight. A data lake not only provides a repository for the collection of all sorts of structured and unstructured data, both internal as well as external to the organization, but it also enables data science teams to self-provision an analytic sandbox where they can rapidly ingest new data sources, ascertain their value and uncover new, more accurate predictors of business performance.

Modern Data Lake Architecture

The data science team needs an environment where they can quickly test new data sources and analytics models without having to go through the laborious, multi-month data warehouse integration process.

And once the data is loaded into a data lake, think “load once and analyze multiple times” – across multiple analytic use cases.

Mapping Data Sources to Analytic Use Cases

The above chart maps the data sources – and the relative value of those data sources – to the analytic use cases in order to prioritize the data loading roadmap.
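As a rough illustration of that kind of mapping, here is a minimal Python sketch. It assumes each data source is scored from 0 (no value) to 4 (high value) against each analytic use case, and that sources are ranked by their total score. The source names, use case names, scores, and the simple summing rule are all hypothetical placeholders, not values taken from the chart.

```python
# Hypothetical scores: how valuable each data source is to each use case
# (0 = no value, 4 = high value). All names and numbers are placeholders.
VALUE_MATRIX = {
    "point_of_sale": {"customer_retention": 4, "pricing": 3, "inventory": 4},
    "web_clickstream": {"customer_retention": 3, "pricing": 1, "inventory": 0},
    "social_media": {"customer_retention": 2, "pricing": 0, "inventory": 0},
}


def rank_data_sources(matrix):
    """Return (source, total_score) pairs, highest total first."""
    totals = {source: sum(scores.values()) for source, scores in matrix.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def loading_roadmap(matrix):
    """Return just the source names in suggested loading order."""
    return [source for source, _ in rank_data_sources(matrix)]


if __name__ == "__main__":
    for source, total in rank_data_sources(VALUE_MATRIX):
        print(f"{source}: {total}")
```

A real prioritization would probably also weight each use case by its business value and factor in how hard each source is to acquire and load; the plain sum here is only the simplest possible starting point.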

A data lake also provides a benefit to organizations that are looking to free up expensive data warehousing resources by offloading the ETL processes.  This allows those processes to take advantage of the inexpensive, scale-out, natively parallel Hadoop environment.

And ultimately, who knows how the data warehouse might be transformed as technologies such as HAWQ bring more of the value of SQL and Business Intelligence (reports and dashboards) to the Hadoop data lake environment?

With these systems in place, organizations can efficiently store and analyze their data to surface the insights that help them monetize data opportunities. These advancements through the phases of the BDBM enable the metamorphosis into a truly data-driven business.

As more and more organizations embrace the data lake approach, I couldn’t be more excited to watch the results.

About the Author: Bill Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice. As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as #4 Big Data Influencer worldwide. Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata. Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications. Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.