Big Data and Little Data: from Zynga to Moneyball

Farmville, the online game, is a pretty simple game, but behind it is a lot of sophisticated information technology.  Every day Zynga, the company that produces Farmville, runs extensive statistical analyses on reams of data (Zynga has one of the largest data warehouses in the world and generates over 15 TB of new data every single day).  Zynga’s goal is to drive player actions that improve financial conversion (i.e., players paying real money for elements of the game) and player retention.  To accomplish this, Zynga uses the results of continuous analyses of player actions to test, iterate, and fine-tune features in its games.  As such, Zynga is a company at the forefront of the fast-growing field of “Big Data Analytics,” which is the analysis of large-scale data for predictive intelligence.  Indeed, Zynga has even referred to itself as an analytics company masquerading as a gaming company.

Another, more mainstream, example of a company heavily using analytics for competitive advantage is the Swedish car company Volvo.  Volvo collects terabytes of data from a multitude of embedded sensors in its cars in the field, from its CRM systems, from dealerships, and from its factory test beds.  These various data streams are combined and analyzed to yield early predictive information, for example, on manufacturing defects that haven’t even shown up yet.  All vertical industries can benefit from analytics, some more than others.  For example, McKinsey & Co. estimates that a retailer making full use of big data analytics could increase its operating margin by more than 60%.  A pioneer in this field has, of course, been Wal-Mart, which was doing Big Data analytics long before it became sexy.

So what is the ominous-sounding Big Data really, and how is it different from, say, “normal sized” data?

Big Data refers to huge amounts of data: hundreds of terabytes, even petabytes of information.  It is usually unstructured and consists of data sets that may be unrelated to each other, i.e., data from a variety of independent streams (such as Twitter, other social media, traditional CRM, surveys, demographic data, defect data, etc.).  This is different from traditional data sets, which are often relational.  Another key aspect of Big Data analytics is analytic velocity, or put more simply, rapid, near-real-time analysis of the data.

Big Data analysis usually breaks traditional database and analytics processes and systems.  First, the data sets may be too big; second, they aren’t relational; third, they need extremely rapid analysis.  To accommodate these needs, a new industry has sprung up based on emerging technologies such as Hadoop MapReduce, the R statistical language, and new high-performance infrastructure solutions such as parallel multi-processing, high-speed networking, and fast I/O storage (including emerging flash-based storage).  Furthermore, Big Data analytics requires a new class of skilled worker: the data scientist, or even the data artist.  Some companies have now appointed Chief Analytics Officers.

Dell has been a solution provider in the field of “Big Data” analytics since before it became a buzzword.  Our server infrastructure is used at some of the largest gaming companies in the world, and Dell networking switches based on high-performance Force10 technology are standard in many large-scale big data installations.  In addition to infrastructure, Dell has partnered with leading software vendors such as Cloudera for Hadoop technology.

But you don’t have to be part of the petabyte club to get the benefits of analytics.  Gaining value from information is not limited to big data sets.  Indeed, while there is a great deal of industry buzz around the phrase “Big Data” (big data = big bandwagon), the reality is that most organizations can glean a lot of insight from “normal sized” data sets (i.e., a few terabytes) or even “little data.”  Take for example the recent movie “Moneyball,” which is based on a true story.  It tells the story of Billy Beane, the general manager of the Oakland A’s baseball team, who drew on the kind of statistical analysis of baseball games pioneered by Bill James to look for patterns and, based on this analysis, came up with winning strategies for his team.  The data sets involved were incredibly small compared to the data sets found in true big data situations or even normal sized data warehouses, but nonetheless, they yielded critical insights that helped the Oakland A’s go on a winning run.

So it is not about big data; it is all about big insights.  And you can get lots of big insights from even small amounts of data if you do intelligent analysis.  For example, smaller organizations can analyze CRM and accounting data to glean sometimes counterintuitive insights, such as which customer segments or marketing actions are really their most profitable, so they can focus their efforts appropriately.
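To make the “little data” point concrete, here is a minimal sketch in Python of the kind of segment analysis described above.  Everything in it is hypothetical: the segment names, the revenue and cost figures, and the profit_by_segment helper are made up for illustration, and in practice the rows would come from an organization’s own CRM and accounting exports.

```python
from collections import defaultdict

# Hypothetical records: (customer segment, revenue, cost to serve).
# Illustrative numbers only; real rows would be joined from CRM and
# accounting exports.
records = [
    ("enterprise", 120000.0, 95000.0),
    ("enterprise", 80000.0, 70000.0),
    ("small_business", 30000.0, 12000.0),
    ("small_business", 25000.0, 10000.0),
    ("consumer", 15000.0, 13000.0),
]


def profit_by_segment(rows):
    """Return {segment: (revenue, profit, margin)} aggregated over rows."""
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for segment, rev, cst in rows:
        revenue[segment] += rev
        cost[segment] += cst
    result = {}
    for segment in revenue:
        profit = revenue[segment] - cost[segment]
        # Guard against division by zero for segments with no revenue.
        margin = profit / revenue[segment] if revenue[segment] else 0.0
        result[segment] = (revenue[segment], profit, margin)
    return result


summary = profit_by_segment(records)

# Rank segments by margin (profit as a share of revenue), highest first.
ranked = sorted(summary, key=lambda s: summary[s][2], reverse=True)

for segment in ranked:
    rev, profit, margin = summary[segment]
    print(f"{segment}: revenue={rev:.0f} profit={profit:.0f} margin={margin:.1%}")
```

With these made-up numbers, the enterprise segment brings in by far the most revenue, yet the small business segment has the highest margin.  That is exactly the sort of counterintuitive result a few hundred rows of data can reveal.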

To help meet the needs of organizations looking to gain insight from their “normal sized” data, Dell offers a range of infrastructure, services, and solutions.  For example, Dell’s storage solutions, based on the Fluid Data Architecture, enable the right data to be available at the right time for efficient and rapid analysis.  Dell also provides a number of business intelligence consulting services and has partnerships with leading analytics vendors such as Oracle, Microsoft, and SAP.

Dell’s overall goal is to help CIOs realize the full potential of their title: to be the Chief Information Officer rather than just the Chief Infrastructure Officer.

About the Author: Praveen Asthana