It puzzles me that Big Data is the Big Thing for the BI and data warehouse communities but much less so for storage and IT, to the point where “Big Data” seems to be equated with the analytical approaches and models applied to it.
Why is that? Doesn’t the term Big Data mean lots and lots of data, all of which has to be managed and retained efficiently?
Apart from a few programs with massive impact and significant change management demands, Big Data activities usually start not in IT but in the business, often in the same groups that now manage BI and data warehouse activities. They start with a curious analyst taking a fresh look at the data the organization is already acquiring in order to answer a business question. Open source software like Hadoop can be downloaded and run against these data sets on existing infrastructure to gain new insights from that data. These early activities help establish the value of, and support the business case for, incorporating more big data usage into operations.
IT, and storage with it, becomes important once data volumes and compute demands overrun existing capacity. As with other innovations that start ‘on the edge’, bringing big data into the mainstream requires a formal plan and supporting infrastructure, which brings IT in as a partner. This is the point at which data retention and management needs must be addressed, which also means bringing these extreme data sources under policy management for regulatory and organizational compliance.
Even if IT is ‘late to the party’, in many respects the party isn’t really going – the full value of big data isn’t realized – until IT is an active participant in these programs.
Business stakeholders can shorten their time-to-value for big data by engaging IT early in their analytics activities. The initial exploratory work supports expansion, but it doesn’t take an organization directly to running its operations on big data. That requires a more robust architecture and infrastructure that covers and optimizes both the compute and storage needs of big data analytics. The good data management practices that IT brings help make big data a trusted part of the organization’s operations while enabling it to scale more efficiently.
How does IT get involved from the start to establish the right data management and infrastructure? The answer differs from one organization to the next. How might it work in yours?