Analyst firm IDC says that by 2020 there will be 44 trillion gigabytes of data in the digital universe—about 10 times the amount existing today.
This raises the questions: What is this data about? What do I do with it? Where do I keep it all? And what benefits can I extract from it?
Big Ideas In Big Data Analytics
The growth in data (and opportunities based on this data) comes from the billions of connected devices (from mobile phones and smart watches, to connected cars and industrial machine sensors) that constitute the ‘Internet of Things’ (IoT). For example, organizations can now use data to better understand customers, to improve service or increase revenues—or analyze processes and operations to improve efficiency and reduce costs.
IDC says the overall value of the big data and analytics market has already reached $125 billion worldwide. McKinsey Global Institute estimates that effective big data analytics of open data sources from government and industry could generate over $3 trillion in value every year.
How Big Data Analytics Is Being Used Today
Web, mobile and social: customer behavior, patterns and more
Science, health and medicine: analytic methods applied to large volumes of data
Internet of Things: growth in machine-generated data
The Big Data Storage Challenge
If organizations want to derive value from big data, they first need a way to store it, preferably in an environment ready for analytics. This data storage needs to ingest data from many sources—and must also be able to cope with massive growth in a short time.
HDFS Storage for Big Data and Hadoop
Over the past decade, the popular Apache Hadoop framework has enabled big data storage and analytics for enterprise organizations. Its file system, HDFS (the Hadoop Distributed File System), was originally architected a decade ago, a long time in technology terms. It was designed in an era of less reliable storage and slower legacy networking, and is not optimized for today's high-performance infrastructure.
Traditional HDFS big data storage is not enterprise-grade. Its data mirroring system requires you to store three full copies of each item of data, increasing storage overhead and management cost. Standard HDFS also doesn’t support multi-tenancy or geo-distribution, and has limited disaster recovery (DR) features.
The maximum size of any storage system is determined by the scalability of its ‘namespace’. In Hadoop, the standard HDFS file system namespace is managed by a single server and is maintained in memory. The maximum size of namespace that can be managed by the central Hadoop ‘NameNode’ is limited by the amount of memory available on that server. The performance of the whole file system is therefore limited by the performance of a single server. HDFS is inefficient at handling a large volume of small files (as typically used in IoT applications) because metadata for each file needs to be stored in the memory of the NameNode server. A failure of the single NameNode server will halt all processing until it is repaired.
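The two constraints above, triple replication on the DataNode side and an in-memory namespace on the NameNode side, can be made concrete with some back-of-the-envelope arithmetic. The 150-bytes-per-object figure below is a widely cited Hadoop rule of thumb, not an exact constant, and the file counts are purely illustrative:

```python
# Sketch of the HDFS NameNode memory bottleneck. A common rule of
# thumb is roughly 150 bytes of NameNode heap per file-system object
# (file inode, directory, or block) -- an estimate, not an exact figure.
BYTES_PER_OBJECT = 150

def namenode_heap_gb(num_files, blocks_per_file=1):
    """Estimate the NameNode heap needed to hold the namespace in memory."""
    objects = num_files * (1 + blocks_per_file)  # one inode + its blocks
    return objects * BYTES_PER_OBJECT / 1024**3

# A billion small IoT files (one block each) need roughly 280 GB of heap
# on a single server, regardless of how small the files themselves are:
heap_needed = namenode_heap_gb(1_000_000_000)

# Triple replication compounds the cost on the storage side: every
# logical petabyte consumes three petabytes of raw disk.
replication_factor = 3
raw_pb_per_logical_pb = 1 * replication_factor
```

This is why small-file-heavy IoT workloads hit the NameNode ceiling long before they exhaust disk capacity.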
Object Storage—Ideal for Hadoop Analytics and Big Data Growth
Born in the cloud, ‘object storage’ is an emerging data storage model that organizations can use to exploit the next wave of IoT and Hadoop analytic opportunities. In this storage architecture, data items are not stored in blocks, files or folders, but rather in flexibly sized containers called ‘objects’.
Object storage is great for storing large amounts of unstructured big data, as it can scale without practical limits. It overcomes the restrictions of traditional HDFS storage, providing a virtually limitless namespace for complete scalability and simplicity of management.
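The flat-namespace model described above can be sketched in a few lines. This toy in-memory store is illustrative only (real systems such as Amazon S3 or ECS distribute objects and metadata across many nodes); the point is that a key is just an opaque name, so there is no central directory tree to become a bottleneck:

```python
# Minimal sketch of the object-storage model: a flat namespace where
# each object is addressed by a (bucket, key) pair and carries its own
# metadata. Illustrative only -- not a real distributed implementation.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # flat (bucket, key) -> (data, metadata)

    def put(self, bucket, key, data, **metadata):
        # No directory hierarchy to maintain: the key is an opaque name,
        # so the namespace scales without a central namespace server.
        self._objects[(bucket, key)] = (bytes(data), metadata)

    def get(self, bucket, key):
        data, metadata = self._objects[(bucket, key)]
        return data, metadata

store = ObjectStore()
store.put("sensors", "2017/06/device-42.json", b'{"temp": 21.5}',
          content_type="application/json")
data, meta = store.get("sensors", "2017/06/device-42.json")
```

Note that the slashes in the key are just characters in a name, not real folders, which is what lets the namespace grow without a directory server.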
Object storage has become well known as the technology behind cloud storage, thanks to the huge growth of public cloud services like Amazon S3 and Microsoft Azure Storage, as well as several on-premise options. But how do you decide which option is right for your big data strategy—and in which situations?
Big Data In The Public Cloud
Public cloud-based object storage is a popular choice for organizations when they want big data storage in a hurry—it’s easy to manage and there’s no need to buy, install and manage physical infrastructure. However, it is very easy to suddenly find you’re paying more than you planned for, as your data grows and you begin to run analytics at a regular cadence. The public cloud storage pricing models are designed to attract your data into the provider’s cloud—with the real expense being incurred each time your data is accessed or transferred.
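A quick arithmetic sketch shows how access charges can dwarf the headline storage rate once analytics runs regularly. The per-gigabyte rates below are hypothetical placeholders, not actual prices from any provider:

```python
# Illustrative (hypothetical) public-cloud pricing sketch. The rates
# are placeholders chosen for arithmetic, not real provider prices.
STORAGE_PER_GB_MONTH = 0.02   # hypothetical at-rest storage rate
EGRESS_PER_GB = 0.09          # hypothetical data-transfer-out rate

def monthly_cost(stored_gb, gb_read_per_run, runs_per_month):
    """Split the bill into the at-rest charge and the access charge."""
    at_rest = stored_gb * STORAGE_PER_GB_MONTH
    access = gb_read_per_run * runs_per_month * EGRESS_PER_GB
    return at_rest, access

# 100 TB stored, with a weekly analytics job that pulls 20 TB out:
at_rest, access = monthly_cost(100_000, 20_000, 4)
# Here the access charge is several times the at-rest charge.
```

Even with made-up rates, the shape of the result is the point: the cost of regularly reading data back out can exceed the cost of storing it.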
Public cloud object storage performance for your in-house big data apps and users may be lower than it would be with on-premise storage. The public cloud is also not always an option for many organizations that need to adhere to strict compliance mandates, like government, healthcare and financial services—not to mention data residency laws.
Big Data In Your Datacenter—Private Cloud or Hybrid Cloud
As a third-generation enterprise-grade ‘private cloud’ object storage solution, EMC Elastic Cloud Storage (ECS) gives organizations an attractive alternative to complement public cloud for big data analytics, modern apps and IoT.
Object Storage For Big Data With Dell EMC ECS
Object storage capacity is highly scalable
Hadoop and HDFS compatibility for big data
Storage costs up to 65% lower
With on-premise object storage you get the scalability and simple management of public cloud—but you retain full control over the location and protection of your data. While you do need physical infrastructure, this is now low-cost industry-standard commodity hardware—integrated intelligently in a software-defined storage platform. You also get the performance of on-premise storage for your big data apps and users, with total cost of ownership (TCO) advantages over public cloud.
Big Data Analytics with Hadoop and Dell EMC ECS
EMC ECS with integrated HCFS (Hadoop Compatible File System) allows organizations to utilize a software-defined storage infrastructure as an integrated part of their Hadoop architecture, while also providing simplicity, flexibility and rapid deployment. The built-in HDFS-compatible access in ECS makes it simple to bring analytics to all your data ‘in place’, reducing the need for complex, costly and time-consuming data migration.
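Because ECS exposes an HDFS-compatible (HCFS) endpoint, existing Hadoop tools can address ECS buckets directly by URI rather than first copying data into native HDFS. The sketch below is illustrative only: the `viprfs://` scheme and the bucket, namespace, and site names are assumptions that depend on your ECS version and client configuration.

```shell
# Hypothetical sketch: browse and analyze data held in ECS 'in place'.
# Bucket, namespace, site, and jar names are placeholders.
hadoop fs -ls viprfs://mybucket.mynamespace.mysite/

# Run a standard MapReduce job against the same HCFS URIs, with no
# prior migration of the data into a native HDFS cluster:
hadoop jar wordcount.jar WordCount \
  viprfs://mybucket.mynamespace.mysite/input \
  viprfs://mybucket.mynamespace.mysite/output
```

The design point is that the storage URI, not a data copy, is what changes: the job code itself is untouched.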
With ECS, you can expand analytics across all your datacenters, to glean new business insight, identify new business opportunities, and improve time to results. ECS’s geo-distributed architecture and multi-tenancy augment Hadoop capabilities by enabling IT to analyze data anywhere in the world—without the need to migrate it to a Hadoop cluster. ECS also provides a global, centralized ‘data lake’ to access and manage big data through multiple Hadoop distributions.
ECS is an ideal storage platform to protect your Hadoop data against site-wide outages, and for multi-site analytics—supporting more advanced parallel in-place analytics across multiple datacenters.
Three Big Data Analytics Scenarios with EMC ECS Object Storage
Primary Hadoop storage
Protection of Hadoop data
Multi-site Hadoop analytics
Moving Ahead With Your Big Data Plans
Big data analytics is a capability that no enterprise organization can now afford to be without. For organizations approaching big data analytics, the choice between public cloud and on-premise object storage will likely not be ‘one or the other’—both will play a valuable role in different use cases. The important thing is that with EMC ECS you now have more big data storage options.
If your organization is looking to manage and analyze big data, and wants the reassurance and confidence of a turnkey on-premise object storage solution, you can talk to EMC for advice, end-to-end software, pre-configured commodity hardware, and expert support.
Get your organization in position to seize the opportunities enabled by big data and Hadoop analytics, with smart on-premise object storage. It’s easy to start small with a free test environment—and scale up easily as you embrace the big data possibilities.
EMC Elastic Cloud Storage Software is available to download and try free.
Learn more about big data, Hadoop analytics and EMC Elastic Cloud Storage.