A Data Lake should now be a part of every big data workflow in your enterprise organization. By consolidating file storage for multiple workloads onto a single shared platform based on scale-out NAS, you can reduce costs and complexity in your IT environment, and make your big data efficient, agile and scalable.
That’s the expert opinion in analyst firm IDC’s recent Lab Validation Brief: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016. As the lab validation report concludes: “IDC believes that EMC Isilon is indeed an easy-to-operate, highly scalable and efficient Enterprise Data Lake Platform.”
The Data Lake Maximizes Information Value
The Data Lake model of storage represents a paradigm shift from the traditional linear enterprise data flow model. As data and the insights gleaned from it increase in value, enterprise-wide consolidated storage is transformed into a hub around which the ingestion and consumption systems work. This enables enterprises to bring analytics to data in-place – and avoid expensive costs of multiple storage systems, and time for repeated ingestion and analysis.
But pouring all your data into a single shared Data Lake would put serious strain on traditional storage systems – even without the added challenges of data growth. That’s where the virtually limitless scalability of EMC Isilon scale-out NAS file storage makes all the difference…
The EMC Data Lake Difference
The EMC Isilon Scale-out Data Lake is an Enterprise Data Lake Platform (EDLP) based on Isilon scale-out NAS file storage and the OneFS distributed file system.
As well as meeting the growing storage needs of your modern datacenter with massive capacity, it enables big data accessibility using traditional and next-generation access methods – helping you manage data growth and gain business value through analytics. You can also enjoy seamless replication of data from the enterprise edge to your core datacenter, and tier inactive data to a public or private cloud.
We recently reached out to analyst firm IDC to lab-test our Isilon Data Lake solutions – here’s what they found in 4 key areas…
- Multi-Protocol Data Ingest Capabilities and Performance
Isilon is an ideal platform for enterprise-wide data storage, and provides a powerful centralized storage repository for analytics. With the multi-protocol capabilities of OneFS, you can ingest data via NFS, SMB and HDFS. This makes the Isilon Data Lake an ideal and user-friendly platform for big data workflows, where you need to ingest data quickly and reliably via protocols most suited to the workloads generating the information. Using native protocols enables in-place analytics, without the need for data migration, helping your business gain more rapid data insights.
IDC validated that the Isilon Data Lake offers excellent read and write performance for Hadoop clusters accessing HDFS via OneFS, compared against via direct-attached storage (DAS). In the lab tests, Isilon performed:
- nearly 3x faster for data writes
- over 1.5x faster for reads and read/writes.
As IDC says in its validation: “An Enterprise Data Lake platform should provide vastly improved Hadoop workload performance over a standard DAS configuration.”
- High Availability and Resilience
Policy-based high availability capabilities are needed for enterprise adoption of Data Lakes. The Isilon Data Lake is able to cope with multiple simultaneous component failures without interruption of service. If a drive or other component fails, it only has to recover the specific affected data (rather than recovering the entire volume).
IDC validated that a disk failure on a single Isilon node has no noticeable performance impact on the cluster. Replacing a failed drive is a seamless process and requires little administrative effort. (This is in contrast to traditional DAS, where the process of replacing a drive can be rather involved and time consuming.)
Isilon can even cope easily with node-level failures. IDC validated that a single-node failure has no noticeable performance impact on the Isilon cluster. Furthermore, the operation of removing a node from the cluster, or adding a node to the cluster, is a seamless process.
- Multi-tenant Data Security and Compliance
Strong multi-tenant data security and compliance features are essential for an enterprise-grade Data Lake. Access zones are a crucial part of the multi-tenancy capabilities of the Isilon OneFS. In tests, IDC found that Isilon provides no-crossover isolation between Hadoop instances for multi-tenancy.
Another core component of secure multi-tenancy is the ability to provide a secure authentication and authorization mechanism for local and directory-based users and groups. IDC validated that the Isilon Data Lake provides multiple federated authentication and authorization schemes. User-level permissions are preserved across protocols, including NFS, SMB and HDFS.
Federated security is an essential attribute of an Enterprise Data Lake Platform, with the ability to maintain confidentiality and integrity of data irrespective of the protocols used. For this reason, another key security feature of the OneFS platform is SmartLock – specifically designed for deploying secure and compliant (SEC Rule 17a-4) Enterprise Data Lake Platforms.
In tests, IDC found that Isilon enables a federated security fabric for the Data Lake, with enterprise-grade governance, regulatory and compliance (GRC) features.
- Simplified Operations and Automated Storage Tiering
The Storage Pools feature of Isilon OneFS allows administrators to apply common file policies across the cluster locally – and extend them to the cloud.
Storage Pools consists of three components:
- SmartPools: Data tiering within the cluster – essential for moving data between performance-optimized and capacity-optimized cluster nodes.
- CloudPools: Data tiering between the cluster and the cloud – essential for implementing a hybrid cloud, and placing archive data on a low-cost cloud tier.
- File Pool Policies: Policy engine for data management locally and externally – essential for automating data movement within the cluster and the cloud.
As IDC confirmed in testing, Isilon’s federated data tiering enables IT administrators to optimize their infrastructure by automating data placement onto the right storage tiers.
The expert verdict on the Isilon Data Lake
IDC concludes that: “EMC Isilon possesses the necessary attributes such as multi-protocol access, availability and security to provide the foundations to build an enterprise-grade Big Data Lake for most big data Hadoop workloads.”
Read the full IDC Lab Validation Brief for yourself: “EMC Isilon Scale-Out Data Lake Foundation: Essential Capabilities for Building Big Data Infrastructure”, March 2016.