Accelerating the Dell EMC Partnership with the ‘New’ Cloudera

The platform design paradigm from the early days of Hadoop has been to co-locate compute and storage on the same server, which requires expanding both in tandem as your Hadoop cluster grows. This is an acceptable approach for workloads that need to expand compute and storage simultaneously. But, as Hadoop gained mainstream adoption, enterprises started finding workloads where the need for storage outpaced the need for compute. And now, after a decade of big data, enterprises are finding that historical data sets, though accessed less frequently, still need to be easily accessible. This has brought forth new data architecture concepts, as many enterprises look to deploy solutions with independent scaling of compute and storage, plus the option to leverage object storage (in addition to HDFS storage) for Hadoop.
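To make the scaling argument concrete, here is a minimal sizing sketch. Every number and function name in it is hypothetical and chosen only for illustration. It ignores replication, erasure coding, and real hardware configurations, so it shows the general reasoning rather than a sizing method from Dell EMC or Cloudera.

```python
import math

# All figures below are hypothetical, chosen only to illustrate the sizing logic.
STORAGE_NEEDED_TB = 2000   # capacity the workload requires
CORES_NEEDED = 640         # compute the workload requires
TB_PER_NODE = 48           # capacity contributed by one node
CORES_PER_NODE = 32        # cores contributed by one node


def colocated_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Co-located design: every node adds storage and compute together,
    so the node count is set by whichever resource needs more nodes."""
    nodes_for_storage = math.ceil(storage_tb / tb_per_node)
    nodes_for_compute = math.ceil(cores / cores_per_node)
    return max(nodes_for_storage, nodes_for_compute)


def decoupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Decoupled design: storage and compute tiers are sized independently."""
    return (math.ceil(storage_tb / tb_per_node),
            math.ceil(cores / cores_per_node))


nodes = colocated_nodes(STORAGE_NEEDED_TB, CORES_NEEDED, TB_PER_NODE, CORES_PER_NODE)
surplus_cores = nodes * CORES_PER_NODE - CORES_NEEDED
storage_nodes, compute_nodes = decoupled_nodes(
    STORAGE_NEEDED_TB, CORES_NEEDED, TB_PER_NODE, CORES_PER_NODE)

print(f"Co-located: {nodes} nodes, {surplus_cores} cores beyond what is needed")
print(f"Decoupled: {storage_nodes} storage nodes, {compute_nodes} compute nodes")
```

With these assumed figures, the storage requirement alone dictates the size of the co-located cluster, so compute is purchased well beyond what the workload uses. In the decoupled case, each tier grows only as far as its own requirement.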

Dell EMC offers Dell EMC Isilon, a leading-edge file and native HDFS storage product, and Dell EMC ECS, a distributed object storage product. Since our partnership with Hortonworks and Cloudera began in 2015, we have been engaged in joint engineering and validation efforts to bring these enterprise shared storage solutions to both Hortonworks Data Platform (HDP) and Cloudera's Distribution including Apache Hadoop (CDH).

These ongoing efforts have proven critical in delivering differentiated shared storage solutions that embody the concept of a consolidated data lake: one that scales data and compute independently, simplifies data management with non-disruptive growth from tens of terabytes to tens of petabytes in a single namespace, delivers the flexibility to use HDFS, object storage, or both, and makes it economical to store all of your data in a single place.

A Renewed Commitment

With the merger of Hortonworks and Cloudera, as well as Cloudera’s new streamlined Quality Assurance Test Suite (QATS) process for certifying both CDH and HDP with hardware vendors, we are excited to announce an accelerated partnership with Cloudera to validate and certify both CDH and HDP with both Isilon and ECS.

“As the Hadoop landscape and storage requirements evolve, we’re excited to partner with Dell EMC to bring to market solutions backed by its leading edge unstructured data storage offerings like Isilon and ECS,” said Nadeem Asghar, VP of Solutions and Partner Engineering at Cloudera. “Dell EMC shares our commitment to ensuring our customers can always stay ahead of industry and technology trends and we look forward to delivering solutions to our customers for years to come.”

What Does This Mean For You?

This new investment strengthens the Dell EMC and Cloudera partnership, allowing us to:

  1. Continue to support our existing joint customers on existing and future hardware and software releases.
  2. Deliver a shared storage model at scale, with innovative and fully validated end-to-end platforms that support the growing Hadoop ecosystem.

Today, Dell EMC Isilon has been validated with HDP 3.1 and CDH 5.14. These solutions are supported by Dell EMC and Cloudera and will continue to be supported via a joint support process. This process involves triaging any issue with the Hadoop solution, regardless of where it was discovered, and directing it to the appropriate teams.

Over the course of the next few months, we are contracted to work jointly with Cloudera to get Isilon certified through QATS as the primary HDFS store for both CDH (version 6.3.1) and HDP (version 3.1). In the same timeframe, we also plan to get Dell EMC ECS certified through QATS as the S3 object store for both CDH and HDP.

What’s Next?

Beyond this, we plan to launch new joint Hadoop Tiered Storage solutions that enable customers to use Direct Attached Storage (DAS) for hot data and Shared HDFS Storage for warm/cold data within the same logical Hadoop cluster, simultaneously delivering extreme performance and economical scaling. We are also working closely with Cloudera product teams to align the Dell EMC Isilon and ECS product roadmaps with Cloudera’s product strategy for Cloudera Data Platform (CDP), the new Hadoop distribution that combines the best-of-breed components from both CDH and HDP.

Finally, Isilon’s capability as a data lake that can manage data for several Hadoop distributions simultaneously enables us to offer phased migration services from CDH or HDP to CDP. This simplifies the process and significantly reduces the business risk of migrating to the new Hadoop distribution. At Dell EMC, we plan to launch these migration services as CDP becomes available for on-premises deployment.

You can find more information about Data Analytics and Hadoop solutions built using Isilon here and Apache Spark on Isilon here. For the technically inclined, you can find technical details on Hadoop with Isilon here. If you have questions or feedback on Dell EMC’s offerings in Hadoop, please reach out to your local Dell EMC account executive.

About the Author: John Shirley

John Shirley is Vice President of Product Management for Dell Technologies' Unstructured Data Solutions (UDS), covering all file, object, and streaming data across the Enterprise Storage portfolio, including products such as PowerScale, ECS, ObjectScale, DataIQ, and Streaming Data Platform. He has over 15 years of Product Management and Product Strategy experience with data storage companies including Seagate, Symantec (Veritas) and Dell EMC. In previous roles at Dell he led Product Management for Compellent, EqualLogic and hyper-converged infrastructure. John holds a B.S. in Computer Engineering from the University of Minnesota.