Scale-Out Architectures in the Age of Machine Learning

The age of machine learning continues to accelerate in 2021 as organizations rely even more on their data teams to provide insights for all levels of the business. For example, look at the financial services industry, where data analytics provides the ability to fight fraud. In healthcare, one leading innovator is using machine learning to battle drug diversion.

Here at Dell Technologies, we are leveraging analytics for data-driven decision making from customer engagement to supply chain management. At the heart of these machine learning and analytics solutions is the ability to build a modern scale-out data consolidation architecture.

Dell EMC PowerScale has a long history of supporting the next generation of data lakes. Since our partnership with Hortonworks and Cloudera began, Dell Technologies has engaged in joint engineering and validation efforts to bring our award-winning Dell EMC PowerScale to both Hortonworks Data Platform (HDP) and Cloudera Data Hub (CDH). Today, we are announcing the release of Cloudera Data Platform (CDP) Quality Assurance Test Suite (QATS) validation for PowerScale with CDP 7.1.6. With this release, CDP can now be deployed using PowerScale with OneFS 8.2.

Understanding Cloudera’s QATS Validation

The certification process is designed to validate Cloudera products on a variety of cloud, storage and compute platforms. Partner technologies that have been certified via the QATS program are tested and validated to comply with Cloudera’s development guidelines for integration with the Cloudera Data Platform and use the supported APIs. This validation includes:

  • Overall architecture
  • Observance of the CDP interface classification system
  • Complete integration testing
  • Compliance with Cloudera support policies and requirements, and
  • Cluster capability using real-world workloads and micro-benchmarks.

The PowerScale Advantage for Cloudera Architectures

The PowerScale OneFS scale-out network-attached storage (NAS) platform provides Hadoop clients with direct access to big data through a Hadoop Distributed File System (HDFS) protocol interface. A PowerScale cluster, powered by the OneFS operating system, delivers a scalable pool of storage with a global namespace, enabling customers to take advantage of the following:

  • Industry-leading storage efficiency, with no 3X mirroring
  • Independent scaling of compute and storage
  • Lower overall TCO: size matters
  • Time-to-result: this is where businesses increase revenue
  • Multi-protocol support: save with simplicity
  • Enterprise functionality: snap, replicate, backup
  • Support for multiple distributions: everyone can play!
  • Security to make your compliance officer happy
  • Ease of management
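
To make the first point concrete, here is a rough sketch of the arithmetic behind 3X mirroring versus a parity-based protection scheme. The factor of three is the default HDFS replication factor. The 16+2 data-to-parity layout and the 100 TB dataset are illustrative assumptions only, not published PowerScale figures, and the function names are hypothetical.

```python
def raw_capacity_replication(usable_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times."""
    return usable_tb * replicas


def raw_capacity_parity(usable_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity needed when each stripe holds `data_units` of data
    plus `parity_units` of protection (assumed layout, for illustration)."""
    return usable_tb * (data_units + parity_units) / data_units


usable = 100.0  # TB of actual data (assumed figure)

mirrored = raw_capacity_replication(usable)      # 3X mirroring
protected = raw_capacity_parity(usable, 16, 2)   # assumed 16+2 layout

print(f"3X mirroring:    {mirrored:.1f} TB raw")
print(f"16+2 protection: {protected:.1f} TB raw")
print(f"Raw capacity saved: {mirrored - protected:.1f} TB")
```

Under these assumptions, the same 100 TB of data needs 300 TB of raw capacity with 3X mirroring and 112.5 TB with the parity layout. Actual overhead depends on cluster size and the protection level chosen.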

Start Building a Scale-Out CDP Data Lake

PowerScale’s data consolidation capability can now manage data for several Hadoop distributions simultaneously. This enables us to offer phased migration services from CDH or HDP to CDP, which simplifies the process and significantly reduces the business risk of migrating to the new Hadoop distribution. At Dell Technologies, we plan to launch these joint migration services as we continue through the CDP QATS validation for PowerScale with OneFS.

Find more information about deploying CDP 7.1.6 with PowerScale and Apache Spark on PowerScale. If you have questions or feedback on our Hadoop offerings, please reach out to your local Dell Technologies account executive.

 

About the Author: Thomas Henson

Thomas Henson is an Unstructured Data Solutions Systems Engineer with a passion for Streaming Analytics, Internet of Things, and Machine Learning at Dell Technologies. He brings experience in Machine Learning Anomaly Detection, Open Source Data Analytics Frameworks, and Simulation Analysis. Thomas is also heavily involved in the Data Analytics community.