Getting started with machine-generated data

By Brett Roberts with Debra Slapak

We are surrounded by data generated by devices and other machines: the phones in our pockets, vehicle sensors, the ATM at our favorite spot, cameras on the street, even the thermostats and appliances in our homes. As consumers, we benefit from the insights produced when this data is analyzed and put to work for us. Ideally, those insights keep us safer or earn our loyalty to the companies that deliver better experiences or outcomes.

Increasingly, business, government and non-profit organizations alike are generating, capturing and analyzing massive amounts of machine-generated data to improve operational efficiency and customer experience. The process may look simple from the consumer side, but transforming business and operating models with machine-generated data can be challenging. The data itself is typically a mix of structured, unstructured and semi-structured data from a wide variety of sources, and organizations often struggle with how best to collect and analyze it.

To address these problems, Dell EMC and Splunk formed a strategic partnership to architect, test and validate solutions that run Splunk on Dell EMC infrastructure. The work our teams do together simplifies decision-making and deployment of analytics solutions for machine-generated data, so our customers can focus on delivering better experiences and outcomes for their own customers.

Splunk is a platform for real-time operational intelligence using machine-generated data. It enables organizations to search, analyze and visualize the massive streams of machine data generated by IT systems and technology infrastructure, whether physical, virtual or in the cloud. Within just a few years, Splunk has emerged as one of the fastest-growing data-focused platforms, used by more than 75 of the Fortune 100 companies to extract value from machine-generated data. Some have called Splunk the “easy button” for analytics, because it quickly collects and analyzes all varieties of data, whether it’s big data, fast data, Internet of Things (IoT) sensor data, cyber-security streaming data, or social media data used for sentiment analysis. Many attribute Splunk’s rapid success to its simplified end-to-end platform: users can collect data from anywhere with its universal forwarding and indexing technology, then search and analyze that data using “schema on the fly” technology. The result is real-time insight and accelerated time to value.

To support the performance and evolving storage demands associated with creating actionable information using machine-generated data, Splunk requires powerful and flexible infrastructure that:

  • Provides processor and memory configurations based on Splunk recommendations (Splunk publishes detailed reference hardware guidance for different deployment needs.)
  • Enables a flexible, scale-out capacity consumption model
  • Includes data services like data reduction (deduplication or compression) and encryption
  • Delivers cost-effective and optimized tiered storage for hot, warm, and cold data
  • Is optimized and validated by Splunk to meet or exceed pre-determined reference hardware requirements

Let’s look at an example.

One of the world’s largest logistics companies recently embarked on a data journey to take control of the fast, diverse and large volumes of machine-generated data it produces. The company has planes, trucks, scanners and warehouses, all creating enormous amounts of data, as much as several terabytes per day. In this competitive industry, where minutes matter, failing to harness that data for insight can mean the difference between success and failure. With so many machines generating data, capturing and leveraging it can be massively complex. This is where the Splunk platform has delivered the power, flexibility, scalability and speed the company needs to tackle these challenges.

Splunk is an important half of the equation. The other half is ensuring that the infrastructure running Splunk is optimized for Splunk operations in this environment. That means a correctly sized configuration to support multi-TB-per-day ingestion, with a scale-out architecture that grows as Splunk use cases expand and data ingestion increases. By running Splunk on optimized, Splunk-validated infrastructure that provides powerful data services and cost-effective tiering, our customer is now well on the journey to proactive insights that will drive their business farther and faster.
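Sizing for multi-TB-per-day ingestion comes down to simple arithmetic: daily ingest, how much disk the indexed data occupies relative to raw data, how many copies are kept, and how long data stays in each storage tier. The Python sketch below illustrates that calculation under stated assumptions. The 0.5 stored-to-raw ratio is a commonly cited Splunk rule of thumb (compressed raw data plus index files at roughly half the raw size), and the tier names, retention periods and function name are hypothetical; actual ratios vary with data type, so validate against Splunk’s own sizing guidance before buying hardware.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str            # hypothetical tier label, e.g. "hot/warm" or "cold"
    retention_days: int  # assumed number of days data stays in this tier


def tier_capacity_tb(daily_ingest_tb: float, tiers: list[Tier],
                     stored_ratio: float = 0.5, replication: int = 1) -> dict[str, float]:
    """Estimate usable capacity (TB) per storage tier.

    stored_ratio: assumed on-disk size of indexed data relative to raw ingest.
    replication: assumed number of copies kept (e.g. for indexer clustering).
    """
    per_day_tb = daily_ingest_tb * stored_ratio * replication
    return {t.name: per_day_tb * t.retention_days for t in tiers}


# Hypothetical example: 2 TB/day raw ingest, 30 days on fast storage, 335 more on cheaper storage
tiers = [Tier("hot/warm", 30), Tier("cold", 335)]
sizes = tier_capacity_tb(2.0, tiers)
for name, tb in sizes.items():
    print(f"{name}: {tb:.1f} TB")
```

Keeping the ratio, replication and retention as explicit parameters makes it easy to see how each assumption drives capacity, and why moving older data to a cheaper cold tier matters when most of the total lands there.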

The figure below summarizes the key requirements for Splunk-optimized infrastructure.

Deploying Splunk on proven architectures that have the attributes shown in the above figure helps Splunk run efficiently and scale easily as Splunk usage evolves in an organization. This is where Dell EMC comes in. Dell EMC’s portfolio of technologies is a proven landing spot for Splunk workloads. To see many of the documented solutions implemented over the past year, visit the Dell EMC partner page on Splunk.com. The strength of the partnership has led to jointly validated solutions for Splunk that meet or exceed Splunk performance benchmarks based on Splunk’s documented reference hardware. The solutions (linked below) have been configured for all types of deployment needs and use cases. With them, organizations reduce the complexity and risk of do-it-yourself deployments and speed time to value and insight.

Deploying Splunk on Converged Infrastructure with Dell EMC Vblock540

Deploying Splunk on Dell EMC Scale-Out Hyper-converged Infrastructure

If you refer back to the checklist above, you’ll find that the Dell EMC | Splunk solutions cover every requirement listed: proper processor and compute sizing, scale-out architecture, and cost-effective tiering coupled with advanced data services.

Machine-generated data is everywhere and has tremendous potential value. Don’t miss the chance to capitalize on it. Dell EMC solutions for Splunk are ideal for getting started, and as your deployment grows, you can be confident the solutions will scale with you.

About the Author: Brett Roberts

Brett Roberts is a technologist who is passionate about solving business challenges using data analytics. He is currently a solutions specialist with Dell Technologies, focused on helping customers drive business relevance by understanding, deploying and optimizing their Data Analytics and AI solutions. Brett holds a number of certifications and co-hosts a community podcast and blog that explores trends and technologies in the Analytics and AI space. He has a Masters in International Management and an MBA from the University of Maryland. Brett currently resides in Boston, Massachusetts. You can find his blog at www.bigdatabeard.com.