How to Contextualize Your Data

Data intelligence can transform businesses—but only with the right context. Find the correlations by walking your data's links and edges, node by node.

By Michael Shepherd, Distinguished Engineer at Dell Technologies

In Iron Man 2, protagonist Tony Stark manipulates a whole assortment of cool-looking holograms, zooming in and out of interfaces, all of which seem to dance to the tune of his fingers. Iron Man doesn’t just sit at a terminal staring at boring 2D columns of data. He is immersed in data: he consumes it and discovers a new periodic element he calls Vibranium.

Aided by today’s technologies such as augmented reality, this kind of data manipulation won’t be relegated to the realm of science fiction: It will become our everyday reality. Given that data is the currency that powers our economic engine, that day cannot come soon enough.

Context Is Everything

Many firms are already drowning in data. Soon, they’ll feel like they’re drinking water from a fire hose, with 212,765,957 DVDs’ worth of data being generated every single day by 2025.

While most enterprises have a firm grasp on data analytics, many still struggle with departmental silos: Marketing is off doing its own analysis; the supply chain team is handling things on another end; manufacturing is whipping up its own charts. Such data silos are the second-highest barrier to better capturing, analyzing, and acting on data, according to a Forrester Consulting study commissioned by Dell Technologies.

Fifty-seven percent of businesses that are battling data silos say these silos are caused by internal systems that do not communicate. Each group is trying to figure out the elephant just by touching its ears or tail, but nobody can guess the right picture because nobody sees the whole picture. Equally important, nobody understands the relationship between the tail and the ears to even understand that they’re looking at an elephant. In short, they’re missing context.

Drowning in the Data Lake

Not understanding context can be costly. Enterprises waste precious hours and storage recreating the same datasets over and over again with “data cubes” simply because the data isn’t responding swiftly enough to queries and the business doesn’t know if someone else has already created the exact same query with the same joins. They’ll often find it easier just to spin up another dataset.

Even if companies don’t want to create a data model from scratch (which is time-consuming in itself), they’ll still have to spend time tracking the data they need, and that can take months. It is no surprise that data scientists spend less than 10 percent of their time actually mining data for patterns. Sixty percent of their time goes into the drudgery of finding, aggregating, and cleaning data sets.

Potentially the biggest problem related to today’s data volumes is that enterprises lose the opportunity to gain deeper insights. If you knew that driving from Boston to New York would take you twice as long on a Friday evening, you’d choose some other time to make the trip. Making informed decisions is not just about data volume; it’s about the relationships between disparate sets of data. This is why it’s so important to also consider the data between the seams: hidden metadata in the links and connections (also called edges) that answer questions and bind other data sets together.
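To make the idea of metadata living on an edge concrete, here is a minimal Python sketch of the road-trip example. The `Edge` class, the `travel_hours` helper, the context names, and the four-hour baseline are all illustrative assumptions; only the doubling on a Friday evening comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A connection between two nodes that carries its own metadata."""
    source: str
    target: str
    base_hours: float  # hypothetical baseline, not a measured figure
    # Context-dependent multipliers stored on the edge itself.
    multipliers: dict = field(default_factory=dict)

    def travel_hours(self, context: str) -> float:
        """Return the trip duration for a given context."""
        return self.base_hours * self.multipliers.get(context, 1.0)


# Assumed numbers, for illustration only.
boston_to_nyc = Edge(
    source="Boston",
    target="New York",
    base_hours=4.0,
    multipliers={"friday_evening": 2.0},
)

# Pick the departure context with the shortest trip.
best_context = min(
    ["tuesday_morning", "friday_evening"],
    key=boston_to_nyc.travel_hours,
)
print(best_context)
```

The two cities alone say nothing about when to leave. The answer sits in the metadata attached to the connection between them.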

And that’s what a graph provides: a single interfacing layer that can connect hundreds or even thousands of federated data sets across different silos. The need to rethink data is finally taking hold. Within the next one to three years, 61 percent of respondents in the Dell Technologies-commissioned study hope to integrate data across multiple end-points to create a single view of data.

Parsing Data Relationships

GraphQL is a query language for APIs. Its easy-access interface can bind hundreds of different silos together and route them all through a single contract layer that’s intuitive to use and easy to visualize, giving IT the freedom it needs to move data without losing connections to that data.

Whereas a traditional database has an X-Y axis, with rows and columns of numbers, GraphQL utilizes nodes and the edges that connect them to reveal relationships. Beyond visualization, you can query datasets to derive deeper insights into data relationships in the ecosystem and then self-explore definitions of each element without having to find an expert in each domain.
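The shift from rows and columns to nodes and edges can be sketched in a few lines of plain Python. This is not GraphQL itself, only an in-memory adjacency list that illustrates the underlying model. The silo rows, the node names, and the `add_edge` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from three departmental silos (illustrative only).
marketing_rows = [{"campaign": "spring_promo", "product": "laptop_x"}]
supply_chain_rows = [{"product": "laptop_x", "supplier": "acme_parts"}]
manufacturing_rows = [{"product": "laptop_x", "plant": "austin_plant"}]

# Adjacency list: node -> list of (edge label, neighbor node).
graph = defaultdict(list)


def add_edge(source, label, target):
    """Store a labeled edge in both directions so either end can be queried."""
    graph[source].append((label, target))
    graph[target].append((label, source))


# Shared keys in the rows become edges between nodes.
for row in marketing_rows:
    add_edge(row["campaign"], "promotes", row["product"])
for row in supply_chain_rows:
    add_edge(row["supplier"], "supplies", row["product"])
for row in manufacturing_rows:
    add_edge(row["plant"], "builds", row["product"])

# One lookup surfaces every relationship touching the product,
# regardless of which silo the row came from.
for label, neighbor in graph["laptop_x"]:
    print(f"laptop_x --{label}-- {neighbor}")
```

In the tabular form, answering "what touches this product?" means knowing which three tables to join. In the graph form, the relationships are stored directly and a single lookup returns them.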

The power of following the context and correlations (and not just the individual data points) translates across industries. Take the example of Industry 4.0 (smart manufacturing). Industry 4.0 feeds on a lot of technologies working together: cloud computing, mobile devices, wearables, IoT platforms, machine learning algorithms, geolocation technologies, and advanced sensors, among others. They’re all ingredients that get tossed into the data soup pot (or data swamp, as some call it). Understanding how they all interact with each other is crucial in realizing the full benefits of Industry 4.0. In essence, we need a single view of all the moving parts and an understanding of how each of these technologies feeds the other.

Walking from Node to Node

Too often, we don’t know what we want from data. By the time we realize the problem, it’s too late for a redo—so we settle. GraphQL lets us create an environment where we can explore and arrive at useful answers, even if we might not know the right questions upfront.

By leaning on the world’s first navigable data system, we can wave farewell to static data representation. Users can now “walk the edges” to navigate between nodes and zoom in and out across domains. This new intelligent data fabric will enable scientists to truly decode the DNA of digital entities by analyzing their many different components and how they interrelate.
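As a rough illustration of what “walking the edges” might look like in code, the Python sketch below uses a breadth-first search to move from a node in one domain to a node in another. The graph contents and the `walk` function are assumptions made for this example, not part of any particular product.

```python
from collections import deque

# Hypothetical graph spanning several domains; names are illustrative.
edges = {
    "sensor_17": ["machine_a"],
    "machine_a": ["sensor_17", "austin_plant"],
    "austin_plant": ["machine_a", "laptop_x"],
    "laptop_x": ["austin_plant", "spring_promo"],
    "spring_promo": ["laptop_x"],
}


def walk(start, goal):
    """Breadth-first search: return the shortest node-to-node path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Walk from a factory-floor sensor to a marketing campaign.
path = walk("sensor_17", "spring_promo")
print(" -> ".join(path))
```

The caller does not need to know in advance which intermediate domains link a sensor reading to a campaign. The walk discovers that chain by following the edges.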

It’s an important dawning. Misunderstanding is often explained by something being “taken out of context.” Data-savvy businesses are now seeing that context is everything and that game-changing insights are often hidden in the seams.