By Nicole Reineke, Distinguished Engineer, Dell Technologies
Data-driven businesses don’t just spring up. Growing into a data-driven enterprise takes careful planning and sustained care and feeding. This commonly involves businesses taking stock of their data so they can access, transform, and analyze it to make better, faster decisions. In the process, they’ll inevitably identify and resolve the many different data silos within their organization.
And then they’ll realize they’ve only just scratched the surface, as they unearth even more data, hemmed in by single applications or ecosystems, stored in silos, and marooned in outlier locations. As these businesses grow, they may enter mergers and acquisitions, buy data from marketplaces, or build out their internal ecosystems. While this is critical to move the organization forward, it typically comes with new data silos to weed out or work into the fold.
These data silos will upset a business’s best efforts to become a data-ready, data-driven organization. In the Data Paradox study, which is based on research commissioned by Dell Technologies and conducted by Forrester Consulting, 60 percent of business leaders responsible for data in their organization describe data silos as a top barrier to better capturing, analyzing, and acting on data, making it the second highest-rated barrier.
To minimize the damage, businesses need to be prepared for an onslaught of new data silos and ready to unify their data when the need arises.
Mergers and Acquisitions
Big companies, particularly in highly regulated sectors such as finance and insurance, spend time and money creating a central source of truth to comply with regulations. This requires an exceptional level of discipline. Some of that good work can unravel when businesses grow by entering into a merger or acquisition and, in the process, acquire new and overlapping data sets across business entities.
In the Forrester study, 73 percent of senior leaders who name data silos as a barrier to data excellence blame mergers and acquisitions for the proliferation of those silos. In part, this is the result of different data formats and structures. The businesses may operate in the same industry, but each one’s data is distinctly its own and organized accordingly.
This is not an indication that previous efforts at data management failed. It simply highlights that, sometimes, despite a business’s best efforts to unify its data, external circumstances create data silos. Unfortunately, bringing silos together after a merger is not a straightforward task and requires a big investment.
If the acquired organization does not practice data lineage, bringing data silos together may involve challenges beyond format. The combined business runs into an issue of trust. How can teams depend on the information and use the data in new projects if they don’t know its origins, particularly as the data may differ according to where and why it was created? Trust in data relies on identifying and implementing tooling that tracks the lineage of the data, the purpose and context in which it was generated, and the purpose and context in which it will be used.
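As a rough illustration of what tracking lineage, purpose, and context could look like, here is a minimal Python sketch. The class names, fields, and the traceability rule are all hypothetical and chosen only for this example; they do not describe any particular lineage product or standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One step in a data set's history (hypothetical schema)."""
    actor: str       # who touched the data
    action: str      # e.g. "created", "transformed", "merged"
    purpose: str     # why the step was taken
    timestamp: str   # ISO 8601, UTC


@dataclass
class DatasetLineage:
    """Origin plus an append-only list of events for one data set."""
    name: str
    origin: str
    events: list = field(default_factory=list)

    def record(self, actor, action, purpose):
        # Stamp each event with the current UTC time when it is recorded.
        now = datetime.now(timezone.utc).isoformat()
        self.events.append(LineageEvent(actor, action, purpose, now))

    def is_traceable(self):
        # Illustrative rule only: a known origin and a recorded creation event.
        return bool(self.origin) and any(e.action == "created" for e in self.events)
```

Under these assumptions, a data set inherited in an acquisition with no recorded origin would fail `is_traceable()`, which gives teams a simple signal to investigate before building on it.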
Data Marketplaces
Companies use data marketplaces to rent information to test models, build algorithms, and make informed decisions. Think of advertising and marketing companies that purchase information to enrich their understanding of customer behavior.
However, buying data from marketplaces comes with downsides. For a start, these are often discrete assets—data origin is typically implied, but proving lineage can be challenging. Second, businesses that sell information may provide data in different formats. Companies may have to transform and cleanse marketplace data before it’s used to aid decision-making processes.
For business leaders, the key issue is, once again, trust. They must understand where the data came from, how it was created, and why it ended up in their organization. They need to create standards for interaction, lineage, and governance. They should know where information has been and who’s touched it. Without this information, they could be exposing themselves to undue risk. At present, 38 percent of the Forrester study respondents struggling with data silos believe data marketplaces create more silos because data lineage is often unknown.
Progress in this area can be seen in public initiatives such as the Dublin Core, which sets out best-practice standards for tagging data, and in the compliance requirements the E.U. sets out in the General Data Protection Regulation.
Initiatives in the technology industry are also supporting these efforts so that we can create an immutable understanding of data. One example is the Data Confidence Fabric and its standardized way of tracing, quantifying, and measuring data. By establishing the lineage of information, users have a deep understanding of everywhere data has been and whether it’s trustworthy.
Internal Ecosystems
Organization-wide data dictionaries provide descriptions of information and how it’s used and stored. These definitions often create a point-in-time understanding and structure that helps reduce data silos. However, new types of information are always being created. Challenges may arise as organizations adopt new technology or technology updates. Best-in-class tools needed for complex challenges may create unintentional data silos. As the Forrester study shows, this is the difficult reality for businesses struggling with data silos. Of those, 59 percent say internal systems that don’t communicate are often the cause of these silos.
Rather than relying on static definitions, organizations must continually maintain and assess standards across their internal and external ecosystems.
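One way to picture that ongoing assessment is a routine check of incoming records against the data dictionary. The Python sketch below is illustrative only: the dictionary entries, field names, and the `check_record` helper are hypothetical and not taken from any specific tool.

```python
# Hypothetical data dictionary: field name -> expected type and description.
DATA_DICTIONARY = {
    "customer_id": {"type": str, "description": "Unique customer identifier"},
    "signup_date": {"type": str, "description": "ISO 8601 date the account was created"},
    "lifetime_value": {"type": float, "description": "Total revenue in USD"},
}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one record."""
    problems = []
    # Flag fields the dictionary does not describe, or whose type has drifted.
    for name, value in record.items():
        entry = dictionary.get(name)
        if entry is None:
            problems.append(f"undocumented field: {name}")
        elif not isinstance(value, entry["type"]):
            problems.append(f"unexpected type for {name}: {type(value).__name__}")
    # Flag documented fields that the record does not carry.
    for name in dictionary:
        if name not in record:
            problems.append(f"missing field: {name}")
    return problems
```

A field that a new system starts emitting, such as a loyalty tier, would surface as undocumented. That prompts an update to the shared definitions before the field quietly becomes another silo.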
Proactive Data Management
As data grows in volume and velocity, and as companies continue to transact, businesses will realize their data work is never done. However, they can embed best practices upfront so that renting data or merging with another company doesn’t send them back multiple steps. These upfront hygiene measures include creating a formal discipline to bring data together and identifying projects that glean the best value from data.
No one wants to get in the way of the business moving forward, but leaders do need to plant the seeds of data discipline and be mindful of the inherent risks of adding new systems and data. This way, they can ensure a fruitful harvest.