Despite the fact that the Wells Fargo fake account scandal first broke in September, the banking giant still finds itself the topic of national news headlines and facing public scrutiny months later. While it’s easy to assign blame, whether to the now-retired CEO, the company’s unrealistic sales goals and so forth, let’s take a moment to discuss a potential solution for Wells Fargo and its enterprise peers. I’m talking about data security and governance.
There’s no question that the data security and governance space is still evolving and maturing. Currently, the weakest link in the Hadoop ecosystem is masking of data. As it stands at most enterprises using Hadoop, access to the Hadoop space translates to uncensored access to information that can be highly sensitive. Fortunately, there are some initiatives to change that. Hortonworks recently released Ranger 2.5, which starts to add allocated masking. Shockingly enough, I can count on one hand the number of clients that understand they need this feature. In some cases, CIO- and CTO-level executives aren’t even aware of just how critical configurable row and column masking capabilities are to the security of their data.
Another aspect I find to be shocking is the lack of controls around data governance in many enterprises. Without data restrictions, it’s all too easy to envision Wells Fargo’s situation – which resulted in 5,300 employees being fired – repeating itself at other financial institutions. It’s also important to point out entering unmasked sensitive and confidential healthcare and financial data into a Hadoop system is not only an unwise and negligent practice; it’s a direct violation of mandated security and compliance regulations.
Identifying the Problem and Best Practices
From enterprise systems administrators to C-suite executives, both groups are guilty of taking data security for granted, and assuming that masking and encryption capabilities are guaranteed by default of having a database. These executives are failing to do their research, dig into the weeds and ask the more complex questions, often times due to a professional background that focused on analytics or IT rather than governance. Unless an executive’s background includes building data systems or setting up controls and governance around these types of systems, he/she may not know the right questions to ask.
Another common mistake is not strictly controlling access to sensitive data, putting it at risk of theft and loss. Should customer service representatives be able to pull every file in the system? Probably not. Even IT administrators’ access should be restricted to the specific actions and commands required to perform their jobs. Encryption provides some file level protections from unauthorized users. Authorized users who have the permission to unlock an encrypted file can often look at fields that aren’t required for their job.
As more enterprises adopt Hadoop and other similar systems, they should consider the following:
Do your due diligence. When meeting with customers, I can tell they’ve done their homework if they ask questions about more than the “buzz words” around Hadoop. These questions alone indicate they’re not simply regurgitating a sales pitch and have researched how to protect their environment. Be discerning and don’t assume the solution you’re purchasing off the shelf contains everything you need. Accepting what the salesperson has to say at face value, without probing further, is reckless and could lead to an organization earning a very damaging and costly security scandal.
Accept there are gaps. Frequently, we engage with clients who are confident they have the most robust security and data governance available.
However, when we start to poke and prod a bit more to understand what other controls they have in place, the astonishing answer is zero. Lest we forget that “Core” Hadoop only obtained security in 2015 without third-party add-ons, the governance around the software framework is still in its infancy stage in many ways. Without something as inherently rudimentary in traditional IT security as a firewall in place, it’s difficult for enterprises to claim they are secure.
Have an independent plan. Before purchasing Hadoop or a similar platform, map out your exact business requirements, consider what controls your business needs and determine whether or not the product meets each of them. Research regulatory compliance standards to select the most secure configuration of your Hadoop environment and the tools you will need to supplement it.
To conclude, here is a seven-question checklist enterprises should be able to answer about their Hadoop ecosystem:
- Do you know what’s in your Hadoop?
- Is it meeting your business goals?
- Do you really have the controls in place that you need to enable your business?
- Do you have the governance?
- Where are your gaps and how are you protecting them?
- What are your augmented controls and supplemental procedures?
- Have you reviewed the information the salesperson shared and mapped it to your actual business requirements to decide what you need?