Navigating a Data Lake

Here in Seattle, we have a stunning lake on the edge of our downtown called Lake Union. The lake is home to many houseboats, including the one filmed in “Sleepless in Seattle,” and is a haven for sailboats, kayakers and seaplanes – in short, a true beehive of activity!

Even though the lake can be crowded, Seattle does a great job of managing activity on it. Restrictions on the number of houseboats, designated landing areas for seaplanes, and police patrol boats all work together to help ensure that everything moves in an orderly fashion. I can’t help but see parallels between what happens on Lake Union every day and what is happening in the emerging world of the “data lake.”

A data lake is a repository for all kinds of data. Data can be placed in the lake through a variety of means, and that same data can be consumed through different mechanisms without needing to copy or export anything. Ultimately, data lakes are an order of magnitude more scalable than existing approaches for data warehousing and business analytics. To deliver seamless, predictable and efficient capacity given the amazing rate of information growth, a data lake must above all else be built to scale. As businesses learn to harness their information, data lakes and their applications take on strategic importance. The data lake must support existing applications and seamlessly accommodate new ones. It is also increasingly important that the data lake can be protected and backed up efficiently, that it interacts with directory and security services, and that it is easy to manage over time.

Within EMC, we at Isilon have been focusing on developing some of these capabilities. Over the last couple of years, we’ve been enhancing the OneFS operating system and collaborating with key partners to ensure that our customers can effectively manage their data lakes. If any of you grew up around lakes, you probably remember finding a solid foundation to dive from, and then once you were comfortable, climbing to higher ground and really taking a deep plunge! We’re using the same philosophy with our approach to data lakes. We have maintained a strong footing in our traditional offerings around enterprise file applications such as archive, home directories and high-performance computing (HPC), while expanding and building new solutions for mobile, cloud, analytics and software-defined storage.

In addition, because OneFS natively incorporates the Hadoop Distributed File System (HDFS), companies are now able to bring Hadoop to their Big Data rather than moving their Big Data to Hadoop. This native integration allows enterprises to avoid the CapEx costs of purchasing a separate infrastructure and to start getting results faster, because they don’t need to spend time moving petabytes of data. They can also access the home directories and file shares contained in their data lakes from virtually any mobile device using Syncplicity technology.
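
To put the time spent moving data in perspective, here is a back-of-the-envelope sketch in Python. It estimates how long a bulk copy would take at a given network speed. The function name, the 10 Gbit/s link and the `efficiency` factor are illustrative assumptions, not figures from any particular deployment, and the calculation ignores protocol overhead, disk throughput and everything else that slows a real transfer down.

```python
def transfer_days(petabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to copy `petabytes` of data over one network link.

    Assumes decimal units (1 PB = 10**15 bytes, 1 Gbit/s = 10**9 bits/s) and a
    link that sustains `efficiency` (0 < efficiency <= 1) of its rated speed.
    """
    if petabytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("petabytes must be >= 0, link_gbps > 0, 0 < efficiency <= 1")
    bits_to_move = petabytes * 1e15 * 8                   # bytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency        # usable link speed
    seconds = bits_to_move / bits_per_second
    return seconds / 86_400                               # seconds in a day


if __name__ == "__main__":
    # Hypothetical data set sizes, all over an assumed 10 Gbit/s link.
    for size_pb in (1, 5):
        days = transfer_days(size_pb, link_gbps=10)
        print(f"{size_pb} PB over 10 Gbit/s: about {days:.1f} days")
```

Even under these idealized assumptions, a single petabyte takes more than nine days to cross a 10 Gbit/s link, which is why running analytics where the data already lives can shorten the time to first results.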

While other parts of EMC are focusing on complementary capabilities related to data lakes, these are just a few of the areas where we at Isilon are helping folks to successfully realize the possibilities that exist.

About the Author: Bill Richter