Can you have your data in Hadoop, and current too?

Ed. Note: This post was authored by Bill Brunt, Product Manager, Dell Software

Bill has worked with Oracle databases since 1988 and has presented several times at Oracle OpenWorld and its predecessor. He has worked with the SharePlex replication solution both at Dell and as a customer, helping hundreds of customers increase their system availability and deploy scalable infrastructures. His experience with Oracle includes database administration, architecture, data warehousing, Oracle E-Business Suite and application development.

I’ve been involved with SharePlex since 1999, first as a customer and then as part of the team at Quest, which I joined in 2001. In the early days of SharePlex, the most prevalent use was to offload reporting. Data warehousing was just beginning to gain traction from the early work of Ralph Kimball and Bill Inmon. The idea of providing a database with reporting capabilities became the rage. Everyone had a solution. The storage vendors, most notably EMC, were providing business continuance volumes, or BCVs, with the TimeFinder product. This was a very popular approach because it came from a respected solution provider, EMC, and ran on some very popular infrastructure. What quickly emerged was that the data was only current as of the moment the copy was created. The volume then had to be re-synced at some regular interval to be refreshed and stay somewhat current.

It seems that history often repeats itself. Administrators of Hadoop clusters report that users like the analytics capabilities of Hadoop but complain that they always have to wait for the data to show up there. Sqoop is the utility used to move data from SQL databases to Hadoop. Oraoop is a plug-in to Sqoop that takes over the jobs it can perform better than the Oracle manager built into Sqoop. This approach still moves data in batches. Enter SharePlex Connector for Hadoop. It provides near real-time replication from Oracle to Hadoop. Like its counterpart providing Oracle-to-Oracle replication, it does so with very low overhead on the source database. Let’s take a closer look.

SharePlex starts by reading the online redo logs of Oracle and identifying inserts, updates and deletes. These messages are transported as before, but instead of being posted to an Oracle database, they are posted to JMS queues. These can be either ActiveMQ or OpenMQ. The SharePlex Connector for Hadoop picks the messages up and applies them to Hadoop. In near real time, data is available in the Hadoop cluster. With SharePlex, administrators can offload data in near real time from Oracle, so that data lands in Hadoop more efficiently, ready to be analyzed or transformed. Customers and users no longer have to wait for data to arrive to begin their analysis. This speeds up the process significantly. Can you have your data in Hadoop and current too? With the SharePlex Connector for Hadoop you can.
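
For readers who want to picture the flow of inserts, updates and deletes from a queue into a target, here is a minimal sketch in Python. It is not SharePlex code and does not use JMS or Hadoop: the message layout (`op`, `key`, `row`), the function names `apply_change` and `drain`, and the in-memory dictionary standing in for the Hadoop target are all assumptions made purely for illustration.

```python
from collections import deque


def apply_change(table, change):
    """Apply one change message to an in-memory table keyed by primary key.

    The message shape {"op", "key", "row"} is hypothetical, not the
    SharePlex wire format.
    """
    op, key = change["op"], change["key"]
    if op == "insert":
        table[key] = dict(change["row"])
    elif op == "update":
        # Merge only the changed columns into the existing row.
        table.setdefault(key, {}).update(change["row"])
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def drain(queue, table):
    """Apply queued changes in arrival order; return how many were applied."""
    applied = 0
    while queue:
        apply_change(table, queue.popleft())
        applied += 1
    return applied


# A deque stands in for the message queue; a dict stands in for the target.
queue = deque([
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "Austin"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob", "city": "Boston"}},
    {"op": "update", "key": 1, "row": {"city": "Dallas"}},
    {"op": "delete", "key": 2},
])
target = {}
applied = drain(queue, target)
print(applied, target)
```

The point of the sketch is the ordering: because changes are applied one by one in the order they were captured, the target tracks the source continuously instead of waiting for the next batch load.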

About the Author: Sarah Vela

Sarah is the Chief Blog Strategist for Dell Technologies. Born in New York and raised in New England, she has been living and working in the Austin area for over 20 years, but she knows that doesn't make her a true Texan. She joined Dell in the spring of 2011, left briefly for another company, but realized her mistake and returned in November of 2019. Sarah has five kids, two dogs, two cats, and no free time.