The data warehousing and business intelligence space has undergone a huge transformation in the past several years, with business users moving away from traditional, 'IT bottleneck' environments to more agile ones driven by Big Data. For example, when business users lobbied for self-service access, they got Tableau. When they pressed for data discovery, they got Endeca. What's next? An agile yet controlled environment that satisfies both the business and IT communities. Pivotal Data Dispatch (Pivotal DD) fulfills the needs of all enterprise data stakeholders by empowering business users with on-demand access to and analysis of Big Data – all under an established system of metadata and security defined by IT.
I spoke with Todd Paoletti, Vice President of Product Marketing at Pivotal, about why Pivotal DD is the next Big Thing to hit the Big Data market.
1. Walk me through how Pivotal DD is used from the inception of a Big Data project. What issues does it overcome during the project lifecycle?
Pivotal DD is an environment that IT can establish for end users to gain on-demand access to Big Data for rapid analysis. Through Pivotal DD's metadata and connectors, IT simply defines any data source to be made available for discovery and selection by end users. Data sources from Hadoop to MPP databases to flat files are indexed in the system metadata catalog, and role-based access and compliance policies are defined. Once this environment is established, analytic teams can log into a simple web portal to discover the data needed for analysis, provision their own sandboxes, and set up complex analytical workflows that include data movement from heterogeneous sources. With this controlled access and preparation capability, Big Data analysis can happen rapidly, increasing both user productivity and the usefulness of Big Data.
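The flow described above – IT cataloging sources and attaching role-based access policies, then analysts discovering only the data their role permits and provisioning it into a sandbox – can be sketched conceptually. All class and method names below are hypothetical illustrations of the idea, not Pivotal DD's actual API:

```python
# Conceptual sketch of a metadata catalog with role-based access.
# All names here are hypothetical, not Pivotal DD's real interfaces.

class Catalog:
    def __init__(self):
        # source name -> {"type": platform, "roles": roles allowed to see it}
        self.sources = {}

    def register(self, name, source_type, allowed_roles):
        """IT indexes a data source and attaches an access policy."""
        self.sources[name] = {"type": source_type, "roles": set(allowed_roles)}

    def discover(self, role):
        """End users see only the sources their role permits."""
        return [n for n, s in self.sources.items() if role in s["roles"]]

    def provision_sandbox(self, role, names):
        """Move the selected, permitted sources into a user sandbox."""
        allowed = set(self.discover(role))
        denied = [n for n in names if n not in allowed]
        if denied:
            raise PermissionError(f"role {role!r} may not access: {denied}")
        return {"sandbox": sorted(names)}

catalog = Catalog()
catalog.register("trades_hdfs", "Hadoop", {"analyst", "data_scientist"})
catalog.register("hr_payroll", "Oracle", {"hr"})

print(catalog.discover("analyst"))                        # ['trades_hdfs']
print(catalog.provision_sandbox("analyst", ["trades_hdfs"]))
```

The key design point the sketch mirrors is that policy enforcement lives in the catalog itself, so self-service discovery and provisioning never bypass the rules IT defined.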
Pivotal DD solves many types of problems. First, it addresses time to insight. Enterprise data lives in heterogeneous silos and most of it is unstructured, creating huge challenges for IT to deliver a complete, consolidated data set in a timely manner. Second, it addresses the ability to iterate during analysis. In many cases, data scientists or analysts do not know which data sources to tap into and need access to all the data living inside and outside the organization to discover hidden insight. Pivotal DD provides visibility into a myriad of data sets for mix-and-match selection. Third, it addresses the escalating IT resources needed to provision data and meet SLAs. With Pivotal DD, analysts and data scientists have self-service access to discover data from any data source, provision terabytes and petabytes of data from multiple sources onto sandboxes in Hadoop or MPP databases, and continuously iterate on demand – all without IT assistance.
2. Pivotal DD provides a consolidated view into heterogeneous data sources inside and outside of an organization. What data sources are supported?
An unlimited number of structured and unstructured data sources. Pivotal DD includes native, high-speed adapters and resource monitoring for multiple platforms, including Pivotal HD and HAWQ, Pivotal Greenplum Database (GPDB), Apache Hadoop, IBM Netezza, Oracle, and SQL Server. Pivotal Data Dispatch can also connect with any database through JDBC, and with most distributed file systems such as NFS.
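Because JDBC connectivity needs only a driver and a connection URL, sources like these can be addressed uniformly. A minimal sketch of the standard URL formats involved (the helper function itself is hypothetical; the URL shapes are the documented ones for each vendor's JDBC driver, and Greenplum uses the PostgreSQL driver's format since it speaks the PostgreSQL wire protocol):

```python
# Hypothetical helper that builds standard JDBC connection URLs for the
# kinds of sources named above. The URL formats follow each vendor's
# published JDBC driver conventions.

def jdbc_url(source_type, host, port, database):
    formats = {
        "greenplum": f"jdbc:postgresql://{host}:{port}/{database}",
        "netezza":   f"jdbc:netezza://{host}:{port}/{database}",
        "oracle":    f"jdbc:oracle:thin:@//{host}:{port}/{database}",
        "sqlserver": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
    }
    return formats[source_type]

print(jdbc_url("greenplum", "gp-master", 5432, "analytics"))
# jdbc:postgresql://gp-master:5432/analytics
```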
3. One of the key issues with traditional BI and data warehousing was the lengthy 'time-to-analysis' process caused by ETL processing and setting up data marts. Can you explain how Pivotal DD shortens the 'time-to-analysis' timeframe?
There is no ETL or data mart setup required. Pivotal DD provides a logical view of data across the multiple sources that IT defines. From this logical view, users can easily browse and search for relevant data, create their own sandboxes, and decide what data to move into a sandbox through multistep workflows that include transformation and analytics. Users can then apply the existing visualization and analytical tools they know and love for subsequent analysis, further shortening the 'time-to-analysis' window.
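The multistep workflow idea – filter and transform data on its way into a sandbox, rather than running a separate ETL project – can be sketched as a simple chain of steps. This is a conceptual illustration only, not Pivotal DD's workflow engine:

```python
# Conceptual multistep workflow: each step is a function applied in
# order, and the final result lands in the user's sandbox.
# All names and data are hypothetical.

def run_workflow(records, steps):
    for step in steps:
        records = step(records)
    return records

raw = [{"sym": "ibm", "px": "101.5"}, {"sym": "aapl", "px": "bad"}]

steps = [
    # Step 1: drop rows whose price field is not numeric.
    lambda rs: [r for r in rs if r["px"].replace(".", "", 1).isdigit()],
    # Step 2: normalize symbols and convert prices to floats.
    lambda rs: [{"sym": r["sym"].upper(), "px": float(r["px"])} for r in rs],
]

sandbox = run_workflow(raw, steps)
print(sandbox)  # [{'sym': 'IBM', 'px': 101.5}]
```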
4. Pivotal DD was productized from a Pivotal solution that the NYSE implemented for fast data provisioning under strict SLAs and compliance requirements. Can you describe what Pivotal DD is in terms of its architecture and technology components?
Pivotal DD is installed as a middleware service on a commodity Linux cluster with the following component services:
- Metadata and Security
- Data Discovery and Search
- Workflow Design
- Resource Management
Pivotal DD has been in production at the NYSE since 2007, provisioning millions of files and terabytes of data per day.
5. What makes Pivotal DD unique in the market over other tools?
There is no other product that provides data staging and provisioning with self-service and access control wrapped around it. Pivotal DD is the only on-demand Big Data platform focused on enabling end users to easily discover and analyze data under access and compliance policies set by IT.