As Big Data continues to demonstrate real business value, organizations are looking to leverage this high-value data across different applications and use cases. This uptake is also driving organizations to move from siloed Big Data sandboxes to enterprise architectures that must address mission-critical availability and performance, security and privacy, provisioning of new services, and interoperability with the rest of the enterprise infrastructure.
Sandbox or experimental Hadoop deployments on commodity hardware with direct-attached storage (DAS) make these challenges difficult to address for several reasons: data is hard to replicate across applications and data centers, IT lacks oversight of and visibility into the data, multi-tenancy and virtualization are missing, and upgrades and migrations of technology components are hard to streamline, among others. As a result, VCE, a leader in converged or integrated infrastructure, is receiving a growing number of requests on how to move Hadoop implementations that rely on DAS onto VCE Vblock Systems, an enterprise-class infrastructure that combines servers, shared storage, network devices, virtualization, and management in a pre-integrated stack.
Formed by Cisco and EMC, with investments from VMware and Intel, VCE enables organizations to rapidly deploy business services on demand and at scale, all without triggering an explosion in capital and operating expenses. According to a recent IDC report, organizations around the world spent over $3.3 billion on converged systems in 2012, and IDC forecast this spending to increase by 20% in 2013 and again in 2014. IDC also calculated that Vblock Systems infrastructure delivered a return on investment of 294% over a three-year period and 435% over a five-year period compared with traditional infrastructure, owing to faster deployments, simplified operations, improved business-support agility, cost savings, and staff freed to launch new applications, extend services, and improve user and customer satisfaction.
I spoke with Julianna DeLua from VCE Product Management to discuss how VCE’s Big Data solution enables organizations to extract more value from Big Data investments.
1. Why are organizations interested in deploying Hadoop and Big Data applications on converged or integrated infrastructures such as Vblock?
Big Data and Hadoop practitioners have been using commodity hardware with DAS to manage and analyze data; they like being able to easily add server nodes as a serviceable unit when the business demands more resources. While “commodity” sounds inexpensive, customers may be buying more servers than they need just to gain storage capacity, because they cannot scale out storage separately: by default, Hadoop keeps three copies of each data block (the HDFS default replication factor) on the internal drives of different server nodes. The DAS approach raises even greater concerns around security, data protection, portability, and more. For these and many other reasons, VCE added Isilon scale-out NAS with native HDFS integration to its Big Data solution.
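The storage arithmetic behind buying servers for capacity can be sketched in a few lines of Python. This is a minimal illustration, assuming the HDFS default replication factor of 3 and hypothetical values for the usable capacity target and the disk per server; it ignores compute needs, scratch space, and filesystem overhead, and it is not VCE or EMC sizing guidance.

```python
import math


def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk needed to hold `usable_tb` of data when every block is
    stored `replication_factor` times (the HDFS default is 3)."""
    return usable_tb * replication_factor


def das_nodes_needed(usable_tb: float, disk_tb_per_node: float,
                     replication_factor: int = 3) -> int:
    """Number of DAS servers required for storage capacity alone.

    Ignores CPU requirements, temporary/scratch space, and filesystem
    overhead, so real clusters would need more than this.
    """
    raw = raw_capacity_tb(usable_tb, replication_factor)
    return math.ceil(raw / disk_tb_per_node)


if __name__ == "__main__":
    # Hypothetical sizing inputs, for illustration only.
    usable = 500.0    # TB of data the business wants to keep
    per_node = 24.0   # TB of internal disk per commodity server
    print(raw_capacity_tb(usable))             # 1500.0
    print(das_nodes_needed(usable, per_node))  # 63
```

With these hypothetical inputs, 500 TB of data needs 1,500 TB of raw disk and 63 servers for capacity alone. If the analytics workload needs only a fraction of that CPU, the remaining servers are bought for their drives, which is the imbalance that separately scalable shared storage is meant to avoid.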
Vblock serves as a foundation for organizations to drive greater value from Big Data by providing a pre-integrated, pre-validated platform across the enterprise. In our 2013 customer survey, 76% of VCE customers said they were either already using or considering Vblock Systems for Big Data analytics and applications. VCE’s goal is to enable organizations to get more from their data, application, and analytic investments, whether the data is unstructured or structured data from existing databases and applications. Recently, VCE has been receiving customer requests to address “enterprise readiness” so that Hadoop and new types of advanced analytics can coexist with, and take advantage of, enterprise applications, enterprise data warehouses, and the rest of the information value chain.
2. According to Gartner, VCE leads the integrated infrastructure market. What benefits does Vblock provide to organizations deploying Big Data and Hadoop?
The recurring theme is getting better results by converging data investments, whether in Big Data or other data sets. Instead of creating shadow IT and data silos, the Vblock-based model makes it easier for enterprise IT to augment the existing environment, run advanced analytics, and develop real-time responses in a footprint that can be standardized. Many customers already run Tier 1 applications, along with relational data such as customer, financial, and inventory data, on Vblock shared infrastructure.
On Vblock Systems, which are proven to run virtualized, mixed-application workloads, customers can choose EMC storage (Symmetrix VMAX, VNX, or Isilon) based on data formats, performance, availability, security, and other requirements. EMC Isilon’s native HDFS integration is popular with organizations that want greater data protection, support for multiple application workflows, and an end to resource-intensive import and export of data into and out of Hadoop. EMC Isilon also brings analytics, or Hadoop, to your data rather than requiring you to move or migrate the data into Hadoop; this shortens time to insight and eliminates the need to manage multiple copies of data. EMC Isilon’s integration with VMware vSphere Big Data Extensions lets organizations virtualize Hadoop for fast deployment, operational simplicity, better resource utilization through multi-tenancy, and enterprise-class scaling and availability. Vblock customers also gain visibility and simplicity through VCE Vision Intelligent Operations software, which dynamically provides a high level of intelligence to their existing management toolset.
Given all of these benefits, adding Big Data workloads to Vblock is a logical next step in consolidating applications and processes to drive efficiency at lower cost and risk.
3. How does Vblock’s integrated, shared-infrastructure approach perform compared with DAS in commodity servers?
Commodity servers with DAS have been the best practice for Hadoop deployments, based on the notion that storage should sit close to the CPU and that nodes can easily be added for maximum performance. However, this approach does not take into account the data management processes around DAS: data is moved excessively from source systems, then replicated, then processed downstream, and each of these steps adds to the time needed to manage and deploy.
Decoupling compute and storage through shared storage such as EMC Isilon can, in many cases, provide equivalent or even better performance with lower CAPEX and OPEX. With advances in interconnect networks, in particular Fibre Channel SAN connectivity speeds, the traditional gap in processing speed between server and storage is shrinking rapidly. In addition, for ultra-low-latency processing, organizations can add server-side flash such as EMC XtremSF to create a hybrid architecture: they keep the protection, availability, and scale of the shared storage model while selectively performing low-latency data processing on enterprise servers equipped with server-side flash.
4. Can you give an example of a use case that is well suited to Big Data and analytics on Vblock?
Take the example of a financial services company doing risk and customer analytics. The company’s goals are to provide a better customer experience, make risk-intelligent decisions, and reduce fraudulent transactions. To keep pace with all customer activity and continuously improve its insights, the company needs access to a larger set of data sources, including Web data, trade execution data, point-of-sale transaction data, mobile data (including location data), social media, and news feeds, along with existing customer, support, financial, and risk data.
Traditionally, the infrastructure to run these disparate systems was spread across and beyond the enterprise, presenting potential risks and challenges: added latency, data exposure, maintenance overhead, and a lack of flexible scaling options. This disparate infrastructure made it harder for IT to run Big Data and Hadoop as a service with easy provisioning, flexible deployment options, and migration paths from development through QA to production, not to mention maintaining SEC compliance.
By moving to a shared infrastructure model enabled by Vblock with EMC Isilon, the company can easily develop and maintain specific services to run analytics, Hadoop processing, or other applications that use the data in place, without moving and replicating it across multiple servers and storage systems, while maintaining SEC Rule 17a-4 compliance through EMC Isilon’s robust enterprise security options. For real-time fraud analytics, the business can access customer transaction history, credit, and location data and evaluate it against algorithms refined offline in Hadoop, in a more secure and reliable environment, reducing the risk of downtime and timeouts that can be costly or incur regulatory penalties.
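The pattern described here, algorithms refined offline in Hadoop and then applied to live transactions, can be illustrated with a toy Python scorer. Everything in it is hypothetical: the feature names, the logistic-regression form, and the weights stand in for whatever model an offline Hadoop job might actually produce. It shows only the split between heavy offline training and lightweight online scoring, not a real fraud model.

```python
import math
from dataclasses import dataclass

# Hypothetical coefficients standing in for parameters an offline Hadoop
# job might produce. Feature names and values are illustrative assumptions.
WEIGHTS = {
    "amount_vs_avg": 1.2,   # transaction amount / customer's historical average
    "km_from_home": 0.01,   # distance from the customer's usual location
    "new_merchant": 0.8,    # 1.0 if the customer has never used this merchant
}
BIAS = -4.0


@dataclass
class Transaction:
    amount: float
    customer_avg_amount: float
    km_from_home: float
    new_merchant: bool


def features(tx: Transaction) -> dict:
    """Turn a raw transaction into the features the offline model expects."""
    return {
        "amount_vs_avg": tx.amount / max(tx.customer_avg_amount, 1.0),
        "km_from_home": tx.km_from_home,
        "new_merchant": 1.0 if tx.new_merchant else 0.0,
    }


def fraud_score(tx: Transaction) -> float:
    """Logistic score in [0, 1] computed from the precomputed weights."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features(tx).items())
    # Numerically stable sigmoid: avoid overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def flag(tx: Transaction, threshold: float = 0.5) -> bool:
    """Online decision: flag the transaction if its score meets the threshold."""
    return fraud_score(tx) >= threshold


if __name__ == "__main__":
    normal = Transaction(50.0, 50.0, 5.0, False)
    unusual = Transaction(2000.0, 50.0, 800.0, True)
    print(flag(normal))   # False
    print(flag(unusual))  # True
```

Only the small weight table needs to move from the offline environment to the scoring service; retraining in Hadoop changes the weights without changing the online code.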
The same infrastructure can serve other Big Data use cases, such as delivering targeted offers and personalized online experiences to support customer acquisition and retention as well as upsell and cross-sell initiatives. IT can also run multiple Hadoop distributions in bare-metal or virtualized implementations while taking advantage of the secure systems of record residing on EMC Symmetrix VMAX, VNX, or Isilon. The firm can compare and improve experimental results using actual data without sacrificing data privacy and security, because the data stays exactly where it already resides. More information is available on how VCE and EMC empower organizations to transform their business.
5. What should customers keep in mind when migrating their current Big Data or Hadoop environment to Vblock Systems?
We are extending the distinctive Vblock experience to customers who want to run Big Data and analytic workloads. Vblock Systems typically arrive in 40 days or less, and within a few days of arriving on site they can be up and running in the IT environment. Factory integration of industry-leading products from EMC, Cisco, and VMware speeds not only initial deployment but also subsequent development and testing on a standard platform. Organizations can migrate their target Big Data analytic applications and data onto Vblock and allocate unused resources to other applications or to research and development efforts.
Because Vblock Systems are standardized, companies can reduce deployment risk and eliminate the time and cost of testing across different environments, one of the major issues in traditional Big Data infrastructures. Furthermore, application performance stays consistent throughout the lifecycle, from development to testing to staging and on to production, making Vblock Systems ideal for organizations that want to converge their Big Data investments to transform their business.