Scientific Big Data

We are at a very exciting moment in history and in science. On one hand, advances in technology have enabled scalable and affordable Cloud Computing, as well as the analysis of truly massive Big Data. These advances, in turn, have made it viable to apply technology to the advancement of science in significant ways. On the other hand, social, political, economic and technological changes have enabled the collection and sharing of scientific data.

Scientific Big Data refers to Big Data that is collected, organized, stored, managed, and, most importantly, analyzed to enable and accelerate discoveries in science. Examples of Scientific Big Data include nuclear research data, to which CERN, the European Organization for Nuclear Research, is a major contributor; all of the data reported on the generation and consumption of every form of energy on a global scale, for which Smart Grids are a tremendous source; and the aeronautics and space data collected over the last few decades, including almost 100 years of data made public by NASA as part of the US government's Open Government initiative.

While the collection of data is a necessary first step, the actual value of Scientific Big Data can only be harvested through analytics. Analytics gives us an understanding of the information captured by the data, transforming it into knowledge from which informed actions can be derived. Consider, for example, the 1000 Genomes Project, which aims to create a large bank of genomes that provides a comprehensive resource on human genetic variation.

Even though collecting and providing access to hundreds of thousands of genomes represents a historic scientific achievement, scientific advancement will only be attained through the analysis of this data, guided by molecular biology researchers. Molecular biology is the branch of biology that studies biological activity at the molecular level. It aims to understand the interaction patterns between the components and systems of a cell, including DNA, RNA and proteins. Through data analytics, researchers seek to understand how proteins are synthesized and how that process can be tuned or regulated to prevent and cure diseases.

The ultimate purpose of Scientific Big Data analytics is to lead to actions that are predictive and preventive in nature, releasing us from operating in a reactive mode in which we are always attempting to remediate the effects of past events. Through Scientific Big Data analytics, we want to change the course of history in a positive way. By harvesting energy from new sources, we can build a more sustainable planet. By designing more intelligent drugs, we can help people live healthier, more productive and happier lives. By foreseeing natural disasters, we can save lives.

We are essentially at the intersection of two developments: technological empowerment in the collection and analysis of data, and the actual availability of the data itself. Consider, for example, the field of Bioinformatics, which applies computer science to biology and medicine. Bioinformatics can use technology to study biology because biological data can now be collected and stored. By the end of 2012, Next Generation Sequencing machines will be capable of generating a digital image of a genome in a matter of days for less than two thousand dollars. This so-called commoditization of genome sequencing is leading to the creation of massive genome banks that can now be analyzed at scale. Without the data itself, empirical studies could not be conducted, and without technology, this massive amount of data could not be processed in our lifetime.

Bioinformatics is paving the way to true Personalized Medicine (PM), an approach to medicine focused on understanding every human being as an individual and treating their diseases in a manner tailored to their needs and way of life. PM will emerge as a revolution on a worldwide scale, with social, political and economic implications.

Big Data and Big Data analytics are indeed transforming not only the business landscape but also the way in which we do business, leaving a lasting impact. Similar to the transformations that occurred during and after the agricultural and industrial ages, many industries will cease to exist, new industries will be created, and some will learn how to reinvent themselves. We want to be more than passive observers in the transformation enabled by Scientific Big Data. We want to be an active catalyst. We want to be thought leaders and show the way towards innovation.

Technological advancement and the availability of scientific data form a truly remarkable confluence of forces that is creating a pivotal moment for an era of scientific enlightenment. As an EMCer, I'm very excited not only to be part of it, but to help lead this transformation.

To learn more about Big Data and Big Data technologies, check out my video series, the “EMC Big Ideas playlist,” on YouTube.

About the Author: Patricia Florissi

Patricia Florissi, Ph.D., is vice president and global CTO for sales and a distinguished engineer for Dell EMC.