“That’s one small step for man, one giant leap for mankind.”
Many of us are familiar with Neil Armstrong’s famous statement, marking one of mankind’s greatest scientific achievements of the 20th century.
Fast-forward to the 21st century and that statement still holds true, this time for a scientific accomplishment that we believe eclipses the moon landing: the completion of the Human Genome Project (HGP). Here’s why.
To give you an idea of the project’s magnitude: according to Explorable, it took 13 years and some 18 countries to identify between 20,000 and 25,000 genes and determine the sequences of the 3 billion chemical base pairs that make up human DNA. While recent studies dispute this figure and peg the count of human genes at under 20,000, the point here is this: the large-scale collaborative effort to complete this project ultimately had one aim, and that is to rid the world of the tyranny of disease.
Even Hollywood’s in on It
No, this isn’t a zombie apocalypse waiting to happen, or a biological experiment gone wrong like you’ve seen in The Walking Dead or World War Z. On the contrary, it is a pivotal breakthrough in mankind’s existence that has led to the discovery of disease genes, paving the way for genetic tests and biotechnology-based products.
We’re sure some of you have heard, as CNN reported, of Angelina Jolie’s double mastectomy in 2013 and how she had her ovaries removed in March 2015. But why? Genetic testing revealed that she carried a mutation in BRCA1, a gene linked to breast and ovarian cancer. Her decision, though hard, would reduce her cancer risk by a great deal.
Revving DNA Sequencing
The rise of DNA sequencing can be partially credited to a stark drop in the cost of whole genome sequencing, from US$100 million per human genome to between US$1,000 and US$3,000 today. Of course, we do need to consider the cost of analysis after genetic testing is completed, and that number can stretch to US$20,000. But that brings us to our point: affordability for all, not just Hollywood celebrities. Affordability is a dream for scientists in this field. For one, they can now stretch funding budgets to take on more experimental risks and beef up their sequencing activities, pushing boundaries and gathering more research data that could lead to new discoveries.
That being said, we should all be aware that DNA sequencing runs on two engines: storage and speed. It’s simple. Without enough space to store research data and adequate speed to process it, scientists have little means to glean insights.
Take, for example, SciGenom Labs (SciGenom), a company based in Cochin, India. SciGenom focuses on molecular diagnostics, cancer treatment, and metagenomics. Prior to adopting an EMC Isilon X200 scale-out storage platform, its performance dropped as its storage expanded, which slowed the analysis of large-scale biological data sets.
Since adopting EMC Isilon, project tasks are now completed 40 percent faster. The lab expects to reduce the workflow times associated with analyzing, annotating, and understanding the terabytes of data generated every day by the sequencing machines.
Says Saneesh Chembakasseri, IT Manager at SciGenom Labs, “The key reason for moving to Isilon scale-out storage was to increase the performance and speed of analyzing raw data generated by DNA sequencing machines. There is no better choice in the market than EMC Isilon in providing both the needed scalability and performance for meeting the demands of DNA sequencing.”
Read the SciGenom Case Study to learn more.
Being Nimble Now a Reality
18 countries. Can you imagine the kind of coordination that went into HGP? To minimize miscommunication and mistakes, sequencing workflows not only had to be established well in advance; they also had to be nimble enough to adapt to changes. The only way to do so was to store and share findings seamlessly, even with massive quantities of data being exchanged. The same applies today to follow-on DNA sequencing initiatives.
Malaysia Genome Institute (MGI) is another establishment that has embraced the strengths of EMC Isilon. Engaged in comparative genomics and genetics, structure and synthetic biology, computational and systems biology, and metabolic engineering, MGI has sequencing machines delivering 1 gigabyte per second of throughput. Putting it in perspective, that is an astounding 1 terabyte in under 17 minutes. MGI uses the Illumina HiSeq 2000 and Illumina MiSeq sequencing platforms for DNA sequencing, whole genome sequencing, whole transcriptome sequencing, and targeted resequencing.
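For readers who want to check the "1 terabyte in under 17 minutes" figure, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a perfectly sustained rate; the function name transfer_minutes is ours, purely for illustration, and is not part of any EMC or Illumina tooling.

```python
# Decimal (SI) units: an assumption on our part, since the article does not
# say whether it means decimal or binary gigabytes and terabytes.
GB = 10**9
TB = 10**12


def transfer_minutes(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Minutes needed to move data_bytes at a constant, sustained throughput."""
    seconds = data_bytes / throughput_bytes_per_s
    return seconds / 60


# 1 TB at 1 GB per second is 1,000 seconds.
minutes = transfer_minutes(1 * TB, 1 * GB)
print(f"{minutes:.1f} minutes")  # 16.7 minutes
```

With binary units instead (1 TiB at 1 GiB per second), the same sum comes to 1,024 seconds, or roughly 17.1 minutes, so the "under 17 minutes" claim holds for decimal units.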
“The way we analyze Big Data can require millions of inputs at the same time. This involves transferring data back and forth between the storage and high-performance computing cluster. EMC can comfortably handle the high throughput required within the analysis,” says Mohd. Noor Mat Isa, Head of Genome Technology and Innovation at MGI.
Read the MGI Case Study to learn more.
A Healthier Future
The National Human Genome Research Institute discusses how individualized DNA analysis based on each person’s genome will lead to a very powerful form of predictive, personalized, participatory and preventive medicine, with the ability to learn about the risks of future illness – as seen with Angelina Jolie.
With this understanding, a new generation of drugs can be developed that is more effective and precise than the one-size-fits-all versions available today. How fast these breakthroughs will happen, we do not yet know. But for certain, the storage and processing speed of Big Data lie at the heart of progress in the next few leaps for mankind.