Why DNA Sequencing Eclipses the Moon Landing

“That’s one small step for man, one giant leap for mankind.”

Many of us are familiar with Neil Armstrong’s famous statement, marking one of mankind’s greatest scientific achievements of the 20th century.

Fast-forward to the 21st century, and that statement still holds true, this time for a scientific accomplishment that we believe eclipses the moon landing: the completion of the Human Genome Project (HGP). Here’s why.

To give you an idea of the project’s magnitude, according to Explorable it took 13 years and some 18 countries to identify between 20,000 and 25,000 genes and to determine the sequences of the 3 billion chemical base pairs that make up human DNA. While some recent studies dispute this figure and put the count of human genes below 20,000, the point stands: the large-scale collaboration behind this project was ultimately aimed at one thing, ridding the world of the tyranny of disease.

Even Hollywood’s In on It

No, this isn’t a zombie apocalypse waiting to happen, or a biological experiment gone wrong like the ones you’ve seen in The Walking Dead or World War Z. On the contrary, it is a pivotal breakthrough in human history that has led to the discovery of disease genes, paving the way for genetic tests and biotechnology-based products.

As CNN reported, many of you will have heard of Angelina Jolie’s double mastectomy in 2013 and the removal of her ovaries in March 2015. But why? Genetic testing revealed that she carried a mutation in BRCA1, a gene associated with breast and ovarian cancer. The decision, though hard, would greatly reduce her cancer risk.

Revving DNA Sequencing

The rise of DNA sequencing can be partly credited to a stark drop in the cost of whole genome sequencing, from US$100 million per human genome to between US$1,000 and US$3,000 today. Of course, we also need to consider the cost of analysis after genetic testing is completed, which can stretch to US$20,000. But that brings me to my point: affordability for all, not just Hollywood celebrities. Affordability is a dream come true for scientists in this field. For one, they can now stretch funding budgets to take on more experimental risks and scale up their sequencing activities, pushing boundaries and gathering more research data that could lead to new discoveries.
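To put that drop in perspective, the reduction factor is simply the old per-genome cost divided by the new one. The sketch below uses only the sequencing figures quoted in the paragraph above (it leaves out the separate analysis cost); the helper name `reduction_factor` is illustrative, not from any particular tool.

```python
# Per-genome sequencing costs as quoted in the article (analysis costs excluded)
OLD_COST_USD = 100_000_000           # approximate cost per human genome in the HGP era
NEW_COST_RANGE_USD = (1_000, 3_000)  # per-genome sequencing cost range quoted today


def reduction_factor(old: float, new: float) -> float:
    """Return how many times cheaper the new cost is than the old one."""
    return old / new


for new_cost in NEW_COST_RANGE_USD:
    factor = reduction_factor(OLD_COST_USD, new_cost)
    # Thousands separators and no decimals keep the output readable
    print(f"US${new_cost:,}: {factor:,.0f}x cheaper")
```

Run as-is, this prints a factor of 100,000 at US$1,000 and roughly 33,333 at US$3,000: sequencing a genome today costs several orders of magnitude less than it did during the HGP.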

That said, we should all be aware that DNA sequencing runs on two engines: storage and speed. It’s simple. Without enough space to store research data and adequate speed to process it, scientists have little means of gleaning insights.

Take, for example, SciGenom Labs (SciGenom), a company based in Cochin, India, that focuses on molecular diagnostics, cancer treatment, and metagenomics. Before adopting an EMC Isilon X200 scale-out storage platform, SciGenom found that performance degraded as its storage expanded, slowing the analysis of large-scale biological data sets.

Since adopting EMC Isilon, SciGenom completes project tasks 40 percent faster. The lab also expects to cut the workflow times associated with analyzing, annotating, and understanding the terabytes of data its sequencing machines generate every day.

Says Saneesh Chembakasseri, IT Manager at SciGenom Labs, “The key reason for moving to Isilon scale-out storage was to increase the performance and speed of analyzing raw data generated by DNA sequencing machines. There is no better choice in the market than EMC Isilon in providing both the needed scalability and performance for meeting the demands of DNA sequencing.”

Read the SciGenom Case Study to learn more.

Being Nimble Now a Reality

Eighteen countries. Can you imagine the coordination that went into the HGP? To minimize miscommunication and mistakes, sequencing workflows not only had to be established well in advance; they also had to be nimble enough to adapt to change. The only way to achieve this was to store and share findings seamlessly, even with massive quantities of data being exchanged. The same applies today to follow-on DNA sequencing initiatives.

The Malaysia Genome Institute (MGI) is another establishment that has embraced the strengths of EMC Isilon. Engaged in comparative genomics and genetics, structural and synthetic biology, computational and systems biology, and metabolic engineering, MGI has sequencing machines delivering 1 gigabyte per second of throughput. To put that in perspective, that is an astounding 1 terabyte in under 17 minutes. MGI uses the Illumina HiSeq 2000 and Illumina MiSeq platforms for DNA sequencing, whole genome sequencing, whole transcriptome sequencing, and targeted resequencing.
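The “1 terabyte in under 17 minutes” figure is easy to check. This minimal sketch assumes decimal (SI) units, where 1 TB = 1,000 GB, as storage vendors usually quote them; it also shows that with binary units (1 TiB = 1,024 GiB at 1 GiB/s) the same transfer takes just over 17 minutes. The helper name `transfer_minutes` is illustrative only.

```python
def transfer_minutes(data_gb: float, throughput_gb_per_s: float) -> float:
    """Return the minutes needed to move data_gb at a sustained throughput in GB/s."""
    seconds = data_gb / throughput_gb_per_s
    return seconds / 60


# Decimal (SI) convention: 1 TB = 1,000 GB, throughput of 1 GB/s
si_minutes = transfer_minutes(1_000, 1.0)
print(f"1 TB at 1 GB/s: {si_minutes:.1f} minutes")      # 16.7 minutes

# Binary convention for comparison: 1 TiB = 1,024 GiB, throughput of 1 GiB/s
binary_minutes = transfer_minutes(1_024, 1.0)
print(f"1 TiB at 1 GiB/s: {binary_minutes:.1f} minutes")  # 17.1 minutes
```

The claim holds under the decimal convention (1,000 seconds, about 16.7 minutes), which is the one normally used for storage and throughput figures.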

“The way we analyze Big Data can require millions of inputs at the same time. This involves transferring data back and forth between the storage and high-performance computing cluster. EMC can comfortably handle the high throughput required within the analysis,” says Mohd. Noor Mat Isa, Head of Genome Technology and Innovation at MGI.

Read the MGI Case Study to learn more.

A Healthier Future

The National Human Genome Research Institute discusses how individualized DNA analysis based on each person’s genome will lead to a very powerful form of predictive, personalized, participatory and preventive medicine, one that can reveal a person’s risk of future illness, as Angelina Jolie’s case shows.

With this understanding, a new generation of drugs can be developed that is more effective and precise than the one-size-fits-all versions available today. How fast these breakthroughs will happen, we do not yet know. What is certain is that Big Data storage and processing speed lie at the heart of progress in mankind’s next few leaps.


Sanjay Joshi

About the Author: Sanjay Joshi

Sanjay Joshi is Industry CTO Healthcare at the Dell Global CTO Office. Based in Seattle, he has spanned the gamut of the life sciences, from clinical and biotechnology research to healthcare informatics to medical devices. A “skunkworks” engineer and informaticist, he defines himself as a “non-reductionist” with a “systems view of the world.” His current focus is a systems-level understanding of Healthcare, Genomics, Proteomics, Microbiomics, Imaging and IoT processes, and data infrastructures. Recent experience has included AI platforms, data management and instruments for Electronic Medical Records; Proteomics and Flow Cytometry; FDA and HIPAA verification and validation; Lab Information Management Systems (LIMS); Translational Genomics research and Imaging. Sanjay holds a patent in multi-dimensional flow cytometry analytics. He began his career developing and building X-Ray machines. Sanjay was the recipient of a National Institutes of Health (NIH) Small Business Innovation Research (SBIR) grant and has been a consultant or co-Principal-Investigator on several NIH grants. He is actively involved in non-profit biotech networking and educational organizations in the Seattle area and beyond. Sanjay holds a Master of Biomedical Engineering from the University of New South Wales, Sydney and a Bachelor of Instrumentation Technology from Bangalore University. He completed several medical school and PhD-level courses (in Sydney and Seattle).
A list of selected recent invited talks and panels:
• Next Generation Bioinformatics & Biotech Conf, Oct 2019, Mumbai India, Keynote, “Time Series, Machine Learning and the Microbiome: A summary”
• GratiFi Summit, Jul 2019, Seattle WA, Panelist, “AI in Biotechnology”
• 601 Club, Jun 2019, Seattle WA, Moderator, “Artificial Intelligence and the Future of Health”
• Bio2Device & Silicon Vikings, Apr 2019, Palo Alto CA, Panelist, “Digital Health”
• BioIT World West, Mar 2019, San Francisco CA, Chair and Speaker, “Streamed Postcards from the Edge: Medical Device Architectures”
• Data Day Texas, Jan 2019, Austin TX, “Morals from a Type 2 Diabetes dataset analytics journey”
• Global AI Conference, Jan 2019, Santa Clara CA, “Medical Device Architectures: Machine Learning on Streams”
• Next Generation Bioinformatics & Biotech Conf, Oct 2018, Jaipur India, Keynote, “A Machine Learning Operational Analytics Story”
• EPPICGlobal conference, Oct 2018, Burlingame CA, “Digital Health Keynote Panel”
• AI in Healthcare Summit, Jun 2018, San Francisco CA, Chair and Panelist, “Executive Physician Roundtable”
• BioIT World, May 2018, Boston MA, Chair and Speaker, Machine Learning and Data Science track
• Medical Imaging in Clinical Research, Feb 2018, San Francisco CA, Speaker, “Operational Imaging in Clinical Trials”
• AI in Healthcare Summit, Jan 2018, Boston MA, Keynote Panel and Genomics AI moderator
• Kaiser Permanente Machine Learning Day, Dec 2017, Oakland CA, Panelist on AI in Healthcare
• Interface Summit, Oct 2017, Vancouver Canada, Speaker, “Pain: can AI shine a light on it?”
• MinneAnalytics HALICON, Oct 2017, Minneapolis MN, Speaker, “Two use-cases and a summary: Diabetes and Communicable Disease”
• mHealth Israel, Sep 2017, Jerusalem Israel, Speaker, “AI in Health: Hope or Hype?”