Cloudera, Broad Institute Collaborate on the Next Generation of the Genome Analysis Toolkit

Built on Cloudera Enterprise with Apache Spark as the Bioinformatics Standard, GATK4 Designed to Speed Genomic Research


PALO ALTO, Calif., April 06, 2016 (GLOBE NEWSWIRE) -- Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, today announced a collaboration with the Broad Institute of MIT and Harvard, the world’s leading biomedical and genomic research center. The two organizations are working together this year to advance the development of Broad’s next-generation Genome Analysis Toolkit, GATK4.

 

Cloudera Enterprise accelerates life sciences research and drug discovery by putting real-time data into the hands of the clinicians, researchers, and providers focused on personalizing the patient experience. Building the fourth generation of GATK (GATK4) on Cloudera Enterprise and utilizing the Apache Spark distributed computing framework to speed research, the Broad Institute is facilitating better understanding of genomic sequencing, resulting in faster data exploration and ultimately empowering better clinical decisions.


Since the Human Genome Project produced the first draft sequence of the human genome in 2000, the cost of sequencing has dropped exponentially, from around $100 million USD per genome to around $1,000 USD today. Over the same period, we have seen massive growth in the storage and processing capabilities of big data technologies like Hadoop.

 

“This lower cost of genome sequencing and advancement in big data technologies means that we can afford to sequence the genome of patients very broadly and produce datasets that have never been available before,” said Shawn Dolley, industry leader of life sciences at Cloudera. “Building the next generation toolkit on Spark greatly accelerates in-memory computations and facilitates parallelism. Cloudera Enterprise expedites round-trips to access and compute data for data discovery, translating into significant reductions in R&D time. This will have a very meaningful scientific upside.”


Presently there are more than 31,000 registered users of the GATK. Broad Institute is working with collaborators to develop cloud-hosted options to expand access and facilitate usage of genome analysis tools for even more powerful insights and decision-making. With cloud-hosted options, users could also more easily create best-practice pipelines and avoid duplicating infrastructure.


“Utilizing the Spark computing framework on Cloudera Enterprise gives us the ability to implement tools that were not possible in GATK3 due to their computational complexity,” said Dr. Eric Banks, senior director of Data Sciences and Data Engineering at Broad and a creator of the GATK software package. “On Cloudera Enterprise, we can now run analysis of genomic data two orders of magnitude faster than in previous versions of GATK, enabling faster iterative analysis for propelling genomic innovation.”


About Cloudera

Cloudera delivers the modern data management and analytics platform built on Apache Hadoop and the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform available for the modern world. Our customers efficiently capture, store, process and analyze vast amounts of data, empowering them to use advanced analytics to drive business decisions quickly, flexibly and at lower cost than has been possible before. To ensure our customers are successful, we offer comprehensive support, training and professional services. Learn more at http://cloudera.com.


Connect with Cloudera

About Cloudera: cloudera.com/content/cloudera/en/about/company-profile.html

Read our blogs: blog.cloudera.com/ and vision.cloudera.com/

Follow us on Twitter: twitter.com/cloudera

Visit us on Facebook: facebook.com/cloudera

Join the Cloudera Community: community.cloudera.com


Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition, Cloudera Navigator Optimizer and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trademarks of their respective owners.


###


            
