Scope and Speed: 17,000-Core HPC Clusters Used for DNA Sequencing


By Lionel Gibbons | April 18, 2015 | HPC cluster management, HPC, dna



The machines that process the data collected for modern DNA sequencing must be many times faster than those used for the Human Genome Project, according to a recent Scientific Computing post from the Wellcome Trust Sanger Institute. The Institute currently produces more sequences in one hour than it did in its first ten years of operation. This speed is necessary, states Tim Cutts, the head of scientific computing at the Institute, because the DNA sequencing data of a single cancer genome sample requires 7,000 CPU hours of analysis, and tens of thousands of these analyses are being run at once.

The work requires powerful computing in the form of HPC clusters with 17,000 cores, as well as a storage engine that lets the Institute keep expanding the amount of data it processes while maintaining compute speed.
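To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes perfect parallel scaling with no I/O or scheduling overhead, which real workloads will not achieve, and the batch size of 10,000 samples is a hypothetical stand-in for the low end of "tens of thousands". The function name is ours, not anything from the Institute.

```python
def wall_clock_hours(samples: int, cpu_hours_per_sample: float, cores: int) -> float:
    """Ideal wall-clock hours, assuming perfect scaling and no I/O waits."""
    return samples * cpu_hours_per_sample / cores


CPU_HOURS_PER_SAMPLE = 7_000  # figure quoted in the article
CORES = 17_000                # figure quoted in the article
SAMPLES = 10_000              # hypothetical: low end of "tens of thousands"

one_sample = wall_clock_hours(1, CPU_HOURS_PER_SAMPLE, CORES)
batch = wall_clock_hours(SAMPLES, CPU_HOURS_PER_SAMPLE, CORES)

print(f"One sample: about {one_sample * 60:.0f} minutes")
print(f"{SAMPLES:,} samples: about {batch / 24:.0f} days")
```

Under these idealized assumptions, a single sample would occupy all 17,000 cores for roughly 25 minutes, and a 10,000-sample batch would take about 172 days, which suggests why both core count and storage throughput matter at this scale.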

Cutts says the Institute constantly faces the need for 24/7 access and must keep up with unpredictable data growth. In one upgrade, he wrote, the Institute went from 100 sequencers to the equivalent of 700 machines. It has also upgraded its 10 GbE network to 40 GbE to keep up with the increasing demand for bandwidth.

While your workload may not call for HPC clusters, storage, or other resources on the scale the Institute deals with daily, your work is just as important. That's why Bright Computing is making it easier for you to deploy and manage clusters, regardless of your industry or the type of cluster-based technology you use.

Bright Computing software, running in more than 500 data centers around the globe, provides users with a single, unified solution for the provisioning, scheduling, monitoring, and management of HPC clusters, Hadoop clusters, and OpenStack clouds.
