Extending into the Cloud: Two Use Cases from Bio-IT World


By Ian Lumb | April 17, 2013 | Cloud Computing, Cluster Management, HPC Cluster, Linux Cluster, Linux Cluster Management, Amazon EC2, Bright Cluster Manager, Life Sciences



The last time I participated in Bio-IT World was May 2005. While Web inventor Sir Tim Berners-Lee opened the event with a keynote sharing his vision for a next-generation Web, those of us in the trenches (a.k.a. the exhibits area) attempted to communicate the value proposition for grid computing - or was it Grid computing, or even Grid Computing ...

About a week after officially joining Bright Computing, I once again had the opportunity to participate in Bio-IT World. With more than double the number of participants of the 2005 event, the 2013 event continued the tradition of resonating with the biotech scene in the Greater Boston area.

This time, however, I noticed a distinct inversion in many of the conversations I had with those who dropped by the Bright booth: In many cases I found myself responding to expressions of interest in, as well as initial successes with, use of cloud-based infrastructures. In other words, Bright prospects, customers and partners were telling me their stories, rather than me framing possibilities for them. How refreshing!

Various data points emerged from these conversations. For example, representatives of large pharmaceutical companies described a use case for complementing on-site resources with those available via the cloud. In this scenario, workloads that exceed site capacity are diverted to the cloud - please see the illustration below.

Organization size vs. resource locality

Resource locality versus organization size in two scenarios. (Please see the end note regarding relative proportions.)

At the opposite end of the organizational-size spectrum are biotech start-ups. Being start-ups, minimal IT resources exist on-site. By extending their local IT infrastructure into the cloud, start-ups can validate, demonstrate and ultimately establish a viable business in a cost-effective fashion - please see the illustration above.

Whether big pharma or biotech start-up, these use cases are motivated by a common need for accelerated time-to-results. Well-motivated and self-validated use cases allowed delegates and Bright staff to engage in detailed technical discussions. As a logical consequence, the ability to provision, monitor and manage an Amazon EC2 instance via Bright Cluster Manager (please see the illustration below) was received with considerable interest.

Organization size vs. resource locality with Bright Cluster Manager

Resource locality versus organization size in two scenarios. In both scenarios, on-site IT infrastructure is extended into an Amazon EC2 cloud instance via the Bright Cluster Manager. (Please see the end note regarding relative proportions.)  

As anyone with a background in HPC knows, this offloading approach is not applicable to all classes of computational workloads. Briefly, it works extremely well with embarrassingly parallel workloads - a latency-tolerant problem decomposition that exploits parallelism naturally existing in the data, so that each task can run without communicating with the others. Fortunately, the computational needs of problems in the life sciences can often be cast as embarrassingly parallel workloads.
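To make the idea of an embarrassingly parallel workload concrete, here is a minimal Python sketch. It is an illustration only, not anything specific to Bright Cluster Manager: the per-task function `gc_content`, the helper `run_independent_tasks`, and the sample sequences are all hypothetical, and a local thread pool merely stands in for a pool of on-site or cloud nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence: str) -> float:
    """Hypothetical per-task work: fraction of G and C bases in one sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def run_independent_tasks(sequences, max_workers=4):
    """Score every sequence independently.

    No task reads another task's input or output, which is what makes
    the workload embarrassingly parallel and tolerant of latency.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in the returned results.
        return list(pool.map(gc_content, sequences))


if __name__ == "__main__":
    print(run_independent_tasks(["GGCC", "ATAT", "ATGC"]))
```

Because the tasks share nothing, it makes no difference to the result whether a given task runs on site or in the cloud; only the time-to-results changes.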

Although the extension of on-site IT infrastructure into the cloud works quite well for workloads typical of the life sciences, this discipline does present a number of additional requirements that need to be addressed:

  • Locality-based scheduling - Business logic determines where (i.e., on-site versus in-the-cloud) workloads should be executed. Workload managers tightly integrated with the Bright Cluster Manager ensure that this is indeed the case at run time.
  • Data-aware scheduling - Bright Cluster Manager ensures that Amazon EC2 resources aren’t even provisioned until a data transfer nears completion.
  • Persistence in the cloud - Optionally, a Bright Cluster Manager Cloud Director can be established as a persistent resource in Amazon EC2 to allow for in-the-cloud storage as well as the ability to rapidly provision resources.
  • Monitoring and managing the entire IT infrastructure - Because an Amazon EC2 instance is merely an extension of the IT resources available locally, Bright Cluster Manager allows on-site and cloud-based IT resources to be monitored and managed on an operational basis.
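The first two requirements in the list above can be sketched in a few lines of Python. This is a hedged illustration of the decision logic only; it does not use or represent the Bright Cluster Manager API or any workload manager's interface. The `Job` class, the `cloud_allowed` flag, the `place_jobs` and `ready_to_provision` functions, and the 90% threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int           # compute slots the job needs
    cloud_allowed: bool  # hypothetical business-logic flag


def place_jobs(jobs, onsite_slots):
    """Locality-based placement: fill on-site capacity first, then
    divert overflow to the cloud only if business logic allows it."""
    placement = {}
    free = onsite_slots
    for job in jobs:
        if job.slots <= free:
            placement[job.name] = "on-site"
            free -= job.slots
        elif job.cloud_allowed:
            placement[job.name] = "cloud"
        else:
            # Must stay on site, so it waits for local capacity.
            placement[job.name] = "queued"
    return placement


def ready_to_provision(bytes_sent, bytes_total, threshold=0.9):
    """Data-aware gate: hold off provisioning cloud nodes until the
    input transfer is nearly complete (threshold is an assumed value)."""
    if bytes_total <= 0:
        return True  # nothing to transfer
    return bytes_sent / bytes_total >= threshold


if __name__ == "__main__":
    jobs = [Job("align", 8, True), Job("clinical", 4, False), Job("screen", 6, True)]
    print(place_jobs(jobs, onsite_slots=10))
    print(ready_to_provision(95, 100))
```

The point of the data-aware gate is cost: cloud nodes that sit idle while data is still being uploaded are billed but do no useful work.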

The ability to provision, monitor and manage Amazon EC2 instances via the Bright Cluster Manager has been proven with customers in the life sciences for over a year now.

Finally, and to present as balanced a perspective as possible, it’s not completely fair weather for those interested in extending their on-site IT infrastructure into the cloud. One significant, outstanding challenge is that of satisfying the regulations that govern the pursuit of business in the life-sciences arena. Although technology can serve as an enabler here, and again based on conversations with delegates at Bio-IT World, the primary issues are not technical in nature. Not unlike concerns relating to privacy and the cloud, the regulatory needs of this market need to be addressed so the life sciences can fully embrace the cloud.

Through the lens of Bio-IT World, the life sciences present as dynamic and engaging in 2013 as they did about eight years ago. To be continued ...

End Notes:

  1. In both illustrations, the large, blue cube in the lower-right quadrant is intended to convey the significant on-site IT infrastructures present at large pharmaceutical companies. When big pharma extends its IT infrastructure into the cloud (i.e., upper-right quadrant), it acquires a modest complement of resources. As a consequence, for the big-pharma use case, the cloud complement of IT resources is much smaller than what exists on site. The relative sizes of the large, blue cube and the small big-pharma cloud are intended to convey these proportions. The opposite is true for biotech start-ups. In this case, minimal on-site IT infrastructure (small, red cube in the lower-left quadrant) is significantly amplified by incorporation of resources acquired from the cloud (upper-left quadrant). The relative sizes of the small, red cube and the large cloud for biotech start-ups are intended to convey these proportions. Whereas comparisons between the cubes (the lower half of the illustrations) do accurately convey the relative proportions of on-site IT resources, the same cannot be said of IT resources allocated in the cloud. In other words, biotech start-ups and big pharma may have acquired similarly sized allocations of IT resources from the cloud, even though the clouds are drawn at different sizes. This reflects a limitation of the illustration only.