This article/post is from a third-party website. The views expressed are those of the author. We at Capacity Building & Development may not necessarily subscribe to them completely. The relevance and applicability of the content is limited to certain geographic zones; it is not universal.

Tuesday, September 30, 2014

IBM and CLC bio deliver genomics sequencing analytics solution 10-01






IBM and CLC bio provide an accelerated genomics research platform to convert sequencer data to usable genomic insight.

Imagine a world where medical diagnoses and treatment regimens are based on a person’s specific genetic makeup—reducing side effects and improving patient outcomes. That’s the promise of personalized medicine, which is rapidly becoming a reality through advances in genomic sequencing and analysis.

APPLYING GENOMIC SEQUENCING TO THERAPEUTICS


Dr. Lukas Wartman has firsthand experience with the power of genomic sequencing. A genetics researcher at Washington University in St. Louis, Missouri, Dr. Wartman ended up contracting the very disease he was studying: adult acute lymphoblastic leukemia. His condition deteriorated rapidly, and there was no known treatment for the cancer.

His colleagues decided to fully sequence the genes of both his cancerous cells and healthy cells using the High Performance Computing cluster housed in the Genome Institute at Washington University. They discovered something completely unexpected: one of Dr. Wartman’s normal genes, FLT3, was malfunctioning, producing massive quantities of a protein that was feeding the cancer.

The team found a drug typically used to control the overactive FLT3 gene in patients with kidney cancer. Dr. Wartman became the first person to take this drug for leukemia, and his cancer is now in remission. Dr. Wartman’s case demonstrates how genomic sequencing enables researchers to understand the role of genes in fueling a specific cancer. Consequently, cancer treatment could be customized with drugs that target a gene rather than the tumor or tissue where the cancer first appears.

IBM Systems and Technology

IBM Technical Computing



ESTABLISHING HIGH-THROUGHPUT PERFORMANCE 


Because each human genome comprises over three billion base pairs, whole genomic sequencing requires tremendous processing power and storage capacity in order to correlate the variants in the genome with the relevant patient symptoms. Facing increased demand for sequencing, the industry is challenged to drive down cost while speeding up the assembly, mapping and analysis involved in the sequencing process.

To address these issues, IBM and CLC bio have undertaken a joint effort to develop the IBM Application Ready Solution for CLC bio, a next-generation sequencing (NGS) platform. The system was built for practitioners, requiring little IT administration, yet it is scalable, flexible and extendable. This end-to-end solution integrates a computing cluster built on advanced IBM hardware and software, CLC Genomics Server software for high-throughput sequencing, and CLC Genomics Workbench client/desktop software for analyzing and visualizing NGS data.

The cluster compute nodes consist of IBM® Flex System™ x240 powered by Intel® Xeon® E5-2680v2 processors. These nodes are connected to an IBM Storwize® V7000 Unified network attached storage system that consolidates block and file workloads. The Storwize V7000 Unified system also has a single, easy-to-use management interface that supports both block and file storage, helping to simplify administration.

The Storwize V7000 Unified system supports file data storage using the IBM General Parallel File System (GPFS™). With its leading file system performance and its ability to scale based on customer needs, GPFS is used in the world’s largest high-performance computing (HPC) installations in addition to mainstream technical computing environments. In addition, CLC bio software uses a shared-disk file management solution that provides fast, reliable access to NGS data for optimizing performance.



Life Sciences





To simplify the deployment and management of the cluster, IBM Platform™ HPC provides a complete set of technical and high performance computing (HPC) management capabilities in a single product. The rich set of out-of-the-box features reduces the complexity and cost of managing and running an optimized genomics sequencing cluster. Integrated workload management features have been designed to help improve time-to-results and asset utilization.

PROVIDING A SCALABLE, TURNKEY SOLUTION


IBM Application Ready Solution for CLC bio has been developed in partnership with CLC bio to deliver a scalable, high performance genomics sequencing platform based on an IBM reference architecture. A turnkey solution is available from IBM business partner Re-Store, LLC. It comes pre-integrated with CLC Genomics Server and CLC Genomics Workbench and includes global support and service. The solution is easy to deploy and use, simplifying IT administration and boosting productivity. It has also been designed to scale as workloads expand over time. The solution provides up to 90 TB of effective storage capacity, and administrators can easily add storage extensions and more compute nodes as necessary.

These three analytics solutions have been benchmarked for their mapping, variant calling and filtering performance.

CLC Genomics Workbench 6.5 and the Platform HPC-enabled CLC Genomics Server 5.5 were installed on an IBM server with Storwize V7000 Unified and GPFS. The benchmark was executed using a 37x coverage human genome data set (1,415,483,596 reads, 100 bp/read) and 150x coverage exome reads (NA12878) from an Illumina Genome Analyzer II. The benchmark results are shown in Figure 1.
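The size of the whole-genome data set follows directly from the read count and read length quoted above. The sketch below works through that arithmetic. The genome length of 3.1 billion bases is an assumption (the text only says "over three billion base pairs"), and the function names are illustrative. Under that assumption the raw depth comes out somewhat higher than the quoted 37x; the source does not say how the 37x figure was derived, so the gap may simply reflect how coverage was counted.

```python
# Figures quoted in the benchmark description.
READS = 1_415_483_596       # reads in the whole-genome data set
READ_LENGTH_BP = 100        # bases per read

# Assumption (not from the source): approximate human genome length in bases.
GENOME_LENGTH_BP = 3_100_000_000


def total_bases(reads: int, read_length_bp: int) -> int:
    """Total number of sequenced bases in the data set."""
    return reads * read_length_bp


def raw_coverage(reads: int, read_length_bp: int, genome_length_bp: int) -> float:
    """Average raw depth: sequenced bases divided by genome length."""
    return total_bases(reads, read_length_bp) / genome_length_bp


if __name__ == "__main__":
    bases = total_bases(READS, READ_LENGTH_BP)
    depth = raw_coverage(READS, READ_LENGTH_BP, GENOME_LENGTH_BP)
    print(f"Total bases sequenced: {bases:,}")
    print(f"Raw depth over {GENOME_LENGTH_BP:,} bp: {depth:.1f}x")
```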





















Turnkey solution options:

                          Small Analytics Solution            Medium Analytics Solution           Large Analytics Solution

Workload size per week    15 human genome (37x) or            30 human genome (37x) or            60 human genome (37x) or
                          120 human exome (150x)              240 human exome (150x)              480 human exome (150x)

Applications              CLC Genomics Server 5.5x,           CLC Genomics Server 5.5x,           CLC Genomics Server 5.5x,
                          CLC Genomics Workbench:             CLC Genomics Workbench:             CLC Genomics Workbench:
                          9 static licenses                   12 static licenses                  15 static licenses

Application maintenance   Three years of full maintenance     Three years of full maintenance     Three years of full maintenance
                          (support and all upgrades)          (support and all upgrades)          (support and all upgrades)
                          on CLC bio software                 on CLC bio software                 on CLC bio software

Management software       IBM Platform HPC                    IBM Platform HPC                    IBM Platform HPC

System rack               One 25U rack                        One 25U rack                        One 42U rack

System switch             Top-of-rack network switch          Top-of-rack network switch          Top-of-rack network switch

System management node    One IBM Flex System x240 with       One IBM Flex System x240 with       One IBM Flex System x240 with
                          16 CPU cores and 64 GB RAM          16 CPU cores and 64 GB RAM          16 CPU cores and 64 GB RAM

System compute nodes      Three IBM Flex System x240 with     Six IBM Flex System x240 with       Twelve IBM Flex System x240 with
                          60 CPU cores and 384 GB RAM         120 CPU cores and 768 GB RAM        240 CPU cores and 1536 GB RAM

CPU/compute node          2 Intel Xeon 10C Processor Model    2 Intel Xeon 10C Processor Model    2 Intel Xeon 10C Processor Model
                          E5-2680v2 115W                      E5-2680v2 115W                      E5-2680v2 115W
                          2.8GHz/1866MHz/25MB                 2.8GHz/1866MHz/25MB                 2.8GHz/1866MHz/25MB

Memory/compute node       128 GB DDR3                         128 GB DDR3                         128 GB DDR3

System internal storage   6 TB, 7,200 rpm NL SAS              6 TB, 7,200 rpm NL SAS              6 TB, 7,200 rpm NL SAS

Storwize V7000 Unified    20 TB effective storage capacity    55 TB effective storage capacity    90 TB effective storage capacity

System maintenance        3 Year Onsite Repair 24x7, 4 Hour   3 Year Onsite Repair 24x7, 4 Hour   3 Year Onsite Repair 24x7, 4 Hour
                          Response                            Response                            Response
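The compute totals listed for each turnkey option can be derived from the per-node figures in the same table (two 10-core processors and 128 GB of memory per compute node). The sketch below is a small consistency check along those lines. The `Tier` class and its field names are illustrative, not part of any IBM or CLC bio tooling.

```python
from dataclasses import dataclass

# Per-node figures taken from the turnkey options table.
CPUS_PER_NODE = 2
CORES_PER_CPU = 10          # the E5-2680v2 is listed as a "10C" processor
RAM_PER_NODE_GB = 128


@dataclass(frozen=True)
class Tier:
    """One turnkey option as listed in the table (field names are illustrative)."""
    name: str
    compute_nodes: int
    listed_cores: int
    listed_ram_gb: int
    genomes_per_week: int
    exomes_per_week: int


TIERS = [
    Tier("Small", 3, 60, 384, 15, 120),
    Tier("Medium", 6, 120, 768, 30, 240),
    Tier("Large", 12, 240, 1536, 60, 480),
]


def derived_totals(tier: Tier) -> tuple[int, int]:
    """Return (cores, RAM in GB) computed from the node count."""
    cores = tier.compute_nodes * CPUS_PER_NODE * CORES_PER_CPU
    ram_gb = tier.compute_nodes * RAM_PER_NODE_GB
    return cores, ram_gb


if __name__ == "__main__":
    for tier in TIERS:
        cores, ram_gb = derived_totals(tier)
        per_node_genomes = tier.genomes_per_week / tier.compute_nodes
        print(f"{tier.name}: {cores} cores, {ram_gb} GB RAM, "
              f"{per_node_genomes:.0f} genomes per node per week")
```

Under these figures the listed workload scales linearly with node count: each option works out to the same per-node weekly workload.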



















Figure 1 data (elapsed time, HH:MM:SS):

                    37x Coverage WGS    150x Coverage WEX
Mapping             5:56:04             0:42:05
Variant calling     16:33:33            1:27:21
Filtering           0:19:32             0:14:31

Figure 1. NGS workflow benchmark performance for 37x coverage whole human genome (WGS) reads and 150x coverage whole human exome (WEX) reads on a single IBM compute node. The workflow includes read mapping, variant calling, and filtering of variants against a known database (common SNPs/INDELs database).
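The per-step times in Figure 1 can be added up to give an end-to-end time for each workflow. The sketch below does that and then makes a rough estimate of how many workflows a single node could complete in a week. The estimate assumes the node runs one workflow at a time, back to back, with no idle time or scheduling overhead; that assumption is not from the source, and the function names are illustrative.

```python
from datetime import timedelta

# Step timings as reported in Figure 1 (HH:MM:SS, single compute node).
TIMINGS = {
    "37x WGS": {"mapping": "5:56:04", "variant calling": "16:33:33", "filtering": "0:19:32"},
    "150x WEX": {"mapping": "0:42:05", "variant calling": "1:27:21", "filtering": "0:14:31"},
}


def parse_hms(text: str) -> timedelta:
    """Convert an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def workflow_total(steps: dict[str, str]) -> timedelta:
    """Sum the step durations of one workflow."""
    return sum((parse_hms(value) for value in steps.values()), timedelta())


def runs_per_week(total: timedelta) -> float:
    """Back-to-back runs that fit in one week (assumes no overhead)."""
    return timedelta(weeks=1) / total


if __name__ == "__main__":
    for name, steps in TIMINGS.items():
        total = workflow_total(steps)
        print(f"{name}: total {total}, about {runs_per_week(total):.1f} runs per node per week")
```

Comparing the result with the weekly workload quoted for the turnkey options suggests the quoted figures leave some headroom per node, although the source does not state how they were derived.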





PROVIDING A FOUNDATION FOR FULL-GENOME ANALYSIS


In the future, a person’s entire genome sequence will become part of his or her electronic medical records. A full individual genome can be compared to a reference human genome, a process that previously could take weeks or months of assembly, mapping and analysis. But benchmarking shows that the performance of IBM Application Ready Solution for CLC bio integrated with CLC Genomics Server enables researchers to obtain this critical information in a matter of days, even hours. The solution provides a scalable, flexible, high-performance platform that helps accelerate genomic research and leads to a deep understanding of the associations between genetic variations and diseases, and potential cures.