Distributed Indexing Dispatched Alignment* (DIDA*)

DIDA* performs large-scale alignment tasks by splitting the indexing and alignment stages into smaller subtasks distributed across a cluster of compute nodes.

Up to 77 percent faster execution¹

Distributed Indexing Dispatched Alignment* (DIDA*) is a novel distributed and parallel indexing and alignment framework. It performs the indexing and alignment task in five major steps: distribute, index, dispatch, align, and merge. The indexing and dispatch steps are performed in parallel. DIDA first partitions the targets into smaller parts using a heuristic balanced cut. Next, it builds an index for each partition. The reads are then “flowed” through a Bloom filter to dispatch each alignment task to the appropriate node(s). Finally, the reads are aligned against the partitions in parallel, and the partial results are merged to produce the final output.
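The distribute and dispatch steps can be sketched in miniature. The Python below is an illustrative sketch, not DIDA's C++ code: it assumes a greedy longest-first length balancing as a stand-in for the heuristic balanced cut, builds one Bloom filter of target k-mers per partition, and dispatches a read to every partition whose filter reports a shared k-mer. All names (BloomFilter, balanced_partition, build_filters, dispatch), the k-mer length, and the filter sizes are hypothetical, and the aligner's own per-partition index (for example, a BWA index) is not shown.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over k-mers (illustrative; not DIDA's own implementation)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity over compactness

    def _positions(self, kmer: str):
        # Derive num_hashes bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(kmer.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos] = 1

    def might_contain(self, kmer: str) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(kmer))


def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def balanced_partition(targets: dict, num_parts: int):
    """Hypothetical stand-in for the heuristic balanced cut.

    Assigns targets longest-first to the partition with the fewest bases so far.
    """
    parts = [[] for _ in range(num_parts)]
    loads = [0] * num_parts
    for name in sorted(targets, key=lambda n: len(targets[n]), reverse=True):
        p = loads.index(min(loads))
        parts[p].append(name)
        loads[p] += len(targets[name])
    return parts, loads


def build_filters(targets, parts, k, num_bits=4096, num_hashes=3):
    # One Bloom filter of target k-mers per partition, used only for dispatch.
    filters = []
    for names in parts:
        bf = BloomFilter(num_bits, num_hashes)
        for name in names:
            for km in kmers(targets[name], k):
                bf.add(km)
        filters.append(bf)
    return filters


def dispatch(read: str, filters, k: int):
    # Send the read to every partition whose filter reports at least one shared k-mer.
    return [p for p, bf in enumerate(filters)
            if any(bf.might_contain(km) for km in kmers(read, k))]


targets = {
    "chr1": "ACGTACGTTAGCCGATAGGCTTACGATCGA",
    "chr2": "TTGACCATGGCAATCGGATCCA",
    "chr3": "GGCATTACGCGTATGCAA",
    "chr4": "CCGTAGGATCAAT",
}
parts, loads = balanced_partition(targets, 2)
filters = build_filters(targets, parts, k=8)
print(parts, loads)  # [['chr1', 'chr4'], ['chr2', 'chr3']] [43, 40]
print(dispatch("TAGCCGATAGGC", filters, k=8))
```

Because a Bloom filter can report false positives but never false negatives, a read that shares at least one exact k-mer with a partition is always dispatched there, and only occasionally to an extra partition. That trade costs some redundant alignment work in exchange for a compact, fast membership test.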

DIDA is written in C++ and parallelized using OpenMP for multithreaded computing on a single compute node. For distributed computing, it uses the Message Passing Interface (MPI) for inter-process communication. It takes as input the target sequences and the query sequences in FASTA or FASTQ format, and by default writes its output in SAM format.
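The align and merge steps, together with the SAM output, can be sketched as follows. This is a hedged Python illustration: a ThreadPoolExecutor stands in for DIDA's OpenMP threads and MPI processes, align_to_partition is a hypothetical toy ungapped aligner (fewest mismatches) rather than BWA, Bowtie2, Novoalign, or ABySS-map, and merge keeps the best hit across partitions and writes minimal SAM records (MAPQ 255, meaning "unavailable"). The max_mismatches threshold is an assumption, and for brevity every partition receives every read instead of only the reads dispatched to it.

```python
from concurrent.futures import ThreadPoolExecutor


def align_to_partition(reads, partition):
    """Toy ungapped aligner: best (fewest-mismatch) placement of each read on one partition."""
    hits = {}
    for qname, seq in reads.items():
        best = None
        for rname, ref in partition.items():
            for pos in range(len(ref) - len(seq) + 1):
                mm = sum(a != b for a, b in zip(seq, ref[pos:pos + len(seq)]))
                if best is None or mm < best[0]:
                    best = (mm, rname, pos)
        if best is not None:
            hits[qname] = best
    return hits


def merge(partials, reads, max_mismatches=2):
    """Combine partial results: keep each read's best hit and emit minimal SAM lines."""
    lines = []
    for qname, seq in reads.items():
        candidates = [p[qname] for p in partials if qname in p]
        best = min(candidates) if candidates else None
        if best is not None and best[0] <= max_mismatches:
            _, rname, pos = best
            # FLAG 0 = mapped forward; POS is 1-based; CIGAR is all matches (no gaps).
            lines.append(f"{qname}\t0\t{rname}\t{pos + 1}\t255\t{len(seq)}M\t*\t0\t0\t{seq}\t*")
        else:
            # FLAG 4 = unmapped.
            lines.append(f"{qname}\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t*")
    return lines


partitions = [
    {"chr1": "ACGTACGTTAGCCGATAGGC"},
    {"chr2": "TTGACCATGGCAATCGGATCCA"},
]
reads = {"r1": "TAGCCGAT", "r2": "GGCAATCG", "r3": "AAAAAAAA"}

# Align against all partitions concurrently, then merge the partial results.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(lambda part: align_to_partition(reads, part), partitions))

sam_lines = merge(partials, reads)
for line in sam_lines:
    print(line)
```

Threads keep the sketch short, but because of CPython's global interpreter lock they do not speed up this CPU-bound loop; a real Python port would need processes, whereas DIDA itself relies on OpenMP and MPI.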

Performance Results

The performance of DIDA was measured when combined with four popular alignment tools, Burrows-Wheeler Aligner* (BWA*), Bowtie2, Novoalign, and ABySS-map, on the C. elegans genome, a draft human genome assembly, the human reference genome, and the P. glauca genome. For the draft human genome assembly, when run through the DIDA framework on 12 nodes, BWA, Bowtie2, Novoalign, and ABySS-map used 91, 90, 87, and 91 percent less memory, respectively, and executed 55, 74, 77, and 67 percent faster, respectively, than their baseline runs.¹
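The reported percentages can be turned into factors with simple arithmetic. The sketch below assumes that "X percent less memory" means memory fell to (100 − X) percent of the baseline, and that "X percent faster" means wall-clock time fell by X percent; both readings are assumptions, not definitions taken from the benchmark. The helper names are hypothetical.

```python
def remaining_fraction(percent_reduction: float) -> float:
    """Fraction of the baseline that remains after an X percent reduction."""
    return 1 - percent_reduction / 100


def speedup_factor(percent_time_reduction: float) -> float:
    """Speedup implied if 'X percent faster' means run time fell by X percent (assumed reading)."""
    return 1 / remaining_fraction(percent_time_reduction)


# Reported draft-human-genome figures on 12 nodes: (memory reduction %, execution improvement %).
reported = {
    "BWA": (91, 55),
    "Bowtie2": (90, 74),
    "Novoalign": (87, 77),
    "ABySS-map": (91, 67),
}

for tool, (mem_pct, time_pct) in reported.items():
    print(f"{tool}: {remaining_fraction(mem_pct):.2f} of baseline memory, "
          f"~{speedup_factor(time_pct):.2f}x speedup")
```

Under this reading, Novoalign's 77 percent improvement corresponds to roughly a 4.3x speedup, since its run time would be 23 percent of the baseline.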

Related Codes

Assembly By Short Sequences* (ABySS*)

Publications

Hamid Mohamadi, Benjamin P. Vandervalk, Anthony Raymond, Shaun D. Jackman, Justin Chu, Clay P. Breshears, and Inanc Birol. "DIDA: Distributed Indexing Dispatched Alignment." PLoS ONE 10, no. 4 (2015). doi: 10.1371/journal.pone.0126409.

Configuration Table

System Overview

Nodes: Twelve HPC nodes interconnected by 40 Gbps InfiniBand
Processor: Two Intel® Xeon® X5650 processors (2.67 GHz) per node
RAM: 48 GB per node
Operating system: CentOS 5.4
Software: Intel® Cluster Studio 2013; DIDA v1.0.1; ABySS-map v1.5.2; BWA v0.7.10; Bowtie2 v2.1.0; Novoalign v3.01.02

Product and Performance Information

1

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit http://www.intel.it/benchmarks.