

To simplify the design and execution of complex bioinformatics workflows, especially those that use multiple software tools and data resources, a number of scientific workflow systems have been developed over the past decade. This workflow also involves different tasks and software tools, such as those used for assessing the quality of the reads, assembling them into longer DNA segments, querying them against different databases, and conducting phylogenetic and taxonomical analyses. Such workflow can start with a large collection of sequenced reads and end up with determination of the existing micro-organisms in the environmental sample and an estimation of their relative abundance. As another example, consider a workflow in the area of metagenomics. This workflow involves the use of multiple software tools to assess the quality of the reads, to map them to a reference human genome, to identify the sequence variations, to query databases for the sake of associating variations to diseases, and to check for novel variants. As one example, a personalized medicine workflow based on Next Generation Sequencing (NGS) technology can start with short DNA sequences (reads) of an individual human genome and end with a diagnostic and prognostic report, or potentially even with a treatment plan if clinical data were available. The workflows involve the use of multiple software tools and data resources in a staged fashion, with the output of one tool being passed as input to the next.

Such workflows start with voluminous raw sequences and end with detailed structural, functional, and evolutionary results. The advent of high-throughput sequencing technologies - accompanied with the recent advances in open source software tools, open access data sources, and cloud computing platforms - has enabled the genomics community to develop and use sophisticated application workflows. Increasing complexity of analysis and scientific workflow paradigm
