BioKanga

Suite of high-performance bioinformatics applications

BioKanga Ranking & Summary

  • Publisher Name: biostuartjs
  • File Size: 588 KB

BioKanga Description

BioKanga is a collection of high-performance bioinformatics applications targeting the challenges of next-generation sequencing (NGS) analytics. Kanga, an acronym for 'K-mer Adaptive Next Generation Aligner', is the primary application: a highly efficient short-read aligner that incorporates an empirically derived understanding of sequence uniqueness within a target genome to enable robust alignment of NGS short-read datasets in either colorspace (ABI SOLiD) or basespace (Illumina). Compared with other widely used aligners, BioKanga provides substantial gains in both the proportion and quality of aligned sequence reads at competitive or increased computational efficiency. Unlike most other aligners, BioKanga uses the Hamming distances between a read's putative alignments to the targeted genome assembly as its discriminative acceptance criterion, rather than relying on sequencer-sourced quality scores.

Kangadna is the primary component of an additional two-part toolkit aimed at the de novo assembly of NGS short-read datasets. The secondary component, kangahrdx (K-mer Adaptive Next Gen Aligner Homozygosity Reduction), is primarily targeted at RNA-seq transcriptomics, where transcript isoforms result in many assembled contigs sharing regions of high identity, most likely constituent exons.

Toolset components:
  • kangax: generates a highly optimised suffix array lookup database containing the reference genome assembly against which the NGS short-read datasets are subsequently aligned.
  • kangar: an optional component that pre-processes the NGS short-read datasets into a format optimised for fast and efficient loading by the kanga aligner.
  • kanga: the aligner component. Its major inputs are the reference genome assembly database generated by kangax and either a kangar-preprocessed dataset or the raw FASTA/FASTQ read set files. The primary output from kanga is the set of alignments, written in one of a number of user-selected output formats.
  • kangadna: a multiphase greedy de novo assembler whose overriding objective is to deliver higher-confidence, quality contigs, even if this quality comes at the cost of reduced span and/or lower conventional Nxx contig measures. Kangadna natively assembles either basespace or colorspace (SOLiD) reads.
  • kangahrdx: processes contig sequences and identifies regions that are homozygotic between contigs. Identified regions are removed from all but the longest contig, resulting in contigs that are more heterozygotic. This can assist in de novo RNA-seq transcriptome analysis or in the de novo assembly of polyploid genomes. Generally kangahrdx is run on the contigs generated by kangadna or some other assembler; the only significant restriction is that the input must be basespace, not colorspace.
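The idea of accepting a read on Hamming distances rather than quality scores can be illustrated with a small sketch. This is not kanga's actual acceptance rule: the thresholds `max_mismatches` and `min_diff`, and the function names, are hypothetical choices made only for illustration. A read is accepted when its best putative alignment has few enough mismatches and is clearly separated from the runner-up; otherwise it is rejected as too divergent or ambiguous.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(1 for x, y in zip(a, b) if x != y)


def accept_alignment(
    read: str,
    genome: str,
    candidate_starts: Sequence[int],
    max_mismatches: int = 3,  # hypothetical: worst acceptable best hit
    min_diff: int = 2,        # hypothetical: required gap to the runner-up
) -> Optional[int]:
    """Return the accepted start position, or None if the read is rejected.

    Only Hamming distances between the read and each putative alignment
    are used; no sequencer quality scores are consulted.
    """
    # Score every candidate locus that fits inside the genome.
    scored = sorted(
        (hamming(read, genome[s:s + len(read)]), s)
        for s in candidate_starts
        if s + len(read) <= len(genome)
    )
    if not scored:
        return None
    best_dist, best_start = scored[0]
    if best_dist > max_mismatches:
        return None  # even the best hit is too divergent
    if len(scored) > 1 and scored[1][0] - best_dist < min_diff:
        return None  # ambiguous: runner-up is too close to the best hit
    return best_start


if __name__ == "__main__":
    print(accept_alignment("CCCCGGGG", "AAAACCCCGGGGTTTT", [0, 4, 8]))
    print(accept_alignment("ACGT", "ACGTTTACGA", [0, 6]))
```

In the second call the read matches position 0 exactly but position 6 with only one mismatch, so under this sketch's rule it is rejected as ambiguous rather than placed.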

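kangax is described as building a suffix array lookup database over the reference assembly. Its on-disk format and construction algorithm are not described here, so the following is only a generic in-memory sketch of the technique: a naively built suffix array plus two binary searches that return every exact occurrence of a short pattern (such as a read seed). The function names are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start positions lexicographically.

    Production tools use much faster algorithms; this is only adequate
    for small illustrative inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of every exact occurrence of pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose m-length prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-length prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGT"
    sa = build_suffix_array(genome)
    print(sa)
    print(find_occurrences(genome, sa, "ACG"))
```

Because the suffixes are sorted, all suffixes sharing a given prefix form one contiguous block, so a lookup costs two binary searches over the array rather than a scan of the genome.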

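Both kanga and kangadna handle colorspace (ABI SOLiD) reads as well as basespace reads. The sketch below shows the standard SOLiD two-base colour code: with A=0, C=1, G=2, T=3, each colour is the XOR of the codes of two adjacent bases, and a read is written as its leading base followed by colours. How BioKanga represents colorspace internally is not described here; the helper names are hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def to_colorspace(seq: str) -> str:
    """Encode bases as a leading base followed by colours 0-3."""
    seq = seq.upper()
    colors = [
        str(BASE_TO_CODE[a] ^ BASE_TO_CODE[b])  # colour of each adjacent pair
        for a, b in zip(seq, seq[1:])
    ]
    return seq[0] + "".join(colors)


def from_colorspace(cs: str) -> str:
    """Decode a colorspace string back to bases, starting from its leading base."""
    bases = [cs[0].upper()]
    for c in cs[1:]:
        prev = BASE_TO_CODE[bases[-1]]
        bases.append(CODE_TO_BASE[prev ^ int(c)])  # XOR is its own inverse
    return "".join(bases)


if __name__ == "__main__":
    print(to_colorspace("ACGTT"))      # A1310
    print(from_colorspace("A1310"))    # ACGTT
```

Decoding is sequential, so a single miscalled colour changes every base after it (for example, `from_colorspace("A1210")` yields `ACTGG` instead of `ACGTT`). This is one reason colorspace reads are usually aligned in colorspace rather than converted to bases first.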