A Comprehensive List of Next-Generation Sequencing Data Analysis Software
Integrated solutions
*CLCbio Genomics Workbench-
de novo and reference assembly of Sanger, Roche FLX, Illumina, Helicos, and
SOLiD data. Commercial next-gen-seq software that extends the
CLCbio Main Workbench software. Includes SNP detection, ChIP-seq,
browser and other features. Commercial. Windows, Mac OS X and
Linux.
*Galaxy-
Galaxy = interactive and reproducible genomics. A job
web portal.
*Genomatix-
Integrated Solutions for Next Generation Sequencing data
analysis.
*JMP Genomics-
Next gen visualization and statistics tool from SAS. They
are working with NCGR to refine this tool and produce others.
*NextGENe-
de novo and reference assembly of Illumina, SOLiD and Roche FLX data. Uses a
novel Condensation Assembly Tool approach where reads are joined
via "anchors" into mini-contigs before assembly. Includes SNP
detection, ChIP-seq, browser and other features. Commercial. Win or
MacOS.
*SeqMan Genome Analyser-
Software for Next Generation sequence assembly of Illumina, Roche
FLX and Sanger data integrating with Lasergene Sequence Analysis
software for additional analysis and visualization capabilities.
Can use a hybrid templated/de novo approach. Commercial. Win or Mac
OS X.
*SHORE-
SHORE, for Short Read, is a mapping and analysis pipeline for short
DNA sequences produced on an Illumina Genome Analyzer. A suite
created by the 1001 Genomes project. Source for
POSIX.
*SlimSearch-
Fledgling commercial product.
Align/Assemble to a reference
*BFAST-
Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley
F. Nelson and Barry Merriman at UCLA.
*Bowtie-
Ultrafast, memory-efficient short read aligner. It aligns short DNA
sequences (reads) to the human genome at a rate of 25 million reads
per hour on a typical workstation with 2 gigabytes of memory. Uses
a Burrows-Wheeler-Transformed (BWT) index.
Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac
OS X.
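To make the idea of a BWT index concrete, here is a minimal sketch of the transform and of exact-match counting by backward search. It is an illustration only and not Bowtie's implementation: the function names are hypothetical, the rotation sort is the naive textbook construction, and the `$` sentinel is assumed to sort before every character of the text.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for small inputs)."""
    # Assumption: the sentinel does not occur in the text and sorts first.
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first[c] = index of the first row (in sorted order) that starts with c
    first = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Number of occurrences of c in bwt_str[:i]; real indexes precompute this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current row interval [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)                            # annb$aa
print(count_occurrences(transformed, "ana"))  # 2
```

Production aligners replace the `occ` scan with sampled rank tables so that each pattern character costs constant time.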
*BWA-
Heng Li's BWT Alignment program - a progression from Maq. BWA is a
fast, lightweight tool that aligns short sequences to a sequence
database, such as the human reference genome. By default, BWA finds
an alignment within edit distance 2 to the query sequence. C
source.
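As a rough illustration of what "within edit distance 2" means, the sketch below computes the Levenshtein distance between a read and a candidate reference window. It only shows the acceptance criterion; BWA itself searches its index with bounded differences rather than comparing strings pairwise, and the function names here are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def within_edit_distance(read: str, window: str, max_distance: int = 2) -> bool:
    """True if the read matches the window with at most max_distance edits."""
    return edit_distance(read, window) <= max_distance


print(within_edit_distance("ACGTACGT", "ACGAACGT"))  # True (one substitution)
print(within_edit_distance("ACGTACGT", "TTTTACGT"))  # False (three substitutions)
```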
*ELAND-
Efficient Large-Scale Alignment of Nucleotide Databases. Whole
genome alignments to a reference genome. Written by Illumina author
Anthony J. Cox for the Solexa 1G machine.
*Exonerate-
Various forms of pairwise alignment (including
Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors
are Guy St C Slater and Ewan Birney from EMBL. C for
POSIX.
*GenomeMapper-
GenomeMapper is a short read mapping tool designed for accurate
read alignments. It quickly aligns millions of reads either with
ungapped or gapped alignments. A tool created by the 1001 Genomes
project. Source for POSIX.
*GMAP-
GMAP (Genomic Mapping and Alignment Program) for mRNA and EST
Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech.
C/Perl for Unix.
*gnumap-
The Genomic Next-generation Universal MAPper (gnumap) is a program
designed to accurately map sequence data obtained from
next-generation sequencing machines (specifically that of
Solexa/Illumina) back to a genome of any size. It seeks to align
reads from nonunique repeats using statistics. From authors at
Brigham Young University. C source/Unix.
*MAQ-
Mapping and Assembly with Qualities (renamed from MAPASS2).
Particularly designed for Illumina with preliminary functions to
handle ABI SOLiD data. Written by Heng Li from the Sanger Centre.
Features extensive supporting tools for DIP/SNP detection, etc. C++
source.
*MOSAIK-
MOSAIK produces gapped alignments using the Smith-Waterman
algorithm. Features a number of support tools. Support for Roche
FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at
Boston College. Win/Linux/MacOSX.
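Several entries in this list rely on Smith-Waterman local alignment. A minimal score-only sketch follows, assuming a linear gap penalty and illustrative scoring values (match +2, mismatch -1, gap -1). These parameters and the function name are assumptions and are not taken from MOSAIK or any other tool listed here.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            cell = max(0, diagonal, up, left)  # 0 lets the alignment restart
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))  # 12
```

The clamp to zero is what makes the alignment local: a poorly matching prefix is dropped instead of dragging the score down.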
*MrFAST and MrsFAST-
mrFAST & mrsFAST are designed to map short reads generated with
the Illumina platform to reference genome assemblies in a fast and
memory-efficient manner. Robust to INDELs; MrsFAST has a
bisulphite mode. Authors are from the University of Washington. C
as source.
*MUMmer-
MUMmer is a modular system for the rapid whole genome alignment of
finished or draft sequence. Released as a package providing an
efficient suffix tree library, seed-and-extend alignment, SNP
detection, repeat detection, and visualization tools. Version 3.0
was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher,
Michael Smoot, Martin Shumway, Corina Antonescu and Steven L
Salzberg - most of whom are at The Institute for Genomic Research
in Maryland, USA. POSIX OS required.
*Novocraft-
Tools for reference alignment of paired-end and single-end Illumina
reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq.
Commercial. Available free for evaluation, educational use and for
use on open not-for-profit projects. Requires Linux or Mac OS
X.
*PASS-
It supports Illumina, SOLiD and Roche-FLX data formats and allows
the user to modulate very finely the sensitivity of the alignments.
Applies a spaced-seed initial filter, then an NW dynamic programming
algorithm leading to an SW-like local alignment. Authors are from CRIBI in Italy.
Win/Linux.
*RMAP-
Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By
Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC
Bioinformatics). POSIX OS required.
*SeqMap-
Supports up to 5 or more bp mismatches/INDELs. Highly tunable.
Written by Hui Jiang from the Wong lab at Stanford. Builds
available for most OS's.
*SHRiMP-
Assembles to a reference sequence. Developed with Applied
Biosystems' colourspace genomic representation in mind. Authors are
Michael Brudno and Stephen Rumble at the University of Toronto.
POSIX.
*Slider-
An application for the Illumina Sequence Analyzer output that uses
the probability files instead of the sequence files as an input for
alignment to a reference sequence or a set of reference sequences.
Authors are from BCGSC. A paper is available.
*SOAP-
SOAP (Short Oligonucleotide Alignment Program). A program for
efficient gapped and ungapped alignment of short oligonucleotides
onto reference sequences. The updated version uses a BWT. Can call
SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics
Institute. C++, POSIX.
*SSAHA-
SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a
tool for rapidly finding near exact matches in DNA or protein
databases using a hash table. Developed at the Sanger Centre by
Zemin Ning, Anthony Cox and James Mullikin. C++ for
Linux/Alpha.
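The hash-table lookup idea can be sketched as follows: index every k-mer of the reference, then let each k-mer of the query vote for the reference offset it implies. This is a simplified illustration with hypothetical function names and does not reproduce SSAHA's actual sampling or sorting scheme.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map each k-mer to the list of positions where it occurs in the reference."""
    index = defaultdict(list)
    for position in range(len(reference) - k + 1):
        index[reference[position:position + k]].append(position)
    return dict(index)


def candidate_offsets(index: dict, query: str, k: int) -> dict:
    """Count, per implied reference offset, how many query k-mers support it."""
    votes = defaultdict(int)
    for query_pos in range(len(query) - k + 1):
        for ref_pos in index.get(query[query_pos:query_pos + k], ()):
            votes[ref_pos - query_pos] += 1
    return dict(votes)


index = build_kmer_index("ACGTACGTTT", k=3)
print(candidate_offsets(index, "GTAC", k=3))  # {2: 2}
```

Offsets with many votes are the places worth verifying with a full alignment.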
*SOCS-
Aligns SOLiD data. SOCS is built on an iterative variation of the
Rabin-Karp string search algorithm, which uses hashing to reduce
the set of possible matches, drastically increasing search speed.
Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman
NH.
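For reference, the classic Rabin-Karp search that SOCS is described as building on looks roughly like this. The sketch shows the plain exact-match algorithm, not the iterative colour-space variation used by SOCS; the base and modulus are arbitrary illustrative choices.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_003) -> list:
    """Return all start positions of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    pattern_hash = window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % mod
        window_hash = (window_hash * base + ord(text[i])) % mod
    hits = []
    for start in range(n - m + 1):
        # Hash equality only narrows the candidates; confirm by direct comparison.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            window_hash = ((window_hash - ord(text[start]) * high) * base
                           + ord(text[start + m])) % mod
    return hits


print(rabin_karp("ACGTACGTAC", "GTAC"))  # [2, 6]
```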
*SWIFT-
The SWIFT suite is a software collection for fast index-based
sequence comparison. It contains: SWIFT, a fast local alignment
search guaranteeing to find epsilon-matches between two sequences;
and SWIFT BALSAM, a very fast program to find semiglobal non-gapped
alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT)
and Wolfgang Gerlach (SWIFT BALSAM).
*SXOligoSearch-
SXOligoSearch is a commercial platform offered by the Malaysian-based
Synamatix.
Will align Illumina reads against a range of Refseq RNA or NCBI
genome builds for a number of organisms. Web Portal. OS
independent.
*Vmatch-
A versatile software tool for efficiently solving large scale
sequence matching tasks. Vmatch subsumes the software tool REPuter,
but is much more general, with a very flexible user interface, and
improved space and time requirements. Essentially a large string
matching toolbox. POSIX.
*Zoom-
ZOOM (Zillions Of Oligos Mapped) is designed to map millions of
short reads, produced by next-generation sequencing technology, back
to the reference genomes, and carry out post-analysis. ZOOM is
developed to be highly accurate, flexible, and user-friendly with
speed being a critical priority. Commercial. Supports Illumina and
SOLiD data.
De novo Align/Assemble
*ABySS-
Assembly By Short Sequences. ABySS is a de novo sequence assembler
that is designed for very short reads. The single-processor version
is useful for assembling genomes up to 40-50 Mbases in size. The
parallel version is implemented using MPI and is capable of
assembling larger genomes. By Simpson JT and others at Canada's
Michael Smith Genome Sciences Centre. C++ as source.
*ALLPATHS-
ALLPATHS: De novo assembly of whole-genome shotgun microreads.
ALLPATHS is a whole genome shotgun assembler that can generate high
quality assemblies from short reads. Assemblies are presented in a
graph form that retains ambiguities, such as those arising from
polymorphism, thereby providing information that has been absent
from previous genome assemblies. Broad
Institute.
*Edena-
Edena (Exact DE Novo Assembler) is an assembler dedicated to
process the millions of very short reads produced by the Illumina
Genome Analyzer. Edena is based on the traditional overlap layout
paradigm. By D. Hernandez, P. François, L. Farinelli, M. Osteras,
and J. Schrenzel. Linux/Win.
*EULER-SR-
Short read de novo assembly.
By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in
Genome Research). Uses a de Bruijn graph approach.
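A de Bruijn graph approach can be sketched in a few lines: every k-mer in the reads becomes an edge from its (k-1)-prefix to its (k-1)-suffix, and contigs are read off unambiguous paths. The code below is a toy illustration with hypothetical function names; it ignores sequencing errors, reverse complements, and repeats, all of which a real assembler such as EULER-SR must handle.

```python
def de_bruijn_graph(reads: list, k: int) -> dict:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


def walk_unambiguous(graph: dict, start: str) -> str:
    """Extend a contig from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        successor = next(iter(graph[node]))
        if successor in seen:  # stop on cycles instead of looping forever
            break
        contig += successor[-1]
        seen.add(successor)
        node = successor
    return contig


graph = de_bruijn_graph(["ACGTC", "GTCAG"], k=3)
print(walk_unambiguous(graph, "AC"))  # ACGTCAG
```

Note that the two overlapping reads are merged without ever computing a pairwise overlap, which is the main appeal of the approach for very large numbers of short reads.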
*MIRA2-
MIRA (Mimicking Intelligent Read Assembly) is able to perform true
hybrid de-novo assemblies using reads gathered through 454
sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa
and Sanger data. Linux OS required.
*SEQAN-
A Consistency-based Consensus Algorithm for De Novo and
Reference-guided Sequence Assembly of Short Reads. By Tobias Rausch
and others. C++, Linux/Win.
*SHARCGS-
De novo assembly of short reads. Authors are Dohm JC, Lottaz C,
Borodina T and Himmelbauer H. from the Max-Planck-Institute for
Molecular Genetics.
*SSAKE-
The Short Sequence Assembly by K-mer search and 3' read Extension
(SSAKE) is a genomics application for aggressively assembling
millions of short nucleotide sequences by progressively searching
for perfect 3'-most k-mers using a DNA prefix tree. Authors are
René Warren, Granger Sutton, Steven Jones and Robert Holt from
Canada's Michael Smith Genome Sciences Centre.
Perl/Linux.
*SOAPdenovo-
Part of the SOAP suite. See above.
*VCAKE-
De novo assembly of short reads with robust error correction. An
improvement on early versions of SSAKE.
*Velvet-
Velvet is a de novo genomic assembler specially designed for short
read sequencing technologies, such as Solexa or 454. Needs about
20-25X coverage and paired reads. Developed by Daniel Zerbino and
Ewan Birney at the European Bioinformatics Institute
(EMBL-EBI).
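The 20-25X coverage figure can be turned into a read count with the usual approximation coverage = (number of reads x read length) / genome size. The sketch below is plain arithmetic with hypothetical function names; the genome size and read length in the example are illustrative values, not recommendations from the Velvet authors.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(genome_size: int, read_length: int, target_coverage: float) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


print(coverage(1_000_000, 36, 1_500_000))  # 24.0
print(reads_needed(5_000_000, 36, 25))     # 3472223
```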
SNP/Indel Discovery
*ssahaSNP-
ssahaSNP is a polymorphism detection tool. It detects homozygous
SNPs and indels by aligning shotgun reads to the finished genome
sequence. Highly repetitive elements are filtered out by ignoring
those kmer words with high occurrence numbers. More tuned for ABI
Sanger reads. Developers are Adam Spargo and Zemin Ning from the
Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and
Mac.
*PolyBayesShort-
A re-incarnation of the PolyBayes SNP discovery tool developed by
Gabor Marth at Washington University. This version is specifically
optimized for the analysis of large numbers (millions) of
high-throughput next-generation sequencer reads, aligned to whole
chromosomes of model organism or mammalian genomes. Developers at
Boston College. Linux-64 and Linux-32.
*PyroBayes-
PyroBayes is a novel base caller for pyrosequences from the 454
Life Sciences sequencing machines. It was designed to assign more
accurate base quality estimates to the 454 pyrosequences.
Developers at Boston College.
Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
*EagleView-
An information-rich genome assembly viewer. EagleView can display
a dozen different types of information including base quality and
flowgram signal. Developers at Boston
College.
*LookSeq-
LookSeq is a web-based application for alignment visualization,
browsing and analysis of genome sequence data. LookSeq supports
multiple sequencing technologies, alignment sources, and viewing
modes; low or high-depth read pileups; and easy visualization of
putative single nucleotide and structural variation. From the
Sanger Centre.
*MapView-
MapView: visualization of short read alignments on a desktop
computer. From the Evolutionary Genomics Lab at Sun Yat-sen
University, China. Linux.
*SAM-
Sequence Assembly Manager. Whole Genome Assembly (WGA) Management
and Visualization Tool. It provides a generic platform for
manipulating, analyzing and viewing WGA data, regardless of input
type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui
and Steven Jones at Canada's Michael Smith Genome Sciences Centre.
MySQL backend and Perl-CGI web-based frontend/Linux.
*STADEN-
Includes GAP4. GAP5, once completed, will handle next-gen sequencing
data. A partially implemented test version is available.
*XMatchView-
A visual tool for analyzing cross_match alignments. Developed by
Rene Warren and Steven Jones at Canada's Michael Smith Genome
Sciences Centre. Python/Win or Linux.
Counting e.g. ChIP-Seq, Bis-Seq, CNV-Seq
*BS-Seq-
The source code and data for the "Shotgun Bisulphite Sequencing of
the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature
paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.
*CHiPSeq-
Program used by Johnson et al. (2007) in their Science
publication.
*CNV-Seq-
CNV-seq, a new method to detect copy number variation using
high-throughput sequencing. Chao Xie and Martti T Tammi at the
National University of Singapore. Perl/R.
*FindPeaks-
Performs analysis of ChIP-Seq experiments. It uses a naive algorithm
for identifying regions of high coverage, which represent Chromatin
Immunoprecipitation enrichment of sequence fragments, indicating
the location of a bound protein of interest. Original algorithm by
Matthew Bainbridge, in collaboration with Gordon Robertson. Current
code and implementation by Anthony Fejes. Authors are from
Canada's Michael Smith Genome Sciences Centre. JAVA/OS independent.
Latest versions available as part of the Vancouver Short Read Analysis
Package.
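The "regions of high coverage" idea can be sketched as a coverage profile plus a height threshold. This is a simplified illustration and not the FindPeaks code: the function names, the half-open `(start, end)` fragment representation, and the threshold value are all assumptions.

```python
def coverage_profile(fragments: list, length: int) -> list:
    """Per-base depth from half-open (start, end) fragments, 0 <= start < end <= length."""
    diff = [0] * (length + 1)
    for start, end in fragments:
        diff[start] += 1
        diff[end] -= 1
    profile, depth = [], 0
    for delta in diff[:length]:
        depth += delta
        profile.append(depth)
    return profile


def call_peaks(profile: list, min_height: int) -> list:
    """Return (start, end, max_depth) for each maximal run with depth >= min_height."""
    peaks, run_start = [], None
    for position, depth in enumerate(profile):
        if depth >= min_height and run_start is None:
            run_start = position
        elif depth < min_height and run_start is not None:
            peaks.append((run_start, position, max(profile[run_start:position])))
            run_start = None
    if run_start is not None:
        peaks.append((run_start, len(profile), max(profile[run_start:])))
    return peaks


profile = coverage_profile([(0, 5), (2, 7), (3, 8), (15, 20)], length=20)
print(call_peaks(profile, min_height=2))  # [(2, 7, 3)]
```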
*MACS-
Model-based Analysis for ChIP-Seq. MACS empirically models the
length of the sequenced ChIP fragments, which tends to be shorter
than sonication or library construction size estimates, and uses it
to improve the spatial resolution of predicted binding sites. MACS
also uses a dynamic Poisson distribution to effectively capture
local biases in the genome sequence, allowing for more sensitive
and robust prediction. Written by Yong Zhang and Tao Liu from
Xiaole Shirley Liu's Lab.
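A rough sketch of Poisson-based peak scoring with a locally varying rate is shown below. Taking the maximum of a genome-wide rate and several window-based rates is one way to make lambda "dynamic"; treat that rule, the window sizes, and the function names as assumptions for illustration and not as the MACS implementation.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cumulative = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cumulative += term
    # Note: for very small tails this subtraction loses precision.
    return max(0.0, 1.0 - cumulative)


def local_lambda(genome_rate: float, window_counts: dict, region_width: int) -> float:
    """Expected tags in the region: the largest of the background estimates.

    genome_rate is tags per base genome-wide; window_counts maps a window
    size in bases to the control tag count observed in that window.
    """
    estimates = [genome_rate * region_width]
    for window_size, count in window_counts.items():
        estimates.append(count * region_width / window_size)
    return max(estimates)


lam = local_lambda(0.001, {1000: 5, 10000: 20}, region_width=200)
print(lam)                          # 1.0
print(poisson_upper_tail(6, lam))   # small p-value for 6 tags where 1 is expected
```

Using the largest estimate is conservative: a region sitting in a locally tag-rich background needs more tags to reach significance.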
*PeakSeq-
PeakSeq: Systematic Scoring of ChIP-Seq Experiments Relative to
Controls. A two-pass approach for scoring ChIP-Seq data relative to
controls. The first pass identifies putative binding sites and
compensates for variation in the mappability of sequences across
the genome. The second pass filters out sites that are not
significantly enriched compared to the normalized input DNA and
computes a precise enrichment and significance. By Rozowsky J et
al. C/Perl.
*QuEST-
Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at
Stanford. From the 2008 publication "Genome-wide analysis of
transcription factor binding sites based on ChIP-Seq data". C++.
*SISSRs-
Site Identification from Short Sequence Reads. BED file input. Raja
Jothi @ NIH. Perl.
**See also this thread for ChIP-Seq, until I get time to update this
list.
Alternate Base Calling
*Rolexa-
R-based framework for base calling of Solexa data.
Project and publication available.
*Alta-cyclic-
"a novel Illumina Genome-Analyzer (Solexa) base
caller"
Transcriptomics
*ERANGE-
Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq.
Supports Bowtie, BLAT and ELAND. From the Wold
lab.
*G-Mo.R-Se-
G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build
de novo gene models. First, candidate exons are built directly from
the positions of the reads mapped on the genome (without any ab
initio assembly of the reads), and all the possible splice
junctions between those exons are tested against unmapped reads.
From CNS in France.
*MapNext-
MapNext: A software tool for spliced and unspliced alignments and
SNP detection of short sequence reads. From the Evolutionary
Genomics Lab at Sun Yat-sen University, China.
*QPalma-
Optimal Spliced Alignments of Short Sequence Reads. Authors are
Fabio De Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar
Rätsch. A paper is available.
*RSAT-
RSAT: RNA-Seq Analysis Tools. RSAT is developed and maintained by
Hui Jiang at Stanford University.
*TopHat-
TopHat is a fast splice junction mapper for RNA-Seq reads. It
aligns RNA-Seq reads to mammalian-sized genomes using the ultra
high-throughput short read aligner Bowtie, and then analyzes the
mapping results to identify splice junctions between exons. TopHat
is a collaborative effort between the University of Maryland and
the University of California, Berkeley.
Reposted from: http://blog.163.com/luyiming_1986@126/blog/static/151141532201122494757719/
Preprocessing and Analysis of Next-Generation Sequencing Data
List of commonly used tools
Quality Control: FastQC, Fastx-toolkit
Aligner: BWA, Bowtie, TopHat, SOAP2
Mapper: TopHat, Cufflinks
Gene Quantification: Cufflinks, Avadis NGS
Quality improvement: Genome Analysis Toolkit (GATK)
SNP: Unified Genotyper, Glfmultiple, SAMtools, Avadis NGS
CNV: CNVnator
Indel: Pindel, Dindel, Unified Genotyper, Avadis NGS
Mapping to a gene: Cufflinks, Rsamtools, Genomic Features
Related data formats
FASTQ:
SAM: A generic nucleotide alignment format
BAM: binary format
VCF
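As a small illustration of the first of these formats, the sketch below reads four-line FASTQ records and decodes the quality string. It assumes the Phred+33 encoding and records that are not wrapped across multiple lines; older Illumina pipelines used a +64 offset, so the offset is left as a parameter. The function name is hypothetical.

```python
import io


def read_fastq(handle, offset: int = 33):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line beginning with '+'
        quality = handle.readline().rstrip("\n")
        # Assumption: quality characters encode Phred scores as ASCII minus offset.
        yield header[1:], sequence, [ord(char) - offset for char in quality]


example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!5\n")
for name, sequence, qualities in read_fastq(example):
    print(name, sequence, qualities)
# r1 ACGT [40, 40, 40, 40]
# r2 GG [0, 20]
```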
Data processing workflow
Reposted from: http://www.dxy.cn/bbs/thread/23163706#23163706
http://boyun.sh.cn/bio/?p=1862