
[Bioinformatics] Sequence Alignment with DIAMOND


DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data.

Official documentation: the DIAMOND wiki on GitHub (https://github.com/bbuchfink/diamond/wiki)

1 Installing DIAMOND

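    # Create a conda environment named "diamond" and install DIAMOND from the bioconda channel
    conda create --name diamond -c conda-forge -c bioconda diamond
    # Activate the environment
    conda activate diamond
    # Check the installed DIAMOND version
    diamond --version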

2 Protein sequence alignment

  1. Download the example data. This FASTA file contains 14,323 protein sequences.

    wget https://scop.berkeley.edu/downloads/scopeseq-2.07/astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa
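
To check that the download is complete, you can count the FASTA records, which are the lines starting with `>`. The text above says this file should contain 14,323 of them. The following is a minimal Python sketch. It assumes the file is in the current directory under its original name. `count_fasta_records` is a hypothetical helper name, not part of DIAMOND.

```python
import io
import os
from typing import Iterable

FASTA = "astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa"


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records: each record starts with a header line beginning with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Tiny in-memory example with two records (sequence lines may wrap).
demo = io.StringIO(">seq1 first\nMKV\nLLA\n>seq2 second\nGGT\n")
print(count_fasta_records(demo))  # 2

# Count the real file only if it has already been downloaded.
if os.path.exists(FASTA):
    with open(FASTA) as handle:
        print(count_fasta_records(handle))
```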

  2. Use diamond makedb to convert the downloaded file into a DIAMOND database file (here astral40.dmnd). The following search runs against this database.

    diamond makedb --in astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa -d astral40
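
You can also script this step from Python, for example as part of a larger pipeline. The sketch below uses only the `makedb --in/-d` options from the command above. The helper names `makedb_cmd` and `dmnd_path` are hypothetical. The `.dmnd` rule reflects DIAMOND's behaviour of adding that extension to the database name when it is missing. The command only runs when a `diamond` binary is on `PATH` and the FASTA file exists.

```python
import os
import shutil
import subprocess

FASTA = "astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa"


def makedb_cmd(fasta: str, db: str) -> list[str]:
    """Argument list for `diamond makedb --in FASTA -d DB`, as in the command above."""
    return ["diamond", "makedb", "--in", fasta, "-d", db]


def dmnd_path(db: str) -> str:
    """Database file name: DIAMOND appends .dmnd unless the name already ends with it."""
    return db if db.endswith(".dmnd") else db + ".dmnd"


cmd = makedb_cmd(FASTA, "astral40")
print(" ".join(cmd))
print(dmnd_path("astral40"))  # astral40.dmnd

# Run only when both the diamond binary and the downloaded FASTA file are available.
if shutil.which("diamond") and os.path.exists(FASTA):
    subprocess.run(cmd, check=True)  # raises CalledProcessError if makedb fails
    print("database written:", os.path.exists(dmnd_path("astral40")))
```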

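  3. Search the database with the same file, so that every sequence is compared against the whole set, including itself.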

    diamond blastp -q astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa -d astral40 -o out.tsv --very-sensitive

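    Parameters:

    -q: the query file (FASTA) to search with

    -d: the database file created in the previous step

    -o: the output file for the search results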
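
Without an `--outfmt` option, `-o` writes DIAMOND's default tabular format (`--outfmt 6`). Its 12 columns are explained in step 4. You can also spell the fields out after `--outfmt 6`, which makes the column layout explicit and lets you add or drop fields. The sketch below builds such a command in Python. The helper name `blastp_cmd` and the `--threads 8` value are illustrative assumptions. The 12 field names are DIAMOND's defaults for format 6.

```python
import shlex
from typing import Optional, Sequence

# The 12 fields DIAMOND writes by default for --outfmt 6 (BLAST tabular format).
DEFAULT_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def blastp_cmd(query: str, db: str, out: str,
               fields: Sequence[str] = DEFAULT_FIELDS,
               threads: Optional[int] = None) -> list[str]:
    """Build the --very-sensitive blastp call from this tutorial with explicit output columns."""
    cmd = ["diamond", "blastp", "-q", query, "-d", db, "-o", out,
           "--very-sensitive", "--outfmt", "6", *fields]
    if threads is not None:
        cmd += ["--threads", str(threads)]  # number of CPU threads to use
    return cmd


cmd = blastp_cmd("astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa", "astral40",
                 "out.tsv", threads=8)
print(shlex.join(cmd))
```

If you change the field list, any downstream parser must follow the same column order.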

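    DIAMOND offers several sensitivity settings for different applications. The default mode is the fastest and is tailored to finding homologs with >70% sequence identity. The --sensitive mode is tailored to hits with >40% identity, and the --very-sensitive and --ultra-sensitive modes provide high sensitivity across the whole range of pairwise alignments. The higher the sensitivity, the more (and more distant) homologs are found, at the cost of longer run times.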
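
To see the speed/sensitivity trade-off on your own data, you could run the same search once per mode and compare how many query-target pairs each mode reports. This sketch assumes `diamond` is on `PATH` and `astral40.dmnd` was built in step 2. The per-mode output file names and the helpers `count_pairs` and `compare_modes` are hypothetical. The runs are skipped if any of these prerequisites is missing.

```python
import os
import shutil
import subprocess
import time
from typing import Iterable

QUERY = "astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa"
DB = "astral40"

# Extra flag for each mode discussed above; the default mode needs none.
MODES = {
    "default": [],
    "sensitive": ["--sensitive"],
    "very-sensitive": ["--very-sensitive"],
    "ultra-sensitive": ["--ultra-sensitive"],
}


def count_pairs(lines: Iterable[str]) -> int:
    """Number of distinct (query, target) pairs in tabular (--outfmt 6) output."""
    return len({tuple(line.split("\t")[:2]) for line in lines if line.strip()})


def compare_modes() -> None:
    """Run the same all-vs-all search once per mode and report hit count and run time."""
    for mode, flags in MODES.items():
        out = f"out.{mode}.tsv"  # hypothetical output file name per mode
        start = time.perf_counter()
        subprocess.run(["diamond", "blastp", "-q", QUERY, "-d", DB, "-o", out, *flags],
                       check=True)
        elapsed = time.perf_counter() - start
        with open(out) as handle:
            pairs = count_pairs(handle)
        print(f"{mode:16s} {pairs:8d} pairs {elapsed:8.1f} s")


# Skipped unless diamond, the query file and astral40.dmnd are all present.
if shutil.which("diamond") and os.path.exists(QUERY) and os.path.exists(DB + ".dmnd"):
    compare_modes()
```

By default DIAMOND reports at most 25 target sequences per query (`--max-target-seqs`/`-k`), so the pair counts can hit that ceiling. To see the full difference between modes, raise `-k`.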

  4. Interpreting the results

    Part of the output:

    d1dlwa_ d1dlwa_ 100     116     0       0       1       116     1       116     6.42e-77        220
    d1dlwa_ d2gkma_ 35.4    113     73      0       1       113     13      125     1.43e-21        80.9
    d1dlwa_ d4i0va_ 31.9    119     75      2       1       113     2       120     9.11e-13        58.2
    d2gkma_ d2gkma_ 100     127     0       0       1       127     1       127     1.51e-87        248
    d2gkma_ d1dlwa_ 34.8    115     75      0       13      127     1       115     6.90e-23        84.3
    d2gkma_ d4i0va_ 33.6    110     69      1       13      118     2       111     1.35e-18        73.6
    d2gkma_ d6bmea_ 35.5    110     67      1       13      118     2       111     1.32e-16        68.6
    d2gkma_ d2bkma_ 37.3    67      38      2       13      76      5       70      5.18e-06        40.8
    d1ngka_ d1ngka_ 100     126     0       0       1       126     1       126     4.34e-91        257
    d1ngka_ d2bkma_ 38.4    125     73      2       1       125     4       124     1.42e-24        89.0
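
Each line is one alignment with the 12 columns described below, separated by tabs in the actual file. The Python sketch below parses such lines into typed records. It is shown on three rows copied from the output above. The `Hit` type and the `parse_hit` helper are hypothetical names. Splitting with `split()` also tolerates the space-aligned copy shown here, because sequence IDs contain no whitespace.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    qseqid: str      # query accession
    sseqid: str      # target (subject) accession
    pident: float    # percent identity
    length: int      # alignment length
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_hit(line: str) -> Hit:
    """Parse one --outfmt 6 line (12 default columns, tab- or space-separated)."""
    f = line.split()
    return Hit(f[0], f[1], float(f[2]), *map(int, f[3:10]), float(f[10]), float(f[11]))


rows = """\
d1dlwa_\td1dlwa_\t100\t116\t0\t0\t1\t116\t1\t116\t6.42e-77\t220
d1dlwa_\td2gkma_\t35.4\t113\t73\t0\t1\t113\t13\t125\t1.43e-21\t80.9
d2gkma_\td1dlwa_\t34.8\t115\t75\t0\t13\t127\t1\t115\t6.90e-23\t84.3
"""
hits = [parse_hit(line) for line in rows.splitlines()]
for h in hits:
    # A sequence aligned to itself is the expected 100% self-hit of an all-vs-all search.
    kind = "self" if h.qseqid == h.sseqid else "homolog"
    print(h.qseqid, h.sseqid, h.pident, h.evalue, kind)
```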

    Meaning of each column:

    1. Query accession: the accession of the sequence that was the search query against the database, as specified in the input FASTA file after the > character until the first blank.

    2. Target accession: the accession of the target database sequence (also called subject) that the query was aligned against.

    3. Sequence identity: The percentage of identical amino acid residues that were aligned against each other in the local alignment.

    4. Length: The total length of the local alignment, including matching and mismatching positions of query and subject, as well as gap positions in the query and subject.

    5. Mismatches: The number of non-identical amino acid residues aligned against each other.

    6. Gap openings: The number of gap openings.

    7. Query start: The starting coordinate of the local alignment in the query (1-based).

    8. Query end: The ending coordinate of the local alignment in the query (1-based).

    9. Target start: The starting coordinate of the local alignment in the target (1-based).

    10. Target end: The ending coordinate of the local alignment in the target (1-based).

    11. E-value: The expected value of the hit quantifies the number of alignments of similar or better quality that you expect to find searching this query against a database of random sequences the same size as the actual target database. This number is most useful for measuring the significance of a hit. By default, DIAMOND will report all alignments with e-value < 0.001, meaning that a hit of this quality will be found by chance on average once per 1,000 queries.

    12. Bit score: The bit score is a scoring-matrix-independent measure of the (local) similarity of the two aligned sequences, with higher numbers meaning more similar. It is always >= 0 for local Smith-Waterman alignments.
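
A common next step is to remove the trivial self-hits produced by the all-against-all search. You would then apply stricter cut-offs than the default E-value of 0.001 and keep the best hit (highest bit score) per query. The sketch below does this on the ten rows shown in the partial output above. The thresholds (E-value <= 1e-10, identity >= 30%) are illustrative choices, not DIAMOND defaults, and `best_hits` is a hypothetical helper. The last lines turn the E-value definition into a rough estimate. At the default cut-off, searching all 14,323 queries would be expected to give about 14 hits by chance.

```python
SAMPLE = """\
d1dlwa_ d1dlwa_ 100 116 0 0 1 116 1 116 6.42e-77 220
d1dlwa_ d2gkma_ 35.4 113 73 0 1 113 13 125 1.43e-21 80.9
d1dlwa_ d4i0va_ 31.9 119 75 2 1 113 2 120 9.11e-13 58.2
d2gkma_ d2gkma_ 100 127 0 0 1 127 1 127 1.51e-87 248
d2gkma_ d1dlwa_ 34.8 115 75 0 13 127 1 115 6.90e-23 84.3
d2gkma_ d4i0va_ 33.6 110 69 1 13 118 2 111 1.35e-18 73.6
d2gkma_ d6bmea_ 35.5 110 67 1 13 118 2 111 1.32e-16 68.6
d2gkma_ d2bkma_ 37.3 67 38 2 13 76 5 70 5.18e-06 40.8
d1ngka_ d1ngka_ 100 126 0 0 1 126 1 126 4.34e-91 257
d1ngka_ d2bkma_ 38.4 125 73 2 1 125 4 124 1.42e-24 89.0
"""


def best_hits(lines, max_evalue=1e-10, min_pident=30.0):
    """Best non-self hit per query (highest bit score) that passes both cut-offs."""
    best = {}
    for line in lines:
        f = line.split()
        if len(f) < 12 or f[0] == f[1]:  # skip malformed lines and self-hits
            continue
        pident, evalue, bits = float(f[2]), float(f[10]), float(f[11])
        if evalue > max_evalue or pident < min_pident:
            continue
        if f[0] not in best or bits > best[f[0]][1]:
            best[f[0]] = (f[1], bits)
    return best


for query, (target, bits) in best_hits(SAMPLE.splitlines()).items():
    print(query, "->", target, bits)

# The E-value is per query, so summing over all queries at the default cut-off
# of 0.001 roughly gives the number of chance hits expected for the whole search.
n_queries = 14_323
print(f"expected chance hits at E <= 0.001: {n_queries * 0.001:.1f}")  # 14.3
```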

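Disclaimer: this article was contributed by a user and does not represent the position of the wpsshop blog; copyright belongs to the original author. If you find infringing content, please contact the site. When reposting, please cite the source: https://www.wpsshop.cn/w/AllinToyou/article/detail/453752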