赞
踩
欢迎关注我的CSDN:https://spike.blog.csdn.net/
本文地址:https://blog.csdn.net/caroline_wendy/article/details/130519945
DIT:https://zhanggroup.org/D-I-TASSER/
D-I-TASSER (Deep-learning based Iterative Threading ASSEmbly Refinement) is a new method extended from I-TASSER for high-accuracy protein structure and function predictions.
D-I-TASSER (Deep-learning based Iterative Threading ASSEmbly Refinement,即基于深度学习的迭代、穿线、组装、精调) ,从 I-TASSER 扩展的新方法,用于高精度的蛋白结构和功能预测。
Starting from a query sequence, D-I-TASSER first generates inter-residue contact and distance maps and hydrogen-bond (HB) networks using multiple deep neural-network predictors, including AttentionPotential (self-attention network built on MSA transformer) and DeepPotential.
从一个查询序列开始,D-I-TASSER 首先使用多个深度神经网络预测器,生成,残基间接触 (inter-residue contact) 、距离图 (distance maps) 以及 氢键 (HB) 网络,包括 AttentionPotential (基于 MSA Transformer 的自注意力网络) 和 DeepPotential 。
Meanwhile, it identifies structural templates by the meta-threading LOMETS3 approach, which includes the models built from the state-of-the-art AlphaFold2 program.
同时,通过 元穿线 (meta-threading) LOMETS3 方法,识别结构模板,其中包括,由最先进的AlphaFold2程序构建的模型。
The full-length atomic models are finally assembled by iterative fragment assembly Monte Carlo simultions under the guidance of I-TASSER force field and deep-learning contact/distance/HB restraints, where biological functions of the query protein are derived from the structure models by COFACTOR.
最终,全长原子模型,在 I-TASSER 力场和深度学习 接触(contact) / 距离(distance) / HB 约束的指导下,通过迭代的 片段组装蒙特卡罗模拟 (fragment assembly Monte Carlo simultions) 进行组装,其中查询蛋白的生物学功能,由 COFACTOR 从结构模型中推导出来。
The D-I-TASSER pipeline (as ‘UM-TBM’ and ‘Zheng’) was ranked as the No. 1 server/predictor in all categories of protein structure prediction in the most recent CASP15 experiment, including Multi-domain Targets, Single-domain Targets, and Multi-chain Targets.
D-I-TASSER 流程 (作为“UM-TBM”和“Zheng”),在最近的 CASP15 实验中,被评为所有类别蛋白质结构预测的第一名,在服务器组和预测器组中,包括多域目标、单域目标和多链目标。
DeepMSA2:https://zhanggroup.org/DeepMSA/
DeepMSA (version 2) is a hierarchical approach to create high-quality multiple sequence alignments (MSAs) for monomer and multimer proteins.
DeepMSA (version 2) 是层次方法,用于创建单体和多体蛋白质的高质量多序列比对 (MSAs)。
The method is built on iterative sequence database searching followed by fold-based MSA ranking and selection.
该方法基于迭代的序列数据库搜索,然后,根据折叠为基础的 MSA 排名和选择。
For protein monomers, MSAs are produced with three iterative MSA searching pipelines (dMSA, qMSA and mMSA) through whole-genome (Uniclust30 and UniRef90) and metagenome (Metaclust, BFD, Mgnify, TaraDB, MetaSourceDB and JGIclust) sequence databases.
对于蛋白质单体,通过全基因组 (Uniclust30 和 UniRef90) 和宏基因组 (Metaclust、BFD、Mgnify、TaraDB、MetaSourceDB 和 JGIclust) 序列数据库,使用三种迭代的 MSA 搜索流程 (dMSA、qMSA 和 mMSA) 生成 MSAs。
For protein multimers, a number of hybrid MSAs are created by pairing the sequences from monomer MSAs of the component chains, with the optimal multimer MSAs selected based on a combined score of MSA depth and folding score of the monomer chains.
对于蛋白质多体,通过将来自组分链的单体 MSAs 的序列配对,创建了一些混合 MSAs,根据组合得分,包括 MSA 深度和单体链的折叠得分,选择最佳的多体 MSAs。
Large-scale benchmark data show significant advantage of DeepMSA2 in generating accurate MSAs with balanced depth and alignment coverage which are most suitable for deep-learning based protein and protein complex stucture and function predictions.
大规模的基准数据显示,DeepMSA2 在生成准确的 MSAs 方面具有显著的优势,这些 MSAs 具有平衡的深度和比对覆盖度,最适合基于深度学习的蛋白质以及蛋白质复合物结构和功能预测。
DeepMSA2 整体架构:
核心逻辑位于DeepMSA2_noIMG.pl
:
dq_time = time.time()
subprocess.run([f"contact/DeepMSA2/scripts/DeepMSA2_noIMG.pl",
self.seqname, self.datadir, self.pkgdir, self.libdir, python2])
print(f"[Info] dMSA和qMSA步运行时间: {time_elapsed(dq_time, time.time())}")
(1) 运行脚本qMSA2.py
,即qMSA:
$python2 $HHLIB/scripts/qMSA2.py \\
-hhblitsdb=$qhhblitsdb \\
-jackhmmerdb=$qjackhmmerdb \\
-hhblits3db=$qhhblits3db \\
-hmmsearchdb=$qhmmsearchdb \\
-tmpdir=/tmp/$ENV{USER}/$tag \\
-ncpu=$cpu \\
$datadir/MSA/qMSA.fasta
(2) 运行脚本build_MSA.py
,即dMSA,DeepMSA:
$python2 $HHLIB/scripts/build_MSA.py \\
-hhblitsdb=$dhhblitsdb \\
-jackhmmerdb=$djackhmmerdb \\
-hmmsearchdb=$dhmmsearchdb \\
-tmpdir=/tmp/$ENV{USER}/$tag \\
-ncpu=$cpu \\
$datadir/MSA/DeepMSA.fasta
源码位于qMSA2.py
,包括4个部分,即Uniclust30、UniRef90、BFD、MGnify。使用级联搜索,优先以Neff值高的MSA作为索引,最终,选择Neff值最高的作为qMSA。
第1部分:
uniclust30_2017_04
),约64G,时间2017.04,最新是2022.02源码如下:
print("[Info] " + "+" * 100)
q1_time = time.time()
print("[Info] #### run hhblits ####")
hhblits_prefix=os.path.join(tmpdir,"hhblits")
if db_dict["hhblitsdb"]:
c1 = overwrite_dict["hhblits"]
c2 = not os.path.isfile(prefix+".hhbaln.gz")
c3 = not os.path.isfile(prefix+".hhba3m.gz")
print("[Info] 是否运行: {}, 具体条件: {} or {} or {}".format(c1 or c2 or c3, c1, c2, c3))
if c1 or c2 or c3:
# generates hhblits_prefix.a3m hhblits_prefix.aln
# hhblits_prefix.60.a3m hhblits_prefix.60.aln
sys.stdout = open(os.devnull, 'w') # 关闭日志
hhb_nf = run_hhblits(query_fasta, db_dict["hhblitsdb"], ncpu, hhblits_prefix)
plain2gz(hhblits_prefix + ".aln", prefix + ".hhbaln.gz")
plain2gz(hhblits_prefix + ".a3m", prefix + ".hhba3m.gz")
sys.stdout = sys.__stdout__ # 打开日志
else:
gz2plain(prefix + ".hhbaln.gz", hhblits_prefix + ".aln")
gz2plain(prefix + ".hhba3m.gz", hhblits_prefix + ".a3m")
hhb_nf = getNf(hhblits_prefix)
sys.stdout.write("%s and %s exists, skip hhblitsdb\n"%(prefix+".hhbaln",prefix+".hhba3m"))
nf = hhb_nf
print("[Info] hhb_nf = {}, target_nf = {}, passed: {}".format(nf, target_nf, hhb_nf >= target_nf))
if hhb_nf >= target_nf: # 满足条件直接停止
shutil.copyfile(hhblits_prefix+".aln", prefix+".aln")
shutil.copyfile(hhblits_prefix+".a3m", prefix+".a3m")
sys.stdout.write("Final MSA by hhblits with Nf >=%.1f\n" % nf)
if os.environ.get("DITADDUP", "N") == "N":
return nf
print("[Info] #### FINISH run hhblits ####")
print("[Info] 数据库: {}".format(db_dict["hhblitsdb"]))
print("[Info] qMSA 第1步耗时: {} 输出: {}".format((time.time() - q1_time), prefix + ".hhbaln.gz(hhba3m.gz)"))
print("[Info] " + "+" * 100)
Uniclust30:https://wwwuser.gwdg.de/~compbiol/uniclust/2022_02/
第2部分:
uniref90.fasta
),约74G源码如下:
print("[Info] " + "+" * 100)
q2_time = time.time()
print("[Info] #### run jack_hhblits ####")
jackblits_prefix=os.path.join(tmpdir,"jackblits")
if db_dict["jackhmmerdb"]:
c1 = overwrite_dict["jackhmmer"]
c2 = not os.path.isfile(prefix+".jacaln.gz")
c3 = not os.path.isfile(prefix+".jaca3m.gz")
print("[Info] 是否运行: {}, 具体条件: {} or {} or {}".format(c1 or c2 or c3, c1, c2, c3))
if c1 or c2 or c3:
# generates jackblits_prefix.a3m jackblits_prefix.aln
# jackblits_prefix.60.a3m jackblits_prefix.60.aln
sys.stdout = open(os.devnull, 'w') # 关闭日志
jack_nf=run_jackblits(query_fasta, db_dict["jackhmmerdb"].split(':'), ncpu, hhblits_prefix, jackblits_prefix)
plain2gz(jackblits_prefix+".aln",prefix+".jacaln.gz")
plain2gz(jackblits_prefix+".a3m",prefix+".jaca3m.gz")
sys.stdout = sys.__stdout__ # 打开日志
else:
gz2plain(prefix+".jacaln.gz",jackblits_prefix+".aln")
gz2plain(prefix+".jaca3m.gz",jackblits_prefix+".a3m")
jack_nf=getNf(jackblits_prefix)
sys.stdout.write("%s and %s exists, skip jackhmmerdb\n"%(prefix+".jacaln.gz",prefix+".jaca3m.gz"))
nf = max([jack_nf, hhb_nf])
print("[Info] jack_nf = {}, max nf = {}, target_nf = {}, passed: {}"
.format(jack_nf, nf, target_nf, nf >= target_nf))
if jack_nf >= target_nf:
shutil.copyfile(jackblits_prefix+".aln",prefix+".aln")
shutil.copyfile(jackblits_prefix+".a3m",prefix+".a3m")
sys.stdout.write("Final MSA by jackhmmer with Nf >=%.1f\n"%nf)
if os.environ.get("DITADDUP", "N") == "N":
return nf
print("[Info] #### FINISH run jack_hhblits ####")
print("[Info] 数据库: {}".format(db_dict["jackhmmerdb"]))
print("[Info] 是否使用 qMSA 第1步的输出: {}".format(os.path.isfile(hhblits_prefix + ".a3m")))
print("[Info] qMSA 第2步耗时: {} 输出: {}".format((time.time() - q2_time), prefix + ".jacaln.gz(jaca3m.gz)"))
print("[Info] " + "+" * 100)
第3部分:
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
),1.8T源码如下:
print("[Info] " + "+" * 100)
q3_time = time.time()
print("[Info] #### run hhblits3 ####") # , flush=True)
hhblits3_prefix=os.path.join(tmpdir, "hhblits3")
if db_dict["hhblits3db"]:
c1 = overwrite_dict["hhblits3"]
c2 = not os.path.isfile(prefix+".hh3aln.gz")
c3 = not os.path.isfile(prefix+".hh3a3m.gz")
print("[Info] 是否运行: {}, 具体条件: {} or {} or {}".format(c1 or c2 or c3, c1, c2, c3))
if c1 or c2 or c3:
sys.stdout = open(os.devnull, 'w') # 关闭日志
hh3_nf=run_hhblits3(query_fasta, db_dict["hhblits3db"].split(':'),
ncpu, hhblits_prefix, jackblits_prefix, hhblits3_prefix)
plain2gz(hhblits3_prefix+".aln", prefix+".hh3aln.gz")
plain2gz(hhblits3_prefix+".a3m", prefix+".hh3a3m.gz")
sys.stdout = sys.__stdout__ # 打开日志
else:
gz2plain(prefix+".hh3aln.gz",hhblits3_prefix+".aln")
gz2plain(prefix+".hh3a3m.gz",hhblits3_prefix+".a3m")
hh3_nf=getNf(hhblits3_prefix)
sys.stdout.write("%s and %s exists, skip hhblits3db\n"%(prefix+".hh3aln.gz",prefix+".hh3a3m.gz"))
nf = max([hh3_nf, jack_nf, hhb_nf])
print("[Info] hh3_nf = {}, max nf = {}, target_nf = {}, passed: {}"
.format(hh3_nf, nf, target_nf, hh3_nf >= target_nf))
if hh3_nf >= target_nf:
shutil.copyfile(hhblits3_prefix+".aln",prefix+".aln")
shutil.copyfile(hhblits3_prefix+".a3m",prefix+".a3m")
sys.stdout.write("Final MSA by hhblits3 with Nf >=%.1f\n"%nf)
if os.environ.get("DITADDUP", "N") == "N":
return nf
print("[Info] 数据库: {}".format(db_dict["hhblits3db"]))
print("[Info] 是否使用 qMSA 第1-2步的输出: {}, {}".format(
os.path.isfile(jackblits_prefix+".a3m"), jackblits_prefix+".a3m"))
print("[Info] qMSA 第3步耗时: {} 输出: {}".format((time.time() - q3_time), prefix + ".hh3aln.gz(hh3a3m.gz)"))
print("[Info] " + "+" * 100)
第4部分:
源码如下:
q4_time = time.time()
print("[Info] " + "+" * 100)
print("[Info] #### run hmmsearch ####") # , flush=True)
hmmsearch_prefix = os.path.join(tmpdir,"hmmsearch")
if db_dict["hmmsearchdb"]:
c1 = overwrite_dict["hmmsearch"]
c2 = not os.path.isfile(prefix+".hmsaln.gz")
c3 = not os.path.isfile(prefix+".hmsa3m.gz")
print("[Info] 是否运行: {}, 具体条件: {} or {} or {}".format(c1 or c2 or c3, c1, c2, c3))
if c1 or c2 or c3:
# TODO: 建议使用字典
# 优先处理 jumpstart_profile_prefix,其次处理 query_profile_prefix
query_profile_prefix = jackblits_prefix # 使用最大的NF值
if jack_nf < hhb_nf:
query_profile_prefix = hhblits_prefix
jumpstart_profile_prefix = hhblits3_prefix # 使用最大的NF值
if hh3_nf < jack_nf:
jumpstart_profile_prefix = jackblits_prefix
if jack_nf < hhb_nf:
jumpstart_profile_prefix = hhblits_prefix
# generates hmmsearch_prefix.afq hmmsearch_prefix.hmm
# hmmsearch_prefix.redundant hmmsearch_prefix.nonredundant
# hmmsearch_prefix.aln
sys.stdout = open(os.devnull, 'w') # 关闭日志
hms_nf=search_metaclust(query_fasta, sequence, query_profile_prefix, jumpstart_profile_prefix,
db_dict["hmmsearchdb"].split(':'),ncpu,hmmsearch_prefix)
plain2gz(hmmsearch_prefix+".aln",prefix+".hmsaln.gz")
plain2gz(hmmsearch_prefix+".a3m",prefix+".hmsa3m.gz")
sys.stdout = sys.__stdout__ # 打开日志
print("[Info] 是否使用 qMSA 第1-3步的输出: {}".format(jumpstart_profile_prefix + ".a3m"))
else:
gz2plain(prefix+".hmsaln.gz",hmmsearch_prefix+".aln")
gz2plain(prefix+".hmsa3m.gz",hmmsearch_prefix+".a3m")
sys.stdout.write("%s and %s exists, skip hmmsearchdb\n"%(prefix+".hmsaln.gz",prefix+".hmsa3m.gz"))
hms_nf=getNf(hmmsearch_prefix)
nf = max([hh3_nf, jack_nf, hhb_nf, hms_nf])
print("[Info] NF - hhb_nf(s1): {}, jack_nf(s2): {}, hh3_nf(s3): {}, hms_nf(s4): {}"
.format(hhb_nf, jack_nf, hh3_nf, hms_nf))
print("[Info] hms_nf = {}, max nf = {} (target_nf), passed: {}".format(hms_nf, nf, hms_nf >= nf))
if hms_nf > nf: # hmmsearch replaces jackblits and hhblits result
nf = hms_nf
shutil.copyfile(hmmsearch_prefix+".aln", prefix+".aln")
shutil.copyfile(hmmsearch_prefix+".a3m", prefix+".a3m")
sys.stdout.write("Final MSA by hmmsearch with Nf >=%.1f\n"%nf)
if os.environ.get("DITADDUP", "N") == "N":
return nf
print("[Info] 数据库: {}".format(db_dict["hmmsearchdb"]))
print("[Info] qMSA 第4步耗时: {} 输出: {}".format((time.time() - q4_time), prefix + ".hmsaln.gz(hmsa3m.gz)"))
print("[Info] " + "+" * 100)
最终,根据Neff值,在4组MSA中,选出最终的MSA,作为qMSA.a3m,理论上Neff值越来越高。例如:
NF - hhb_nf(s1): 7.98306, jack_nf(s2): 20.9298, hh3_nf(s3): 22.5877, hms_nf(s4): 25.3418
MSA Len - hhb: 121, jac: 402, hh3: 492, hms: 568
源码如下:
# 选择最优的NF作为qMSA的最终输出
print("[Info] NF - hhb_nf(s1): {}, jack_nf(s2): {}, hh3_nf(s3): {}, hms_nf(s4): {}"
.format(hhb_nf, jack_nf, hh3_nf, hms_nf))
nf = max([hh3_nf, jack_nf, hhb_nf, hms_nf])
print("[Info] hms_nf = {}, max nf = {} (target_nf), passed: {}".format(hms_nf, nf, hms_nf >= nf))
if hms_nf >= nf: # hmmsearch replaces jackblits and hhblits result
nf = hms_nf
shutil.copyfile(hmmsearch_prefix + ".aln", prefix + ".aln")
shutil.copyfile(hmmsearch_prefix + ".a3m", prefix + ".a3m")
sys.stdout.write("[Info] step4 - hmmsearch MSA has %.1f Nf. Output anyway.\n" % hms_nf)
else:
if hh3_nf >= max(jack_nf, hhb_nf):
shutil.copyfile(hhblits3_prefix+".aln", prefix+".aln")
shutil.copyfile(hhblits3_prefix+".a3m", prefix+".a3m")
sys.stdout.write("[Info] step3 - hhblits3 MSA has %.1f Nf. Output anyway.\n"%hh3_nf)
elif jack_nf >= max(hh3_nf, hhb_nf):
shutil.copyfile(jackblits_prefix+".aln", prefix+".aln")
shutil.copyfile(jackblits_prefix+".a3m", prefix+".a3m")
sys.stdout.write("[Info] step2 - jackhmmer MSA has %.1f Nf. Output anyway.\n"%jack_nf)
else: # hhb_nf >= nf:
shutil.copyfile(hhblits_prefix+".aln", prefix+".aln")
shutil.copyfile(hhblits_prefix+".a3m", prefix+".a3m")
sys.stdout.write("[Info] step1 - hhblits MSA has %.1f Nf. Output anyway.\n"%hhb_nf)
# prefix 即 output 的目录
print("[Info] qMSA 执行完成 {}, 输出 {}".format((time.time() - q1_time), prefix+".aln|a3m"))
print("[Info] " + "+" * 100)
Neff,在MSA中,默认高质量阈值是128,公式如下:
源码位于build_MSA.py
,包括4个部分,即Uniclust30、UniRef90、Metaclust&MGnify,最后一步,联合搜索Metaclust&MGnify。
第1部分:与qMSA完全相同,包括输出MSA的Neff值。
第2部分:与qMSA完全相同,包括输出MSA的Neff值。
第3部分,目标就是搜索,第3部分:
源码如下:
print("[Info] " + "+" * 100)
d3_time = time.time()
print("[Info] #### run hmmsearch ####") # , flush=True)
hmmsearch_prefix=os.path.join(tmpdir,"hmmsearch")
if db_dict["hmmsearchdb"]:
c1 = overwrite_dict["hmmsearch"]
c2 = not os.path.isfile(prefix+".hmsaln")
c3 = not os.path.isfile(prefix+".hmsa3m")
c4 = build_hmmsearch_db
print("[Info] 是否运行: {}, 具体条件: {} or {} or {} or {}".format(c1 or c2 or c3 or c4, c1, c2, c3, c4))
if c1 or c2 or c3 or c4:
# generates hmmsearch_prefix.afq hmmsearch_prefix.hmm
# hmmsearch_prefix.redundant hmmsearch_prefix.nonredundant
# hmmsearch_prefix.aln
sys.stdout = open(os.devnull, 'w') # 关闭日志
hms_nf=search_metaclust(query_fasta, sequence,
hhblits_prefix if jack_nf < hhb_nf else jackblits_prefix,
db_dict["hmmsearchdb"].split(':'), ncpu, hmmsearch_prefix)
shutil.copyfile(hmmsearch_prefix + ".aln", prefix + ".hmsaln")
if os.path.isfile(hmmsearch_prefix + ".a3m"):
shutil.copyfile(hmmsearch_prefix + ".a3m", prefix + ".hmsa3m")
sys.stdout = sys.__stdout__ # 打开日志
else:
shutil.copyfile(prefix+".hmsaln",hmmsearch_prefix+".aln")
if os.path.isfile(prefix+".hmsa3m"):
shutil.copyfile(prefix+".hmsa3m",hmmsearch_prefix+".a3m")
sys.stdout.write("[Info] %s exists, skip hmmsearchdb\n"%(prefix+".hmsaln"))
hms_nf = getNf(hmmsearch_prefix)
print("[Info] 数据库: {}".format(db_dict["hmmsearchdb"]))
print("[Info] dMSA 第3步耗时: {} 输出: {}".format((time.time() - d3_time), prefix + ".hmsaln(hmsa3m)"))
print("[Info] " + "+" * 100)
其余,也与qMSA相同,也是根据Neff选择最优的MSA,例如:
NF - hhb_nf(s1): 7.98306, jack_nf(s2): 20.9298, hms_nf(s3): 23.6956
检测是否需要运行IMG:
next if (-f "$datadir/JGI/$DBfasta.cdhit"); # 检查是否需要运行
运行脚本:DeepMSA2_IMG.pl
第1部分:
qhmmsearch
搜索qMSA.hh3aln.gz
DB.fasta.bq.cdhit
、DB.fasta.bq.cdhit.clstr
源码如下:
cp $datadir/MSA/qMSA.hh3aln.gz seq.hh3aln.gz
gzip -df seq.hh3aln.gz
if [ ! -s "seq.hh3aln" ];then
cp $datadir/MSA/DeepMSA.jacaln seq.hh3aln
fi
if [ ! -s "seq.hh3aln" ];then
cp $datadir/MSA/DeepMSA.hhbaln seq.hh3aln
fi
sed = seq.hh3aln |sed 'N;s/\\n/\\t/'|sed 's/^/>/g'|sed 's/\\t/\\n/g'| $HHLIB/bin/qhmmbuild -n aln --amino -O seq.afq --informat afa seq.hmm -
$HHLIB/qhmmer_new/qhmmsearch --cpu 192 -E 10 --incE 1e-3 -A $DBfasta.match --tblout $DBfasta.tbl -o $DBfasta.out seq.hmm $JGI/$DBfasta
$HHLIB/bin/esl-sfetch -f $JGI/$DBfasta $DBfasta.tbl|sed 's/*//g' > $DBfasta.fseqs
$HHLIB/bin/cd-hit -i $DBfasta.fseqs -o $datadir/JGI/$DBfasta.cdhit -c 1 -M 3000
第2部分:常规搜索,类似qMSA和dMSA,输出DeepJGI.aln
、q3JGI.aln
、q4JGI.aln
三个文件。
源码如下:
my $cmd="$HHLIB/scripts/qMSA2.py -hhblitsdb=$hhblitsdb -jackhmmerdb=$jackhmmerdb -hhblits3db=$hhblits3db -hmmsearchdb=$tmpdir/DB.fasta.fseqs -tmpdir=$tmpdir/MSA $datadir/JGI/q4JGI.fasta";
...
my $cmd="$HHLIB/scripts/qMSA2.py -hhblitsdb=$hhblitsdb -jackhmmerdb=$jackhmmerdb -hmmsearchdb=$tmpdir/DB.fasta.fseqs -tmpdir=$tmpdir/MSA $datadir/JGI/q3JGI.fasta";
...
my $cmd="$HHLIB/scripts/build_MSA.py -hhblitsdb=$hhblitsdb -jackhmmerdb=$jackhmmerdb -hmmsearchdb=$tmpdir/DB.fasta.fseqs -tmpdir=$tmpdir/MSA $datadir/JGI/DeepJGI.fasta";
使用官方的AF2 (AlphaFold2),评估MSA,区别是模型只使用model1_1
、调整(relax)1次,因此,只输出1个模型,通过pLDDT进行排序。
具体步骤如下:
参考脚本:
I-TASSERmod/run_AF2_multiMSA.py
thirdparty/alphafold2/run_alphafold_msa_benchmark.sh
thirdparty/alphafold2/run_alphafold_msa_benchmark.py
AF2模型位置位于:thirdparty/alphafold2/run_alphafold_msa_benchmark.py
model_params = data.get_model_haiku_params(
model_name=model_name, data_dir=FLAGS.data_dir)
...
# thirdparty/alphafold2/alphafold/model/data.py
def get_model_haiku_params(model_name: str, data_dir: str) -> hk.Params:
"""Get the Haiku parameters from a model name."""
path = os.path.join(data_dir, 'params', f'params_{model_name}.npz')
with open(path, 'rb') as f:
params = np.load(io.BytesIO(f.read()), allow_pickle=False)
return utils.flat_params_to_haiku(params)
官网:https://github.com/deepmind/alphafold
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。