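Using computers instead of people to predict the effects of complex events is one of the most exciting scientific advances of our time. SIFT is a classic tool of this kind, applied to genomics research.
SIFT predicts the effects of genomic variants, mainly missense mutations, in many organisms. Its most notable feature is its broad species coverage, which makes it a classic, general-purpose research tool.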
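URL 1 (official)
http://sift-dna.org (if the site is occasionally unreachable, try again later or with a different browser)
Developed by
(1) Genome Institute of Singapore, Computational and Systems Biology
(2) J. Craig Venter Institute (USA), Genomic Medicine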
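The J. Craig Venter Institute was founded in October 2006 through the merger of several organizations, including TIGR, TCAG, the J. Craig Venter Science Foundation, and the Institute for Biological Energy Alternatives (IBEA).
Craig Venter is an American biologist whom many call the "bad boy" of biology. He openly challenged the international Human Genome Project and used the shotgun method to sequence genomes. Source: Baidu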
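Prediction principle
SIFT predicts whether an amino acid substitution affects protein function, based on sequence homology and the physical properties of amino acids. It can be applied to naturally occurring nonsynonymous variants (polymorphisms) and to laboratory-induced missense mutations.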
Reference
SIFT web server: predicting effects of amino acid substitutions on proteins. Ngak-Leng Sim, Prateek Kumar, Jing Hu, Steven Henikoff, Georg Schneider, Pauline C. Ng. Nucleic Acids Research, Volume 40, Issue W1, 1 July 2012, Pages W452–W457, https://doi.org/10.1093/nar/gks539 (article download link: https://pan.baidu.com/s/1ky9fh0HCuht0M9ubkasK1w extraction code: 7bhe)
URL 2 (predictions for representative species)
https://sift.bii.a-star.edu.sg/www/SIFT4G_vcf_submit.html
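First check whether a SIFT database exists for the organism you study, then annotate the variant file (VCF). For common organisms, the VCF file can be submitted online.
Species range
A small number of representative animals, plants, fungi, protists, and prokaryotes (E. coli only).
Input file
VCF file (8th column "INFO" required), size < 5 MB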
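The two input requirements above can be checked locally before uploading. The following is only a minimal sketch: the function name `check_vcf_for_upload` is made up for illustration, and reading "<5M" as 5 × 1024 × 1024 bytes is an assumption, since the submission page may count the limit differently.

```python
import os

# Assumption: "<5M" is read here as 5 * 1024 * 1024 bytes.
MAX_BYTES = 5 * 1024 * 1024


def check_vcf_for_upload(path, max_bytes=MAX_BYTES):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if os.path.getsize(path) >= max_bytes:
        problems.append("file is not smaller than the size limit")
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            # Skip blank lines and "##" meta-information lines.
            if not line or line.startswith("##"):
                continue
            fields = line.split("\t")
            if line.startswith("#"):
                # Column header line: the 8th column must be named INFO.
                if len(fields) < 8 or fields[7] != "INFO":
                    problems.append(f"line {line_no}: 8th column header is not INFO")
                continue
            # Data line: it needs at least 8 tab-separated columns.
            if len(fields) < 8:
                problems.append(f"line {line_no}: fewer than 8 columns")
    return problems
```

Calling `check_vcf_for_upload("my_variants.vcf")` (a hypothetical file name) returns an empty list when nothing suspicious was found. The check is deliberately shallow and does not validate the VCF format as a whole.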
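Submit a human VCF file (other species are submitted later in this post)
Figure: Online prediction interface
Figure: Submitting the VCF file
From within China, the online SIFT prediction experience is not great, probably because of network issues: waits are long, or the service simply stalls. A later section of this post covers local prediction, which works better.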
URL 3 (the extended SIFT 4G and the species it covers)
https://sift.bii.a-star.edu.sg/sift4g/
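Check whether a SIFT database exists for the organism you study, then annotate the variant file (VCF).
SIFT Databases
If the species you study is not listed in the table below, you can build your own SIFT prediction database.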
Common Name | Scientific Name |
African bush elephant | Loxodonta africana |
African malaria mosquito | Anopheles gambiae |
African rice | Oryza glumaepatula |
Alpaca | Vicugna pacos |
Amebiasis protozoan parasite * | Entamoeba histolytica |
Amborella trichopoda | Amborella trichopoda |
American pika** | Ochotona princeps |
Anthracnose fungus | Colletotrichum gloeosporioides |
Arabidopsis | Arabidopsis thaliana |
Asian rice | Oryza sativa |
Aspergillus | Aspergillus clavatus |
Aspergillus | Aspergillus flavus |
Aspergillus | Aspergillus fumigatus |
Aspergillus | Aspergillus nidulans |
Aspergillus | Aspergillus niger |
Aspergillus | Aspergillus terreus |
Atlantic cod | Gadus morhua |
Bakanae and foot rot disease fungus | Fusarium fujikuroi |
Barley | Hordeum vulgare |
Barrel clover | Medicago truncatula |
Black cottonwood | Populus trichocarpa |
Blackleg fungus | Leptosphaeria maculans |
Bigelowiella natans** | Bigelowiella natans |
Blind cave tetra | Astyanax mexicanus |
Blood fluke* | Schistosoma mansoni |
Bottlenose dolphin** | Tursiops truncatus |
Bovine | Bos taurus |
Brown bread rice | Oryza rufipogon |
Cat | Felis catus |
Campion anther smut | Microbotryum violaceum |
Candida lipolytica | Yarrowia lipolytica |
Carolina anole | Anolis carolinensis |
Chicken | Gallus gallus |
Chinese cabbage | Brassica rapa |
Chinese softshell turtle | Pelodiscus sinensis |
Chimpanzee | Pan troglodytes |
Collared flycatcher | Ficedula albicollis |
Comb jelly | Mnemiopsis leidyi |
Common marmoset | Callithrix jacchus |
Common shrew** | Sorex araneus |
Crucifer anthracnose fungus | Colletotrichum higginsianum |
Cucumber anthracnose fungus | Colletotrichum orbiculare |
Diplogastrid nematode | Pristionchus pacificus |
Dog | Canis familiaris |
Dothistroma needle blight | Dothistroma septosporum |
E.coli | Escherichia coli |
Encapsulated yeast* | Cryptococcus neoformans |
Eremothecium gossypii | Ashbya gossypii |
European centipede | Strigamia maritima |
European hedgehog | Erinaceus europaeus |
Eye worm | Loa loa |
Ferret | Mustela putorius furo |
Filarial nematode worm* | Brugia malayi |
Fission yeast | Schizosaccharomyces japonicus |
Fission yeast | Schizosaccharomyces cryophilus |
Fission yeast | Schizosaccharomyces octosporus |
Fission yeast | Schizosaccharomyces pombe |
Fly | Drosophila ananassae |
Fly | Drosophila erecta |
Fly | Drosophila grimshawi |
Fly | Drosophila melanogaster |
Fly | Drosophila mojavensis |
Fly | Drosophila persimilis |
Fly | Drosophila pseudoobscura |
Fly | Drosophila sechellia |
Fly | Drosophila simulans |
Fly | Drosophila virilis |
Fly | Drosophila willistoni |
Fly | Drosophila yakuba |
Foxtail millet | Setaria italica |
Freshwater leech | Helobdella robusta |
Fusarium vascular wilt | Fusarium oxysporum |
Giant panda | Ailuropoda melanoleuca |
Gemmiferous Spikemoss | Selaginella moellendorffii |
Gorilla | Gorilla gorilla |
Grape seed | Vitis vinifera |
Green alga* | Chlamydomonas reinhardtii |
Green Monkey | Chlorocebus sabaeus |
Grey mouse lemur | Microcebus murinus |
Grey short-tailed opossum | Monodelphis domestica |
Guinea pig | Cavia porcellus |
Guillardia theta** | Guillardia theta |
Hoffmann's two-toed sloth | Choloepus hoffmanni |
Honey bee | Apis mellifera |
Horse | Equus caballus |
Human | Homo sapiens |
Humpbacked fly | Megaselia scalaris |
Indian rice | Oryza indica |
Indian wild rice* | Oryza nivara |
Japanese rice fish | Oryzias latipes |
Jewel wasp | Nasonia vitripennis |
Kangaroo rat** | Dipodomys ordii |
Kentucky bluegrass fungus | Magnaporthe poae |
Large flying fox** | Pteropus vampyrus |
Leaf cutter ant | Atta cephalotes |
Lesser hedgehog tenrec** | Echinops telfairi |
Little brown bat | Myotis lucifugus |
Lyre-leaved rock-cress | Arabidopsis lyrata |
Maize | Zea mays |
Maize ear and stalk rot fungus | Gibberella moniliformis |
Maize anthracnose fungus | Glomerella graminicola |
Maize head smut fungus* | Sporisorium reilianum |
Maize smut* | Ustilago maydis |
Malaria parasite* | Plasmodium falciparum |
Malaria parasite* | Plasmodium vivax |
Monarch Butterfly** | Danaus plexippus |
Mosquito | Anopheles darlingi |
Mountain Pine Beetle | Dendroctonus ponderosae |
Mouse | Mus musculus |
Mycobacterium tuberculosis | Mycobacterium tuberculosis |
Mycosphaerella graminicola | Zymoseptoria tritici |
Necrotrophic fungal pathogen | Pyrenophora teres |
Nematode | Onchocerca volvulus |
Neosartorya fischeri | Neosartorya fischeri |
Nile tilapia | Oreochromis niloticus |
Nine banded armadillo | Dasypus novemcinctus |
Noble rot fungus | Botryotinia fuckeliana |
Northern greater galago | Otolemur garnettii |
Northern white-cheeked gibbon | Nomascus leucogenys |
Orangutan | Pongo abelii |
Oryza meridionalis | Oryza meridionalis |
Owl limpet** | Lottia gigantea |
Pacific transparent sea squirt | Ciona savignyi |
Pacific oyster** | Crassostrea gigas |
Parasite* | Leishmania major |
Peach | Prunus persica |
Perigord black truffle | Tuber melanosporum |
Phaeodactylum tricornutum Bohlin | Phaeodactylum tricornutum |
Philippine tarsier** | Tarsius syrichta |
Pig | Sus scrofa |
Placozoan multicellular animal | Trichoplax adhaerens |
Plant pathogen* | Albugo laibachii |
Plant pathogen | Nectria haematococca |
Plant pathogen* | Pythium irregulare |
Platypus | Ornithorhynchus anatinus |
Polychaete worm** | Capitella teleta |
Poplar leaf rust fungus | Melampsora larici-populina |
Postman butterfly | Heliconius melpomene |
Potato | Solanum tuberosum |
Potato late blight fungus | Phytophthora infestans |
Powdery mildew | Blumeria graminis |
Primate malaria parasite* | Plasmodium knowlesi |
Puffer fish | Takifugu rubripes |
Purple false brome | Brachypodium distachyon |
Rabbit | Oryctolagus cuniculus |
Rat | Rattus norvegicus |
Red bread mold | Neurospora crassa |
Red flour beetle | Tribolium castaneum |
Red imported fire ant | Solenopsis invicta |
Red spider mite | Tetranychus urticae |
Rhesus macaque | Macaca mulatta |
Rice blast fungus | Magnaporthe oryzae |
Rock hyrax | Procavia capensis |
Round worm* | Caenorhabditis brenneri |
Round worm* | Caenorhabditis briggsae |
Round worm* | Caenorhabditis remanei |
Round worm | Caenorhabditis elegans |
Sea anemone | Nematostella vectensis |
Sea lamprey | Petromyzon marinus |
Sea squirt | Ciona intestinalis |
Sheep | Ovis aries |
Silkworm | Bombyx mori |
Slime mold | Dictyostelium discoideum |
Snow-rot disease causing pathogen* | Pythium iwayamai |
Sorghum | Sorghum bicolor |
Southern house mosquito | Culex quinquefasciatus |
Southern platyfish | Xiphophorus maculatus |
Soybean | Glycine max |
Soybean stem and root rot agent* | Phytophthora sojae |
Spotted gar | Lepisosteus oculatus |
Spotted green pufferfish | Tetraodon nigroviridis |
Stem rust fungus* | Puccinia graminis |
Tammar wallaby | Macropus eugenii |
Tasmanian devil | Sarcophilus harrisii |
Termite | Zootermopsis nevadensis |
Thirteen lined ground squirrel | Ictidomys tridecemlineatus |
Three spine stickleback | Gasterosteus aculeatus |
Tomato | Solanum lycopersicum |
Toxoplasmosis protozoan parasite* | Toxoplasma gondii |
Tree shrew** | Tupaia belangeri |
Trichinosis causing parasite** | Trichinella spiralis |
Trichoderma virens | Trichoderma virens |
Trichoderma reesei | Trichoderma reesei |
Trypanosomiasis parasite* | Trypanosoma brucei |
Verticillium wilt | Verticillium dahliae |
Water flea* | Daphnia pulex |
West Indian ocean coelacanth | Latimeria chalumnae |
Western clawed frog | Xenopus tropicalis |
Wheat | Triticum urartu |
Wheat and barley crown-rot fungus | Fusarium pseudograminearum |
Wheat and barley take-all root rot fungus | Gaeumannomyces graminis |
Wheat head blight fungus | Gibberella zeae |
Wheat fungal pathogen | Phaeosphaeria nodorum |
Wheat leaf rust** | Puccinia triticina |
Wheat tan spot fungus | Pyrenophora tritici-repentis |
White mold | Sclerotinia sclerotiorum |
Wild duck | Anas platyrhynchos |
Wild turkey | Meleagris gallopavo |
Yeast | Komagataella pastoris |
Yeast | Saccharomyces cerevisiae |
Yellow fever mosquito | Aedes aegypti |
Yellow koji mold | Aspergillus oryzae |
Zebra finch | Taeniopygia guttata |
Zebra fish | Danio rerio |
* High false positive error in predictions
** Low prediction coverage
URL 4 (multi-species, enhanced version of SIFT)
Annotate variants with SIFT 4G
https://sift.bii.a-star.edu.sg/sift4g/AnnotateVariants.html
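Notes:
1. SIFT 4G is a faster version of SIFT that can predict the deleteriousness of missense mutations at larger scale and for more species.
2. The VCF file must be sorted by chromosome and position to be annotated correctly.
3. Download the SIFT database for your species. It must use the same strain-specific genome build as programs such as bwa, GATK and snpEff, and the same chromosome naming convention.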
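The sorting requirement in note 2 can be verified with a short script before annotating. This is a rough sketch under one stated assumption: "sorted" is taken to mean that all records of a chromosome are contiguous and that positions never decrease within a chromosome. The order of the chromosomes themselves is not checked, and the function name `first_unsorted_record` is made up for illustration.

```python
def first_unsorted_record(path):
    """Return the 1-based line number of the first out-of-order record, or None."""
    seen_chroms = set()
    prev_chrom = None
    prev_pos = None
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            pos = int(pos)
            if chrom != prev_chrom:
                # A chromosome that reappears after another one is unsorted.
                if chrom in seen_chroms:
                    return line_no
                seen_chroms.add(chrom)
            elif pos < prev_pos:
                # Position went backwards within the same chromosome.
                return line_no
            prev_chrom, prev_pos = chrom, pos
    return None
```

A return value of `None` means no ordering problem was found under this assumption. Any other value points at the first offending line, which is usually a sign that the file should be re-sorted with a dedicated VCF tool before annotation.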
Prediction on the Linux command line (omitted)
https://sift.bii.a-star.edu.sg/sift4g/Commandline.html
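Because the VCF file already merges all samples (gVCF), batch processing on Linux is rarely needed. Stay tuned for follow-up posts.
Local prediction on Windows (Mac omitted)
1. Download the SIFT4G database for a species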
https://sift.bii.a-star.edu.sg/sift4g/public
For example: Mycobacterium tuberculosis
https://sift.bii.a-star.edu.sg/sift4g/public/Mycobacterium_tuberculosis/
2. Download the local software
If the download fails, retry a few times and check whether your browser is blocking it:
https://github.com/pauline-ng/SIFT4G_Annotator/raw/master/SIFT4G_Annotator.jar
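3. Add java to the environment variables. Correction: in step 5 of the figure below, scroll down and add it to Path
Figure: Opening Advanced system settings
Figure: Java environment variable settings, so that java can be launched from "Git bash" or "cmd"
4. Run "SIFT4G_Annotator.jar" with "java -jar"
Go to the folder that contains "SIFT4G_Annotator.jar" and right-click to launch "Git bash". (Alternatively, type the commands at the Windows cmd prompt, taking care to use the correct file path.)
Figure: Opening "Git bash" in the current directory
5. Enter the following commands (run "SIFT4G_Annotator.jar" with "java -jar")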
java -version  # show the java version found through the environment variables
# java version "1.8.0_202"
# Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
# Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

java -jar SIFT4G_Annotator.jar  # launch the local version of SIFT
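A Java graphical interface pops up automatically
Figure: Launching the graphical interface from the java command line
6. Load the file and the database
Figure: Loading the files
7. Save the local SIFT prediction results:
Figure: Saving the results
The result files are stored one directory up, in "../SIFT4G_results" (at the same level as the working directory).
8. Compare the files before and after prediction
Variant rows in the VCF file before prediction: 3559 = 3606 - 47 (total lines minus header lines)
Variant rows in the VCF file after prediction: 3559 = 3608 - 49 (total lines minus header lines)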
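The arithmetic above (total lines minus header lines) can be reproduced with a few lines of Python. This is a small sketch: the function name `count_vcf_lines` is made up for illustration, and every line starting with "#" is counted as a header line, including the column header line.

```python
def count_vcf_lines(path):
    """Return (header_lines, variant_lines) for a VCF file."""
    header = 0
    variants = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            if line.startswith("#"):
                header += 1  # "##" meta lines and the "#CHROM" column line
            else:
                variants += 1  # one variant record per line
    return header, variants
```

Running it on the input VCF and on the annotated VCF should give the same variant count for both files, with the annotated file having more header lines.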
The VCF header gains two lines:
1. ##SIFT_Threshold: 0.05
2. ##INFO=<ID=SIFTINFO,Number=.,Type=String,Description="SIFT information. Format: Allele|Transcript|GeneId|GeneName|Region|VariantType|Ref_Amino_Acid/Alt_AminoAcid|Amino_position|SIFT_score|SIFT_median|SIFT_num_seqs|Allele_Type|SIFT_prediction">
The SIFT annotation text is appended to the end of the INFO column:
DELETERIOUS: a damaging, harmful mutation
Looking up the circular codon table: Q = Gln, K = Lys
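The pipe-separated SIFTINFO value can be split into named fields using the format string from the header line. The sketch below makes two assumptions: INFO entries are separated by ";" as in standard VCF, and multiple SIFTINFO annotations on one variant are separated by commas. The function name `parse_siftinfo` and the values in the usage note are made up for illustration and do not come from the real result file.

```python
# Field names copied from the SIFTINFO header description.
SIFT_FIELDS = [
    "Allele", "Transcript", "GeneId", "GeneName", "Region", "VariantType",
    "Ref_Amino_Acid/Alt_AminoAcid", "Amino_position", "SIFT_score",
    "SIFT_median", "SIFT_num_seqs", "Allele_Type", "SIFT_prediction",
]


def parse_siftinfo(info_column):
    """Return one dict per SIFTINFO annotation found in a VCF INFO column."""
    records = []
    for item in info_column.split(";"):
        key, sep, value = item.partition("=")
        if key != "SIFTINFO" or not sep:
            continue  # not a SIFTINFO key=value entry
        # Assumption: several annotations are comma-separated.
        for entry in value.split(","):
            records.append(dict(zip(SIFT_FIELDS, entry.split("|"))))
    return records
```

For a hypothetical INFO string such as `DP=30;SIFTINFO=A|Tx1|Gene1|geneA|CDS|NONSYNONYMOUS|Q/K|25|0.01|3.1|12|novel|DELETERIOUS`, the function returns a single dict whose `"Ref_Amino_Acid/Alt_AminoAcid"` entry is `"Q/K"` and whose `"SIFT_prediction"` entry is `"DELETERIOUS"`. Scores are kept as strings, so convert them with `float()` before comparing them with the threshold in the header.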
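SIFT usage summary
No further detail here; see the figure below:
Figure: SIFT workflow for assessing the deleteriousness of mutations
Get all the test data for this post
Link: https://pan.baidu.com/s/1-bMjndANtjiKtLMXEIs3xw
Extraction code: ysx3 (Author: 宋红卫)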