bmi_exp_dat <- clump_data(bmi_exp_dat,clump_r2=0.01,pop = "EUR")
Please look at vignettes for options on running this locally if you need to run many instances of this command.
Clumping C5nTuK, 5340156 variants, using EUR population reference
Error in api_query("ld/clump", query = list(rsid = dat[["rsid"]], pval = dat[["pval"]], :
The query to MR-Base exceeded 300 seconds and timed out. Please simplify the query
Alternatively, you could try downloading the VCF from IEU OpenGWAS project and using the gwasvcf to extract based on p-value, and use the ieugwasr package to do clumping locally. We're trying to get to a point where it's easy to do heavier computation on these data locally.
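The timeout above comes from sending all 5,340,156 variants to the server. One way to simplify the query is to keep only variants below a p-value threshold before clumping. The sketch below assumes the data are in TwoSampleMR exposure format with a `pval.exposure` column; the helper name `filter_by_pval` and the 5e-8 threshold are illustrative, not part of any package.

```r
# Hypothetical helper: keep only rows whose p-value is below a threshold.
filter_by_pval <- function(dat, p_col = "pval.exposure", threshold = 5e-8) {
  stopifnot(is.data.frame(dat), p_col %in% names(dat))
  # Rows with a missing p-value are dropped as well.
  keep <- !is.na(dat[[p_col]]) & dat[[p_col]] < threshold
  dat[keep, , drop = FALSE]
}

# Toy data standing in for a full GWAS summary table.
toy <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  pval.exposure = c(1e-10, 0.2, 3e-9, NA),
  stringsAsFactors = FALSE
)
small <- filter_by_pval(toy)
print(small$SNP)
```

The reduced table could then be passed to clump_data() in place of the full one, which should make the request much smaller.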
The LD reference files are listed on the gwasvcf page.
Wrapper for clump function using local plink binary and LD reference dataset
ld_clump_local(dat, clump_kb, clump_r2, clump_p, bfile, plink_bin)
dat | Data frame. Must have a variant name column (named "rsid" in the ld_clump calls below) and a p-value column called "pval". If an "id" column is present, clumping is done separately for each unique id.
clump_kb | Clumping window in kb. Default is very strict, 10000
clump_r2 | Clumping r2 threshold. Default is very strict, 0.001
clump_p | Clumping significance level for index variants. Default = 1 (i.e. no threshold)
bfile | Path prefix of the local LD reference files (plink bed/bim/fam). Default = NULL. In ld_clump, a NULL bfile sends the query to the API, while a supplied bfile runs the clumping locally with plink.
plink_bin | Path to the plink binary. Default = NULL. See https://github.com/explodecomputer/plinkbinr for convenient access to plink binaries
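The column requirements above can be checked before any clumping is attempted. The following is a minimal sketch, assuming input in TwoSampleMR exposure format (`SNP`, `pval.exposure`, `id.exposure`); the helper `make_clump_input` is hypothetical and simply renames the columns to `rsid`, `pval` and `id`.

```r
# Hypothetical helper: build the three-column table used for clumping.
make_clump_input <- function(dat, snp_col = "SNP", p_col = "pval.exposure",
                             id_col = "id.exposure") {
  needed <- c(snp_col, p_col, id_col)
  missing_cols <- setdiff(needed, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  out <- data.frame(
    rsid = as.character(dat[[snp_col]]),
    pval = as.numeric(dat[[p_col]]),
    id = as.character(dat[[id_col]]),
    stringsAsFactors = FALSE
  )
  # Drop rows that cannot be clumped (no variant name or no p-value).
  out[!is.na(out$rsid) & !is.na(out$pval), , drop = FALSE]
}

# Toy input with one unusable row (missing p-value).
toy_expo <- data.frame(
  SNP = c("rs1", "rs2"),
  pval.exposure = c(1e-9, NA),
  id.exposure = c("expA", "expA"),
  stringsAsFactors = FALSE
)
clump_input <- make_clump_input(toy_expo)
print(clump_input)
```

Failing early with a clear message is usually easier to debug than an error raised from inside plink.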
- devtools::install_github("explodecomputer/plinkbinr")
- library(plinkbinr)
- get_plink_exe()
- #[1] "D:/R-4.1.1/library/plinkbinr/bin/plink_Windows.exe"
wget http://fileserve.mrcieu.ac.uk/ld/1kg.v3.tgz
ld_clump( dplyr::tibble(rsid=dat$rsid, pval=dat$pval, id=dat$trait_id), plink_bin = genetics.binaRies::get_plink_binary(), bfile = "/path/to/reference/EUR" )
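Note that `bfile` is a file prefix rather than a single file: plink expects `.bed`, `.bim` and `.fam` files sharing that prefix. A small check like the one below may catch a wrong path before plink is run. The helper name `missing_bfile_parts` is an assumption for illustration, and the demonstration uses empty placeholder files rather than a real reference panel.

```r
# Hypothetical helper: report which plink reference files are missing.
missing_bfile_parts <- function(bfile) {
  parts <- paste0(bfile, c(".bed", ".bim", ".fam"))
  parts[!file.exists(parts)]
}

# Demonstration with empty placeholder files in a temporary directory.
ref_dir <- tempfile("ref_")
dir.create(ref_dir)
prefix <- file.path(ref_dir, "EUR")
file.create(paste0(prefix, c(".bed", ".bim")))  # .fam deliberately left out

# Lists the files that plink would fail to find for this prefix.
print(basename(missing_bfile_parts(prefix)))
```

An empty result means all three files are present for the given prefix.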
- b <- ld_clump(
- dplyr::tibble(rsid=a$rsid, pval=a$p, id=a$id),
- # path returned by get_plink_exe()
- plink_bin = "D:/R-4.1.1/library/plinkbinr/bin/plink_Windows.exe",
- # location of the European (EUR) population reference panel
- bfile = "D:/EUR_ref/EUR"
- )
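Note the column names of a: it must contain the columns used in the call above (rsid, p and id).
That completes the clumping. The number of SNPs removed for LD is the same as with the online method.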
- expo_dat <- expo_dat[which(expo_dat$pval.exposure<1e-5),]
- b <- ld_clump(
- dplyr::tibble(rsid=expo_dat$SNP, pval=expo_dat$pval.exposure, id=expo_dat$id.exposure),
- # path to the plink binary
- plink_bin = "/GM_GWAS_LD_clumped_snps/plink",
- bfile = "/GM_GWAS_LD_clumped_snps/EUR_ref/EUR",
- clump_kb = 1000,clump_r2 = 0.1
- )
- expo_dat <- expo_dat[which(expo_dat$SNP %in% b$rsid),]
- Clumping A9o7Sb, 50 variants, using EUR population reference
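The subsetting step above matches on SNP name only. If the table holds several exposures, a SNP retained for one exposure would also be kept for the others. A possible refinement, sketched here under the assumption that the clumped table still carries the `rsid` and `id` columns that were passed in, is to match on the (SNP, id) pair. The helper `keep_clumped` is hypothetical.

```r
# Hypothetical helper: keep rows of `dat` whose (SNP, id) pair survived clumping.
keep_clumped <- function(dat, clumped, snp_col = "SNP", id_col = "id.exposure") {
  # Build a combined key per row; tab is used as a separator that should not
  # occur inside SNP names or ids.
  key_dat <- paste(dat[[snp_col]], dat[[id_col]], sep = "\t")
  key_clumped <- paste(clumped$rsid, clumped$id, sep = "\t")
  dat[key_dat %in% key_clumped, , drop = FALSE]
}

# Toy example: rs1 appears under two exposures but was retained only for "A".
expo <- data.frame(
  SNP = c("rs1", "rs2", "rs1"),
  id.exposure = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
clumped <- data.frame(rsid = "rs1", id = "A", stringsAsFactors = FALSE)
kept <- keep_clumped(expo, clumped)
print(kept)
```

With a single exposure id, as in the run above, both approaches select the same rows.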
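Many people have run into this problem, and I hit it today as well while computing an LD matrix, so here is a fix. The error message says that a file cannot be found; my first guess was that the tmp file is deleted automatically while the program is running, so I tried setting the tmp file name manually. The original start of the function is shown first, followed by the modified version: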
- ld_matrix_local <- function(variants, bfile, plink_bin, with_alleles=TRUE)
- {
- # Make textfile
- shell <- ifelse(Sys.info()['sysname'] == "Windows", "cmd", "sh")
- fn <- tempfile()
- ld_matrix_local <- function(variants, bfile, plink_bin, with_alleles=TRUE)
- {
- # Make textfile
- shell <- ifelse(Sys.info()['sysname'] == "Windows", "cmd", "sh")
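- fn <- path.expand("~/plinkLD/tmpfile/tmp")  # expand "~" here: the shell does not expand it once the path is quoted; the directory must already exist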
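A hard-coded temp path like the one above has two practical catches: the directory must exist, and a leading `~` is not expanded by the shell once the path has been quoted with shQuote(). The sketch below shows one way to get a unique file prefix inside a directory of your choosing. The helper `make_tmp_prefix` is hypothetical, and the demonstration writes under R's session temp directory rather than the home directory.

```r
# Hypothetical helper: return a fresh temp-file prefix inside a chosen directory.
make_tmp_prefix <- function(dir = "~/plinkLD/tmpfile") {
  dir <- path.expand(dir)                       # turn "~" into a full path
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  tempfile(pattern = "ld_", tmpdir = dir)       # unique name for each call
}

# Demonstration inside the session temp directory.
fn <- make_tmp_prefix(file.path(tempdir(), "plinkLD", "tmpfile"))
print(dirname(fn))
```

A unique name per call also avoids two concurrent runs overwriting each other's files, which a fixed name such as `tmp` would not.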
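However, the plink output contained the message "Using up to 39 threads (change this with --threads)". I suspected a thread conflict was preventing the output file from being written, so I added a threads option to the source code: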
- fun2 <- paste0(
- shQuote(plink_bin, type=shell),
- " --bfile ", shQuote(bfile, type=shell),
- " --extract ", shQuote(fn, type=shell),
- " --r square ",
- " --keep-allele-order ",
- " --threads 1 ",
- " --out ", shQuote(fn, type=shell)
- )
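So adding one line, --threads 1, to the original function is enough. Running with a single thread still seems fairly fast; I am not sure what other effects it has.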
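Rather than hard-coding the value, the thread count could be exposed as an argument. The following is a minimal sketch of the command construction only, mirroring the flags shown above; the function name `build_ld_matrix_cmd` and its `threads` argument are assumptions for illustration and are not part of ieugwasr.

```r
# Hypothetical helper: assemble the plink LD-matrix command with a thread limit.
build_ld_matrix_cmd <- function(plink_bin, bfile, fn, threads = 1,
                                shell = ifelse(Sys.info()[["sysname"]] == "Windows",
                                               "cmd", "sh")) {
  stopifnot(is.numeric(threads), length(threads) == 1, threads >= 1)
  paste0(
    shQuote(plink_bin, type = shell),
    " --bfile ", shQuote(bfile, type = shell),
    " --extract ", shQuote(fn, type = shell),
    " --r square",
    " --keep-allele-order",
    " --threads ", as.integer(threads),
    " --out ", shQuote(fn, type = shell)
  )
}

# Placeholder paths; nothing is executed here, the command is only printed.
cmd <- build_ld_matrix_cmd("/path/to/plink", "/path/to/reference/EUR",
                           "/tmp/ld_tmp", shell = "sh")
print(cmd)
```

Keeping the default at 1 reproduces the fix above, while still allowing more threads to be tried on machines where the problem does not occur.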
