library(SNPassoc)
data(SNPs)
SNPs[1:8,1:8]
id | casco | sex | blood.pre | protein | snp10001 | snp10002 | snp10003 |
---|---|---|---|---|---|---|---|
1 | 1 | Female | 13.7 | 75640.52 | TT | CC | GG |
2 | 1 | Female | 12.7 | 28688.22 | TT | AC | GG |
3 | 1 | Female | 12.9 | 17279.59 | TT | CC | GG |
4 | 1 | Male | 14.6 | 27253.99 | CT | CC | GG |
5 | 1 | Female | 13.4 | 38066.57 | TT | AC | GG |
6 | 1 | Female | 11.3 | 9872.46 | TT | CC | GG |
7 | 1 | Female | 11.9 | 11132.90 | TT | AC | GG |
8 | 1 | Male | 12.4 | 29973.43 | TT | AC | GG |
The important point here is that the individual IDs must end up in row.names, and every remaining column must contain SNP genotypes only.
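# Drop the non-SNP columns (casco, sex, blood.pre, protein); keep id and the SNPs
myDat <- SNPs[, -(2:5)]
# Use the id column as row names, then drop it
row.names(myDat) <- myDat$id
myDat <- myDat[, -1]
myDat[1:5, 1:5]
# str(myDat)
# Convert the data frame to a genotype matrix for create.gpData()
myDat <- as.matrix(myDat)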
snp10001 | snp10002 | snp10003 | snp10004 | snp10005 |
---|---|---|---|---|
TT | CC | GG | GG | GG |
TT | AC | GG | GG | AG |
TT | CC | GG | GG | GG |
CT | CC | GG | GG | GG |
TT | AC | GG | GG | GG |
Recoding alleles from character/factor/numeric into the number of copies of the minor alleles, i.e. 0, 1 and 2. In codeGeno, in the first step heterozygous genotypes are coded as 1. From the other genotypes, the less frequent genotype is coded as 2 and the remaining genotype as 0.
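In short, genotypes are converted according to frequency: the more common homozygote becomes 0, the heterozygote 1, and the less common homozygote 2.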
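The rule above can be sketched in base R for a single marker. This is only an illustration of the described coding, not synbreed's implementation: `recode_marker` is a hypothetical helper, genotypes are assumed to be two-letter strings such as "CT", and ties between equally frequent homozygotes are broken arbitrarily.

```r
# Hypothetical helper (not part of synbreed): recode one marker so that
# heterozygote -> 1, rarer homozygote -> 2, more common homozygote -> 0.
recode_marker <- function(g) {
  # A genotype is heterozygous when its two allele letters differ
  a1 <- substr(g, 1, 1)
  a2 <- substr(g, 2, 2)
  het <- !is.na(g) & a1 != a2
  # Count the homozygous genotypes, most frequent first
  hom <- sort(table(g[!is.na(g) & !het]), decreasing = TRUE)
  out <- rep(NA_integer_, length(g))
  out[het] <- 1L
  if (length(hom) >= 1) out[!is.na(g) & g == names(hom)[1]] <- 0L
  if (length(hom) >= 2) out[!is.na(g) & g == names(hom)[2]] <- 2L
  out
}

g <- c("TT", "TT", "CT", "CC", "TT", NA, "CT")
recode_marker(g)
# 0 0 1 2 0 NA 1
```

Missing genotypes stay `NA` at this stage; they are handled later by the imputation step.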
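library(synbreed)
# Wrap the genotype matrix in a gpData object
cp <- create.gpData(geno = myDat)
# Recode to 0/1/2, drop markers with > 10 % missing values or maf < 0.01,
# and impute the remaining missing values at random
cp.dat <- codeGeno(gpData = cp, label.heter = "alleleCoding", maf = 0.01, nmiss = 0.1,
                   impute = TRUE, impute.type = "random", verbose = TRUE)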
step 1 : 1 marker(s) removed with > 10 % missing values
step 2 : Recoding alleles
step 4 : 12 marker(s) removed with maf < 0.01
step 7 : Imputing of missing values
step 7d : Random imputing of missing values
step 8 : No recoding of alleles necessary after imputation
step 9 : 0 marker(s) removed with maf < 0.01
step 10 : No duplicated markers removed
End : 22 marker(s) remain after the check
Summary of imputation
total number of missing values : 37
number of random imputations : 37
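The two filters reported in the log (missing rate above `nmiss`, minor allele frequency below `maf`) can be reproduced by hand on a 0/1/2 matrix. The following is a minimal sketch on made-up data, assuming the thresholds used above; the matrix `x` and the marker names `m1` to `m4` are hypothetical, and the code only mirrors the idea of the filters, not synbreed's internals.

```r
# Toy 0/1/2 genotype matrix: 20 individuals x 4 markers (made-up values)
x <- cbind(
  m1 = rep(c(0, 1, 2, 0, 1), 4),               # polymorphic, complete
  m2 = rep(0, 20),                             # monomorphic, so maf = 0
  m3 = c(rep(NA, 5), rep(c(0, 1, 2), 5)),      # 25 % missing
  m4 = c(NA, rep(c(0, 1), length.out = 19))    # 5 % missing
)

nmiss <- 0.1    # maximum tolerated share of missing values per marker
maf   <- 0.01   # minimum minor allele frequency

# Filter 1: drop markers with more than `nmiss` missing values
miss_rate <- colMeans(is.na(x))
x1 <- x[, miss_rate <= nmiss, drop = FALSE]

# Filter 2: drop markers whose minor allele frequency is below `maf`.
# With 0/1/2 coding, the allele frequency is the column mean divided by 2;
# pmin() folds it to the minor allele in case the major allele was counted.
freq <- colMeans(x1, na.rm = TRUE) / 2
freq <- pmin(freq, 1 - freq)
x2 <- x1[, freq >= maf, drop = FALSE]

colnames(x2)
# "m1" "m4"
```

Here `m3` fails the missing-value filter and `m2` fails the maf filter, which parallels the "removed with > 10 % missing values" and "removed with maf < 0.01" lines of the log.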
# The same workflow, starting from a CSV file with IDs in the first column
write.csv(myDat, "snps.csv")
ge <- read.csv("snps.csv", header = TRUE, row.names = 1, na.strings = "NA")
summary(ge)
ge <- as.matrix(ge)
gp <- create.gpData(geno = ge)
cp.dat <- codeGeno(gpData = gp, label.heter = "alleleCoding", maf = 0.01, nmiss = 0.1,
                   impute = TRUE, impute.type = "random", verbose = TRUE)
| snp10001 | snp10002 | snp10003 | snp10004 | snp10005 | snp10006 | snp10007 | snp10008 |
|---|---|---|---|---|---|---|---|
| CC: 12 | AA: 5 | GG: 144 | GG: 156 | AA: 3 | AA: 157 | CC: 157 | CC: 104 |
| CT: 53 | AC: 78 | NA's: 13 | NA's: 1 | AG: 70 | | | CG: 44 |
| TT: 92 | CC: 74 | | | GG: 84 | | | GG: 9 |

| snp10009 | snp100010 | snp100011 | snp100012 | snp100013 | snp100014 | snp100015 |
|---|---|---|---|---|---|---|
| AA: 72 | TT: 147 | CC: 1 | CC: 3 | AA: 101 | AA: 27 | AG: 13 |
| AG: 79 | NA's: 10 | CG: 2 | CG: 68 | AG: 35 | AC: 74 | GG: 144 |
| GG: 5 | | GG: 154 | GG: 84 | GG: 9 | CC: 52 | |
| NA's: 1 | | | NA's: 2 | NA's: 12 | NA's: 4 | |

| snp100016 | snp100017 | snp100018 | snp100019 | snp100020 | snp100021 | snp100022 |
|---|---|---|---|---|---|---|
| GG: 152 | CC: 5 | CC: 5 | CC: 32 | AA: 9 | GG: 157 | AA: 156 |
| NA's: 5 | CT: 83 | CT: 84 | CG: 75 | AG: 43 | | NA's: 1 |
| | TT: 67 | TT: 67 | GG: 50 | GG: 105 | | |
| | NA's: 2 | NA's: 1 | | | | |

| snp100023 | snp100024 | snp100025 | snp100026 | snp100027 | snp100028 | snp100029 |
|---|---|---|---|---|---|---|
| AA: 5 | CC: 14 | CC: 157 | GG: 156 | CC: 68 | CC: 34 | AA: 14 |
| AT: 78 | CT: 51 | | NA's: 1 | CG: 82 | CT: 72 | AG: 48 |
| TT: 71 | TT: 91 | | | GG: 5 | TT: 50 | GG: 94 |
| NA's: 3 | NA's: 1 | | | NA's: 2 | NA's: 1 | NA's: 1 |

| snp100030 | snp100031 | snp100032 | snp100033 | snp100034 | snp100035 |
|---|---|---|---|---|---|
| AA: 157 | TT: 102 | AA: 34 | AA: 34 | CC: 14 | TT: 146 |
| | NA's: 55 | AG: 70 | AG: 69 | CT: 48 | NA's: 11 |
| | | GG: 52 | GG: 49 | TT: 94 | |
| | | NA's: 1 | NA's: 5 | NA's: 1 | |

step 1 : 1 marker(s) removed with > 10 % missing values
step 2 : Recoding alleles
step 4 : 12 marker(s) removed with maf < 0.01
step 7 : Imputing of missing values
step 7d : Random imputing of missing values
step 8 : No recoding of alleles necessary after imputation
step 9 : 0 marker(s) removed with maf < 0.01
step 10 : No duplicated markers removed
End : 22 marker(s) remain after the check
Summary of imputation
total number of missing values : 37
number of random imputations : 37
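The imputation summary reports one random draw per missing value. As a rough illustration of what random imputation on a 0/1/2 matrix can look like, the sketch below fills each missing entry by sampling a genotype under Hardy-Weinberg proportions from that marker's observed allele frequency. That sampling scheme is an assumption of this sketch and may differ from what `impute.type = "random"` does internally; `impute_random` is a hypothetical helper and the data are made up.

```r
set.seed(1)  # random imputation is stochastic; fix the seed to reproduce a run

# Hypothetical helper (not a synbreed function): fill the NAs of one marker
impute_random <- function(g) {
  miss <- is.na(g)
  if (!any(miss)) return(g)
  # Frequency of the allele counted by the 0/1/2 coding
  p <- mean(g, na.rm = TRUE) / 2
  # Assumed genotype probabilities for 0, 1 and 2 copies (Hardy-Weinberg)
  probs <- c((1 - p)^2, 2 * p * (1 - p), p^2)
  g[miss] <- sample(0:2, sum(miss), replace = TRUE, prob = probs)
  g
}

# Toy matrix with 3 missing values (made-up data)
x <- cbind(m1 = c(0, 1, NA, 2, 0, 1),
           m2 = c(0, NA, 0, 1, NA, 0))
x_imp <- apply(x, 2, impute_random)

sum(is.na(x))      # 3 missing values before imputation
sum(is.na(x_imp))  # 0 after imputation
```

Observed genotypes are left untouched; only the `NA` entries are replaced, so the number of imputations equals the number of missing values.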
gee <- cp.dat$geno
gee[1:5,1:5]
id | snp10001 | snp10002 | snp10005 | snp10008 | snp10009 |
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 1 | 1 | 0 | 1 |
3 | 0 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 0 | 0 | 0 |
5 | 0 | 1 | 0 | 0 | 1 |
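A quick sanity check on a recoded matrix is to confirm that it contains only 0, 1 and 2 and to look at the per-marker allele frequency, which is the column mean divided by 2. The sketch below retypes the five rows shown above into a hypothetical matrix `gee_head` so that it runs without synbreed; with the real object, the same expressions could be applied to `cp.dat$geno`.

```r
# The 5 x 5 slice printed above, retyped by hand (gee_head is a hypothetical name)
gee_head <- matrix(c(0, 0, 0, 0, 0,
                     0, 1, 1, 0, 1,
                     0, 0, 0, 0, 0,
                     1, 0, 0, 0, 0,
                     0, 1, 0, 0, 1),
                   nrow = 5, byrow = TRUE,
                   dimnames = list(as.character(1:5),
                                   c("snp10001", "snp10002", "snp10005",
                                     "snp10008", "snp10009")))

# Every entry should be a copy number of the minor allele: 0, 1 or 2
stopifnot(all(gee_head %in% 0:2))

# Allele frequency per marker within this slice
freq <- colMeans(gee_head) / 2
freq
```

Because this is only a five-row slice, the frequencies are not those of the full data set. On the complete matrix, values above 0.5 would suggest that the major rather than the minor allele is being counted.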