当前位置:   article > 正文

跟着Cell学单细胞转录组分析(十二):转录因子分析

转录因子分析

转录因子分析可以了解细胞异质性背后的基因调控网络的异质性。转录因子分析也是单细胞转录组常见的分析内容,R语言分析一般采用的是SCENIC包,具体原理可参考两篇文章。1、《SCENIC : single-cell regulatory networkinference and clustering》。2、《Ascalable SCENIC workflow for single-cell gene regulatory network analysis》。但是说在前头,SCENIC的计算量超级大,非常耗费内存和时间,如非必要,不要用一般的电脑分析尝试。可以借助服务器完成分析,或者减少分析细胞数,再或者使用SCENIC的Python版本。这里我们也是仅仅进行演示,数据没有实际意义,人为减少了基因与细胞,然而就这也很费时间。重要的是看看流程。

首先开始前,需要做两件事。第一毫无疑问是安装和加载R包,需要的比较多,如果没有请安装。第二则是下载基因注释配套数据库。

  1. library(Seurat)
  2. library(tidyverse)
  3. library(foreach)
  4. library(RcisTarget)
  5. library(doParallel)
  6. library(SCopeLoomR)
  7. library(AUCell)
  8. BiocManager::install(c("doMC", "doRNG"))
  9. library(doRNG)
  10. BiocManager::install("GENIE3")
  11. library(GENIE3)
  12. #if (!requireNamespace("devtools", quietly = TRUE))
  13. devtools::install_github("aertslab/SCENIC")
  14. packageVersion("SCENIC")
  15. library(SCENIC)
  16. #这里下载人的
  17. dbFiles <- c("https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather",
  18. "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather")
  19. for(featherURL in dbFiles)
  20. {
  21. download.file(featherURL, destfile=basename(featherURL))
  22. }

接着就是构建分析文件。

  1. #构建分析数据
  2. exprMat <- as.matrix(immune@assays$RNA@data)#表达矩阵
  3. exprMat[1:4,1:4]#查看数据
  4. cellInfo <- immune@meta.data[,c("celltype","nCount_RNA","nFeature_RNA")]
  5. colnames(cellInfo) <- c('CellType', 'nGene' ,'nUMI')
  6. head(cellInfo)
  7. table(cellInfo$CellType)
  8. #构建scenicOptions对象,接下来的SCENIC分析都是基于这个对象的信息生成的
  9. scenicOptions <- initializeScenic(org = "hgnc", dbDir = "F:/cisTarget_databases", nCores = 1)

构建共表达网络,最后一步很费时间。

  1. # Co-expression network
  2. genesKept <- geneFiltering(exprMat, scenicOptions)
  3. exprMat.filtered <- exprMat[genesKept, ]
  4. exprMat.filtered[1:4,1:4]
  5. runCorrelation(exprMat.filtered, scenicOptions)
  6. exprMat.filtered.log <- log2(exprMat.filtered + 1)
  7. runGenie3(exprMat.filtered.log, scenicOptions)
  8. #Using 676 TFs as potential regulators...
  9. #Running GENIE3 part 1
  10. #Running GENIE3 part 10
  11. #Running GENIE3 part 2
  12. #Running GENIE3 part 3
  13. #Running GENIE3 part 4
  14. #Running GENIE3 part 5
  15. #Running GENIE3 part 6
  16. #Running GENIE3 part 7
  17. #Running GENIE3 part 8
  18. #Running GENIE3 part 9
  19. #Finished running GENIE3.
  20. #Warning message:
  21. #In runGenie3(exprMat.filtered.log, scenicOptions) :
  22. # Only 676 (37%) of the 1839 TFs in the database were found in the dataset. Do they use the same gene IDs?

构建基因调控网络GRN并进行AUC评分。也是耗费时间的过程。运行完成的结果就是整个分析得到的内容,需要按照自己的目的去筛选。

  1. # Build and score the GRN
  2. scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
  3. scenicOptions <- runSCENIC_2_createRegulons(scenicOptions)
  4. exprMat_log <- log2(exprMat + 1)
  5. scenicOptions <- runSCENIC_3_scoreCells(scenicOptions,exprMat_log)
  6. scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
  7. saveRDS(scenicOptions, file = "int/scenicOptions.Rds")

以下是运行记录

  1. >scenicOptions <- runSCENIC_2_createRegulons(scenicOptions)
  2. 13:21 Step 2. Identifying regulons
  3. tfModulesSummary:
  4. [,1]
  5. top5perTargetAndtop3sd 1
  6. top5perTargetAndtop50 1
  7. top1sdAndtop10perTarget 2
  8. top50perTargetAndtop1sd 2
  9. top50Andtop10perTarget 3
  10. w0.005 27
  11. w0.005Andtop1sd 149
  12. top50perTarget 174
  13. top50Andtop3sd 236
  14. top3sd 436
  15. top50 436
  16. w0.005Andtop50perTarget 500
  17. top1sd 523
  18. top5perTarget 617
  19. top10perTarget 671
  20. w0.001 676
  21. 13:21 RcisTarget: Calculating AUC
  22. Scoring database: [Source file: hg19-500bp-upstream-7species.mc9nr.feather]
  23. Scoring database: [Source file: hg19-tss-centered-10kb-7species.mc9nr.feather]
  24. Not all characters in C:/Users/liuhl/Desktop/1.R could be decoded using CP936. To try a different encoding, choose "File | Reopen with Encoding..." from the main menu.17:17 RcisTarget: Adding motif annotation
  25. Using BiocParallel...
  26. Using BiocParallel...
  27. Number of motifs in the initial enrichment: 1993247
  28. Number of motifs annotated to the matching TF: 22290
  29. 17:38 RcisTarget: Pruning targets
  30. 19:04 Number of motifs that support the regulons: 12551
  31. Preview of motif enrichment saved as: output/Step2_MotifEnrichment_preview.html
  32. There were 13 warnings (use warnings() to see them)
  33. > exprMat_log <- log2(exprMat + 1)
  34. > scenicOptions <- runSCENIC_3_scoreCells(scenicOptions,exprMat_log)
  35. 19:06 Step 3. Analyzing the network activity in each individual cell
  36. Number of regulons to evaluate on cells: 318
  37. Biggest (non-extended) regulons:
  38. ELF1 (1760g)
  39. ETS1 (1734g)
  40. FLI1 (1604g)
  41. ELK3 (1493g)
  42. POLR2A (1453g)
  43. CHD2 (1251g)
  44. ETV3 (1249g)
  45. ELK4 (1148g)
  46. TAF1 (974g)
  47. ERG (956g)
  48. Quantiles for the number of genes detected by cell:
  49. (Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
  50. min 1% 5% 10% 50% 100%
  51. 205.00 224.76 276.90 321.40 695.00 997.00
  52. Warning in .AUCell_calcAUC(geneSets = geneSets, rankings = rankings, nCores = nCores, :
  53. Using only the first 224.76 genes (aucMaxRank) to calculate the AUC.
  54. 19:07 Finished running AUCell.
  55. 19:07 Plotting heatmap...
  56. 19:07 Plotting t-SNEs...
  57. Warning message:
  58. In max(densCurve$y[nextMaxs]) : max里所有的参数都不存在;回覆-Inf
  59. > scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
  60. Binary regulon activity: 207 TF regulons x 439 cells.
  61. (299 regulons including 'extended' versions)
  62. 168 regulons are active in more than 1% (4.39) cells.
  63. > saveRDS(scenicOptions, file = "int/scenicOptions.Rds")

每一步的分析结果SCENIC都会自动保存在所创建的int和out文件夹。接下来对结果进行可视化,这里是随机选的,没有生物学意义。实际情况是要根据自己的研究目的。

1、可视化转录因子与seurat细胞分群联动

  1. #regulons AUC
  2. AUCmatrix <- readRDS("int/3.4_regulonAUC.Rds")
  3. AUCmatrix <- AUCmatrix@assays@data@listData$AUC
  4. AUCmatrix <- data.frame(t(AUCmatrix), check.names=F)
  5. RegulonName_AUC <- colnames(AUCmatrix)
  6. RegulonName_AUC <- gsub(' \\(','_',RegulonName_AUC)
  7. RegulonName_AUC <- gsub('\\)','',RegulonName_AUC)
  8. colnames(AUCmatrix) <- RegulonName_AUC
  9. scRNAauc <- AddMetaData(immune, AUCmatrix)
  10. scRNAauc@assays$integrated <- NULL
  11. saveRDS(scRNAauc,'immuneauc.rds')
  12. #二进制regulo AUC
  13. BINmatrix <- readRDS("int/4.1_binaryRegulonActivity.Rds")
  14. BINmatrix <- data.frame(t(BINmatrix), check.names=F)
  15. RegulonName_BIN <- colnames(BINmatrix)
  16. RegulonName_BIN <- gsub(' \\(','_',RegulonName_BIN)
  17. RegulonName_BIN <- gsub('\\)','',RegulonName_BIN)
  18. colnames(BINmatrix) <- RegulonName_BIN
  19. scRNAbin <- AddMetaData(immune, BINmatrix)
  20. scRNAbin@assays$integrated <- NULL
  21. saveRDS(scRNAbin, 'immunebin.rds')

作图使用Seurat中FeaturePlot函数。小提琴图也是可以的。

  1. FeaturePlot(scRNAauc, features='CEBPB_extended_1035g', label=T, reduction = 'umap')
  2. FeaturePlot(scRNAbin, features='CEBPB_extended_1035g', label=T, reduction = 'umap')

2、最常见的热图,选择需要可视化的regulons。

  1. library(pheatmap)
  2. celltype = subset(cellInfo,select = 'CellType')
  3. AUCmatrix <- t(AUCmatrix)
  4. BINmatrix <- t(BINmatrix)
  5. regulons <- c('CEBPB_extended_1035g','RUNX1_extended_200g',
  6. 'FOXC1_extended_100g','MYBL1_extended_112g',
  7. 'IRF1_extended_1785g',
  8. 'ELF1_1760g','ELF1_extended_2165g',
  9. 'IRF1_extended_1765g','ETS1_extended_2906g',
  10. 'YY1_extended_1453g','ATF3_extended_1022g',
  11. 'E2F4_extended_637g',
  12. 'KLF2_12g','HES1_13g',
  13. 'GATA3_20g','HOXB2_extended_362g',
  14. 'SOX4_extended_10g',
  15. 'RUNX3_extended_170g','EGR3_extended_23g',
  16. 'MXD4_extended_182g','HOXD9_extended_25g')
  17. AUCmatrix <- AUCmatrix[rownames(AUCmatrix)%in%regulons,]
  18. BINmatrix <- BINmatrix[rownames(BINmatrix)%in%regulons,]
  19. pheatmap(AUCmatrix, show_colnames=F, annotation_col=celltype,
  20. width = 6, height = 5)
  21. pheatmap(BINmatrix, show_colnames=F, annotation_col=celltype,
  22. color = colorRampPalette(colors = c("white","black"))(100),
  23. width = 6, height = 5)


好了,以上是一个基本的流程演示,具体怎么用这个结果,如何解读,可以参考相关的高分文献,了解分析原理,与自己的研究相结合。
更多精彩内容请访问我的个人公众号《KS科研分享与服务》!

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/weixin_40725706/article/detail/131733
推荐阅读
相关标签
  

闽ICP备14008679号