赞
踩
参考前文:R绘图笔记 | R语言绘图系统与常见绘图函数及参数
关于绘图,前面介绍了一些:
R绘图笔记 | 多数据系列的箱型图与带抖动散点的多数据系列箱型图
这里介绍火山图的绘制,火山图常常出现在芯片、转录组、蛋白组、代谢组等组学检测技术的结果中,并且通常伴随热图一起出现,文章中也很常见。
一.数据处理
如果你想获取该数据用于自己练习,下面是获取数据的地址:
https://docs.qq.com/sheet/DV3ZwZWl5UURjUGJi
数据格式是这样的。
二.绘图
1.读入数据
- data <- read.csv("BioInfoNotesData4.csv",stringsAsFactors = F)
- head(data)
我们画火山图,只需要其中的log2FC和FDR就可以了。在绘图之前,我们需要对FDR进行转换,将它的值变成-log10,如果有0,会产生Inf,需要去除。这样的话可以拉开差异表达基因之间的间距。在纵坐标上才可以很好的显示出来。上面的数据已经处理了。后面只需要取相应的列就行。
- data <- data[,c(1,3,7)]
- data$logP <- -log10(data$FDR)
- colnames(data) <- c("Gene","Log2FC","FDR","logP")
- head(data)
2.绘图
这里先介绍ggscatter函数绘图。
- ggscatter(data, x, y, combine = FALSE, merge = FALSE,
- color = "black", fill = "lightgray", palette = NULL, shape = 19,
- size = 2, point = TRUE, rug = FALSE, title = NULL, xlab = NULL,
- ylab = NULL, facet.by = NULL, panel.labs = NULL,
- short.panel.labs = TRUE, add = c("none", "reg.line", "loess"),
- add.params = list(), conf.int = FALSE, conf.int.level = 0.95,
- fullrange = FALSE, ellipse = FALSE, ellipse.level = 0.95,
- ellipse.type = "norm", ellipse.alpha = 0.1,
- ellipse.border.remove = FALSE, mean.point = FALSE,
- mean.point.size = ifelse(is.numeric(size), 2 * size, size),
- star.plot = FALSE, star.plot.lty = 1, star.plot.lwd = NULL,
- label = NULL, font.label = c(12, "plain"), font.family = "",
- label.select = NULL, repel = FALSE, label.rectangle = FALSE,
- cor.coef = FALSE, cor.coeff.args = list(), cor.method = "pearson",
- cor.coef.coord = c(NULL, NULL), cor.coef.size = 4, ggp = NULL,
- show.legend.text = NA, ggtheme = theme_pubr(), ...)
data:就是绘图的数据,x、y分别指定轴变量。palette是调色板,其实很多参数和之前的图形绘制是相通的。我不介绍,你自己查看一下帮助文档,后面用到的参数我会介绍。
ggscatter(data,x="Log2FC",y="logP") + theme_test()
但是这样太丑了,我们还是对数据分组,然后处理一下。一是高低表达的差异显示出来,二是突出极显著或者差异倍数大的基因。
- data$GeneLab <- ""
- sigDownGene <- data$Gene[(data$Log2FC < -5 & data$FDR < 0.005)]
- sigUpGene <- data$Gene[(data$Log2FC > 4&data$FDR < 0.001)]
- sigGene <- c(sigDownGene,sigUpGene)
- data$GeneLab[match(sigGene,data$Gene)] <- sigGene
- head(data)
在图中要显示点的名称,我们可以通过label设置,但点那么多,我只显示自己想要的,可以增加一列,不显示的全都是空值,要显示的有值(这里是基因名称),repel参数避免要显示的基因名称重叠,不好看。
- ggscatter(data,x="Log2FC",y="logP",color="Group",
- shape = 16,
- label = data$GeneLab,
- font.label=7,
- xlab = "log2(Fold Change)",
- ylab = "-log(pValue)",
- #rug= T,
- repel=T,
- size = 1) + theme_test()
我还是觉得丑,可通过theme设置主题。
ggscatter(data,x="Log2FC",y="logP",color="Group", shape = 16, palette = c("#7B68EE", "#E0E0E0", "#FF4500"), label = data$GeneLab, font.label=7, xlab = "log2(Fold Change)", ylab = "log(pValue)", #rug= T, repel=T, size = 1)+ theme(legend.title = element_blank(), legend.text = element_text(size = 8, face = "bold"), legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = "pt"), legend.direction = "horizontal", legend.position = c(0.5,0.93), panel.background = element_rect(fill = "transparent",colour = "black"), plot.background = element_rect(fill = "transparent",colour = "black"))
除了上面的函数以外,有一些包里面自带的火山图绘制函数,有的图也很不错。比如TCGAbiolinks包,这个包是用来分析TCGA数据库的数据的,很好用,该包的TCGAVisualize_volcano函数绘制的火山图也很好看。
- library(TCGAbiolinks)##
- volFig <- pdf("vol.pdf",width=10,height=6)
- TCGAVisualize_volcano(data$Log2FC, data$FDR,title = "",
- filename = volFig, xlab = "logFC",
- names = data$Gene, show.names = "highlighted",
- x.cut = 1, y.cut = 0.01,
- highlight = data$Gene[which(abs(data$Log2FC) >=4.5)],
- highlight.color = "orange")
- dev.off()
这些绘图函数都是基于ggplot2包的,所以可以自定义很多元素。
- volFig <- pdf("vol.pdf",width=10,height=8)
- TCGAVisualize_volcano(data$Log2FC, data$FDR,title = "",
- filename = volFig, xlab = "logFC",
- names = data$Gene, show.names = "highlighted",
- x.cut = 1, y.cut = 0.01,
- highlight = data$Gene[which(abs(data$Log2FC) >=4.5)],
- highlight.color = "orange") +
- theme(legend.title = element_blank(),
- legend.text = element_text(size = 8, face = "bold"),
- legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = "pt"),
- legend.direction = "horizontal",
- legend.position = c(0.2,0.93),
- panel.background = element_rect(fill = "transparent",colour = "black"),
- plot.background = element_rect(fill = "transparent",colour = "black"))
- dev.off()
参考资料:
数据来源:TCGA数据库
ggscatter帮助文档
TCGAVisualize_volcano帮助文档
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。