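How to add asterisks to GraphPad plots: how to draw a grouped bar chart in R and add standard deviation bars and significance markers (asterisks)?...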

weixin_39719078 · Published 2020-12-20 22:20:05

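It has been a long time, so it is time to hand in an answer. ggplot2 is a graphics package and does not do statistical analysis, so the statistics have to be done separately.

As for adding the bars and significance marks: if you understand the layer concept behind ggplot2, you will see that it comes down to drawing the grouped bar chart first and then using ggplot2 to draw a few line segments and a few asterisks on top of it, wherever you need them.

Here I use a slightly geeky approach, a brute-force method much like adding the marks afterwards in Adobe Illustrator (AI).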

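Start plotting

First we create some data that looks like it contains differences. To keep the statistics simple I used rnorm to generate normally distributed data, and I deliberately gave the groups different means so that they differ. For convenience I first built a data frame with 4 columns and 21 rows, then used the melt function from reshape2 to convert it into the long format that ggplot2 expects.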

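## Load packages
library(reshape2)
library(ggplot2)
library(ggpubr)
library(dplyr)
options(stringsAsFactors = FALSE)

## Create demo data with built-in differences
## Type labels the sample: 7 replicates each of A, B and C
x = data.frame(Type = rep(c("A","B","C"), each = 7),
               root = c(rnorm(7,10),rnorm(7,5),rnorm(7,21)),
               leaf = c(rnorm(7,9),rnorm(7,4),rnorm(7,32)),
               stem = rnorm(21,8))

## Reshape to long format with columns Type, variable, value
y = melt(x,id.vars = "Type")
head(y)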

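Statistical analysis

I am self-taught and my statistics is poor. I never studied it systematically and just follow workflows I find online when I need them, so there are probably mistakes here; corrections are welcome. As the simplest example, I use t-tests to check for differences between samples (A, B, C) within the same tissue (root, leaf, stem).

First, check whether the data are normally distributed. Method reference: 【R】正态检验与R语言 - REAY - 博客园 (www.cnblogs.com)

Then run the t-tests. Method reference: R语言-t检验_weixin_38322363的博客-CSDN博客 (blog.csdn.net)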

## Shapiro-Wilk normality test for every tissue x sample group
variable = as.character(unique(y$variable))
Type = as.character(unique(y$Type))
for (i in seq_along(variable)) {
  for (j in seq_along(Type)) {
    l = variable[i]
    z = Type[j]
    ## the 7 values of this group (column 3 is "value")
    b = filter(.data = y, variable == l & Type == z)[,3]
    a = shapiro.test(b)
    qqnorm(b)
    print(c("Test Group:",variable[i],Type[j]))
    print(a)
  }
}

## t-test
## replicate index 1..7 within each of the 9 groups
y$num = rep(1:7,times = 9)
## wide matrix with one column per tissue_sample (root_A, root_B, ..., stem_C)
yy = acast(y,num~variable+Type, value.var = "value")
tissue = colnames(yy)
## columns 1, 4 and 7 hold sample A of root, leaf and stem
for (i in c(1,4,7)) {
  t1vs2 = t.test(yy[,i],yy[,i+1])
  print(paste(tissue[i],"vs",tissue[i+1]))
  print(t1vs2)
  t1vs3 = t.test(yy[,i],yy[,i+2])
  print(paste(tissue[i],"vs",tissue[i+2]))
  print(t1vs3)
  t2vs3 = t.test(yy[,i+1],yy[,i+2])
  print(paste(tissue[i+1],"vs",tissue[i+2]))
  print(t2vs3)
}
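The nine comparisons in the loop above can also be produced with base R's pairwise.t.test, one call per tissue. This is only a sketch under assumptions: the data are re-simulated in long format with an arbitrary seed, so the numbers will not match the output printed in this post, and p.adjust.method = "none" is chosen to mirror the unadjusted Welch tests of the loop. Passing a method such as "holm" instead would correct for multiple comparisons.

```r
# Simulated long-format data mirroring the post's layout (the seed is an assumption).
set.seed(1)
y <- data.frame(
  Type     = rep(rep(c("A", "B", "C"), each = 7), times = 3),
  variable = rep(c("root", "leaf", "stem"), each = 21),
  value    = c(rnorm(7, 10), rnorm(7, 5), rnorm(7, 21),   # root
               rnorm(7, 9),  rnorm(7, 4), rnorm(7, 32),   # leaf
               rnorm(21, 8))                              # stem
)

# One matrix of pairwise p-values per tissue.
pvals <- lapply(split(y, y$variable), function(d) {
  pairwise.t.test(d$value, d$Type,
                  pool.sd = FALSE,              # separate variances: Welch tests, like t.test()
                  p.adjust.method = "none")$p.value
})

# Rows are B and C, columns are A and B; the unused cell is NA.
print(pvals$root)
```

Each element of pvals is a lower-triangular matrix, so pvals$root["B", "A"] is the p-value for root_A vs root_B.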

##########= Shapiro-Wilk test results =#######################

[1] "Test Group:" "root" "A"

Shapiro-Wilk normality test

data: b

W = 0.96668, p-value = 0.8736

[1] "Test Group:" "root" "B"

Shapiro-Wilk normality test

data: b

W = 0.93642, p-value = 0.6067

[1] "Test Group:" "root" "C"

Shapiro-Wilk normality test

data: b

W = 0.90796, p-value = 0.3819

[1] "Test Group:" "leaf" "A"

Shapiro-Wilk normality test

data: b

W = 0.92647, p-value = 0.5213

[1] "Test Group:" "leaf" "B"

Shapiro-Wilk normality test

data: b

W = 0.83389, p-value = 0.08709

[1] "Test Group:" "leaf" "C"

Shapiro-Wilk normality test

data: b

W = 0.91637, p-value = 0.4417

[1] "Test Group:" "stem" "A"

Shapiro-Wilk normality test

data: b

W = 0.74738, p-value = 0.01191

[1] "Test Group:" "stem" "B"

Shapiro-Wilk normality test

data: b

W = 0.97004, p-value = 0.8987

[1] "Test Group:" "stem" "C"

Shapiro-Wilk normality test

data: b

W = 0.95486, p-value = 0.7736
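Nine separate printouts are hard to scan, so the same check can be collected into one table. The sketch below uses base R's aggregate on re-simulated data (the seed and the column names p_value and looks_normal are assumptions, and the values will differ from the run shown above). A p-value above 0.05 only means normality was not rejected, and with 7 values per group the test has little power. In the output above, stem A (p = 0.01191) is the one group that would be flagged.

```r
# Simulated long-format data mirroring the post's layout (the seed is an assumption).
set.seed(1)
y <- data.frame(
  Type     = rep(rep(c("A", "B", "C"), each = 7), times = 3),
  variable = rep(c("root", "leaf", "stem"), each = 21),
  value    = c(rnorm(7, 10), rnorm(7, 5), rnorm(7, 21),   # root
               rnorm(7, 9),  rnorm(7, 4), rnorm(7, 32),   # leaf
               rnorm(21, 8))                              # stem
)

# Shapiro-Wilk p-value for each Type x variable group, one row per group.
shapiro_p <- aggregate(value ~ Type + variable, data = y,
                       FUN = function(v) shapiro.test(v)$p.value)
names(shapiro_p)[3] <- "p_value"

# TRUE when normality is not rejected at the 0.05 level.
shapiro_p$looks_normal <- shapiro_p$p_value > 0.05
print(shapiro_p)
```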

#################== t-test results ==###########################

[1] "root_A vs root_B"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 1]

t = 9.4901, df = 11.016, p-value = 1.231e-06

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

3.72869 5.97999

sample estimates:

mean of x mean of y

10.304207 5.449867

[1] "root_A vs root_C"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 2]

t = -30.73, df = 9.8463, p-value = 4.14e-11

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-11.65330 -10.07455

sample estimates:

mean of x mean of y

10.30421 21.16813

[1] "root_B vs root_C"

Welch Two Sample t-test

data: yy[, i + 1] and yy[, i + 2]

t = -34.869, df = 8.2623, p-value = 2.91e-10

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-16.75204 -14.68449

sample estimates:

mean of x mean of y

5.449867 21.168129

[1] "leaf_A vs leaf_B"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 1]

t = 8.6339, df = 10.48, p-value = 4.377e-06

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

4.055983 6.854194

sample estimates:

mean of x mean of y

9.580255 4.125167

[1] "leaf_A vs leaf_C"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 2]

t = -53.64, df = 9.654, p-value = 2.81e-13

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-22.73581 -20.91380

sample estimates:

mean of x mean of y

9.580255 31.405058

[1] "leaf_B vs leaf_C"

Welch Two Sample t-test

data: yy[, i + 1] and yy[, i + 2]

t = -48.407, df = 7.7859, p-value = 6.147e-11

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-28.58570 -25.97409

sample estimates:

mean of x mean of y

4.125167 31.405058

[1] "stem_A vs stem_B"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 1]

t = -0.60654, df = 10.336, p-value = 0.5572

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.6806534 0.9589377

sample estimates:

mean of x mean of y

7.891292 8.252150

[1] "stem_A vs stem_C"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 2]

t = -0.66898, df = 10.349, p-value = 0.5182

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.1235095 0.6028203

sample estimates:

mean of x mean of y

7.891292 8.151637

[1] "stem_B vs stem_C"

Welch Two Sample t-test

data: yy[, i + 1] and yy[, i + 2]

t = 0.18554, df = 8.1292, p-value = 0.8573

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.145265 1.346291

sample estimates:

mean of x mean of y

8.252150 8.151637
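To turn p-values like these into the asterisks drawn on the figure, a small lookup is enough. The sketch below is a hypothetical helper (p_to_stars is not part of any package) and assumes the common convention of *** for p ≤ 0.001, ** for p ≤ 0.01, * for p ≤ 0.05 and "ns" otherwise; journals differ, so adjust the cut points as needed. The example p-values are copied from the t-test output above.

```r
# Hypothetical helper: map p-values to significance labels.
# The thresholds are an assumed convention, not something defined by ggplot2.
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 1),
                   labels = c("***", "**", "*", "ns"),
                   include.lowest = TRUE))   # so that p = 0 also gets "***"
}

# p-values taken from the Welch t-test output above.
p_values <- c(root_A_vs_B = 1.231e-06,
              leaf_A_vs_C = 2.81e-13,
              stem_A_vs_B = 0.5572,
              stem_B_vs_C = 0.8573)

stars <- setNames(p_to_stars(p_values), names(p_values))
print(stars)
```

The resulting strings can be passed directly as the label argument of annotate("text", ...).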

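Now for the plot. Convert the results into plotting data.

First draw the main body of the grouped bar chart.

Then add more layers to put the bracket lines and asterisks on top.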

## Compute the mean and standard deviation of each group for the error bars
plotdata = data.frame(tissue = rep(c("root","leaf","stem"),each = 3),
                      sample = rep(c("A","B","C"),times = 3),
                      mean = apply(yy, 2, mean),
                      sd = apply(yy, 2, sd))

ggplot(data = plotdata,aes(x = tissue, y = mean, fill = sample))+
  geom_bar(stat = "identity",position = "dodge")+
  ## to hide the lower whisker, set ymin = mean (geom_errorbar needs both ymin and ymax)
  geom_errorbar(aes(ymax = mean+sd, ymin = mean-sd),
                position = position_dodge(0.9), width = 0.15)+
  ylim(0,40)+
  ## bracket over the first tissue on the x axis (leaf, since the order is alphabetical):
  ## bar A sits at x = 0.7 and bar C at x = 1.3;
  ## the y values are hard-coded from one random draw, so adjust them for your data
  geom_segment(x = 0.7, y = 8.817302+3, xend = 0.7, yend = 32.302470+3)+
  geom_segment(x = 0.7, y = 32.302470+3, xend = 1.3, yend = 32.302470+3)+
  geom_segment(x = 1.3, y = 32.302470+3, xend = 1.3, yend = 32.302470+2)+
  annotate("text", x = 1, y = 32.302470+4, label = "***")

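For lack of time I only drew one bracket, but the code should make the idea clear: use geom_segment to draw the lines and annotate to draw the asterisks. It does not have to be asterisks either; follow Y叔's blog and you can draw memes as well.

You can also try an expert's one-step method: R语言学习笔记--ggplot2一步到位绘制误差线及p-value(或显著性标记) (www.jianshu.com)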
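Since the remaining brackets follow the same pattern, the three segments can be computed instead of typed by hand. The sketch below is one possible way to do it, not the method used in this post: bracket_segments is a hypothetical helper, the gap and tick sizes are arbitrary, and the bar tops use the leaf means from the t-test output plus an assumed standard deviation of about 1 (rnorm's default). The ggplot2 part only runs if the package is installed.

```r
# Hypothetical helper: the three segments of a bracket between two bars.
# top1/top2 are the tops of the two error bars; gap and tick are arbitrary offsets.
bracket_segments <- function(x1, x2, top1, top2, gap = 1, tick = 1) {
  y_bar <- max(top1, top2) + gap + tick   # height of the horizontal line
  data.frame(
    x    = c(x1, x1, x2),
    xend = c(x1, x2, x2),
    y    = c(top1 + gap, y_bar, y_bar),
    yend = c(y_bar, y_bar, top2 + gap)
  )
}

# leaf is tissue 1 on the x axis; with three dodged bars of total width 0.9,
# bars A and C sit at 1 - 0.3 and 1 + 0.3.
seg <- bracket_segments(x1 = 0.7, x2 = 1.3,
                        top1 = 9.58 + 1,    # leaf_A mean + assumed sd
                        top2 = 31.41 + 1)   # leaf_C mean + assumed sd
label_pos <- data.frame(x = 1, y = max(seg$yend) + 1, label = "***")
print(seg)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # inherit.aes = FALSE keeps these layers from looking up the bar chart's
  # fill/x/y mappings when they are added to an existing plot.
  p <- ggplot() +
    geom_segment(data = seg, inherit.aes = FALSE,
                 aes(x = x, y = y, xend = xend, yend = yend)) +
    geom_text(data = label_pos, inherit.aes = FALSE,
              aes(x = x, y = y, label = label))
}
```

To use it with the bar chart, add the same geom_segment and geom_text layers to that plot and call bracket_segments once per comparison.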
