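It has been a long while, so it is time I posted an answer. ggplot2 is a graphics package and does not come with statistical testing, so the statistical analysis has to be done separately.

As for adding the bars and significance marks: if you understand the layer concept behind ggplot2, you will see that it amounts to drawing the grouped bar chart first and then using ggplot2 to draw a few line segments and add a few asterisks on top of it, wherever you need them.

Here I use a slightly geeky approach, a manual method much like adding the marks afterwards in Adobe Illustrator (AI).

Start plotting

First we create a dataset that looks like it contains differences. To keep the statistics simple I used rnorm to generate normally distributed data, and to make the groups differ I deliberately gave them different means. For convenience I first built a data frame with 21 rows and 4 columns, then used the melt function from reshape2 to convert it into the long format that ggplot2 needs.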

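## Load packages
library(reshape2)
library(ggplot2)
library(ggpubr)
library(dplyr)
options(stringsAsFactors = FALSE)

## Create demo data with built-in differences
## (the opening of this call was lost in the source; the Type column is
## reconstructed from how Type is used further down)
x = data.frame(Type = rep(c("A","B","C"), each = 7),
               root = c(rnorm(7,10), rnorm(7,5), rnorm(7,21)),
               leaf = c(rnorm(7,9), rnorm(7,4), rnorm(7,32)),
               stem = rnorm(21,8))

## Wide to long: one row per measurement, with columns Type, variable, value
y = melt(x, id.vars = "Type")
head(y)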

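Statistical analysis

I came to this field sideways and my statistics is weak. I never studied it systematically; I just follow workflows found online when I need them, so there are probably mistakes here, and corrections are welcome.

The simplest example: use a t-test to check for differences between samples (A, B, C) within the same tissue (root, leaf, stem).

First check whether the data are normally distributed. Method reference: 【R】正态检验与R语言 - REAY - 博客园 (www.cnblogs.com)

Then run the t-test. Method reference: R语言-t检验_weixin_38322363的博客-CSDN博客 (blog.csdn.net)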

## Shapiro-Wilk normality test for every tissue/sample combination
variable = as.character(unique(y$variable))
Type = as.character(unique(y$Type))

for (i in seq_along(variable)) {
  for (j in seq_along(Type)) {
    l = variable[i]
    z = Type[j]
    ## values of one tissue/sample combination (column 3 is "value")
    b = filter(.data = y, variable == l & Type == z)[, 3]
    a = shapiro.test(b)
    qqnorm(b)
    print(c("Test Group:", l, z))
    print(a)
  }
}

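## t-test
## Index the 7 replicates so the data can be cast back to wide form:
## one column per tissue_sample combination (root_A, root_B, ..., stem_C)
y$num = rep(1:7, times = 9)
yy = acast(y, num ~ variable + Type, value.var = "value")
tissue = colnames(yy)

## Columns 1, 4, 7 are sample A of each tissue; compare A-B, A-C and B-C within each tissue
for (i in c(1,4,7)) {
  t1vs2 = t.test(yy[,i], yy[,i+1])
  print(paste(tissue[i], "vs", tissue[i+1]))
  print(t1vs2)
  t1vs3 = t.test(yy[,i], yy[,i+2])
  print(paste(tissue[i], "vs", tissue[i+2]))
  print(t1vs3)
  t2vs3 = t.test(yy[,i+1], yy[,i+2])
  print(paste(tissue[i+1], "vs", tissue[i+2]))
  print(t2vs3)
}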

##########= Shapiro-Wilk test results =#######################

[1] "Test Group:" "root" "A"

Shapiro-Wilk normality test

data: b

W = 0.96668, p-value = 0.8736

[1] "Test Group:" "root" "B"

Shapiro-Wilk normality test

data: b

W = 0.93642, p-value = 0.6067

[1] "Test Group:" "root" "C"

Shapiro-Wilk normality test

data: b

W = 0.90796, p-value = 0.3819

[1] "Test Group:" "leaf" "A"

Shapiro-Wilk normality test

data: b

W = 0.92647, p-value = 0.5213

[1] "Test Group:" "leaf" "B"

Shapiro-Wilk normality test

data: b

W = 0.83389, p-value = 0.08709

[1] "Test Group:" "leaf" "C"

Shapiro-Wilk normality test

data: b

W = 0.91637, p-value = 0.4417

[1] "Test Group:" "stem" "A"

Shapiro-Wilk normality test

data: b

W = 0.74738, p-value = 0.01191

[1] "Test Group:" "stem" "B"

Shapiro-Wilk normality test

data: b

W = 0.97004, p-value = 0.8987

[1] "Test Group:" "stem" "C"

Shapiro-Wilk normality test

data: b

W = 0.95486, p-value = 0.7736
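Reading nine separate printouts is tedious, so here is a minimal sketch of collecting the Shapiro-Wilk p-values into one table instead. It rebuilds a small long-format data frame with the same column names as y; the seed, the `shapiro_p` name and the `looks_normal` column are my own assumptions for illustration, so the numbers will not match the output above.

```r
## Sketch only: rebuild demo data shaped like y (columns Type, variable, value).
set.seed(1)  # assumption: fixed seed so the sketch is repeatable
y <- data.frame(
  Type     = rep(rep(c("A", "B", "C"), each = 7), times = 3),
  variable = rep(c("root", "leaf", "stem"), each = 21),
  value    = c(rnorm(7, 10), rnorm(7, 5), rnorm(7, 21),
               rnorm(7, 9),  rnorm(7, 4), rnorm(7, 32),
               rnorm(21, 8))
)

## One Shapiro-Wilk p-value per tissue/sample combination
shapiro_p <- aggregate(value ~ variable + Type, data = y,
                       FUN = function(v) shapiro.test(v)$p.value)
names(shapiro_p)[3] <- "p.value"

## Flag the groups where normality is not rejected at the 0.05 level
shapiro_p$looks_normal <- shapiro_p$p.value > 0.05
shapiro_p
```

With only 7 values per group the test has little power, so a table like this is best read together with the Q-Q plots.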

#################== t-test results ==###########################

[1] "root_A vs root_B"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 1]

t = 9.4901, df = 11.016, p-value = 1.231e-06

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

3.72869 5.97999

sample estimates:

mean of x mean of y

10.304207 5.449867

[1] "root_A vs root_C"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 2]

t = -30.73, df = 9.8463, p-value = 4.14e-11

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-11.65330 -10.07455

sample estimates:

mean of x mean of y

10.30421 21.16813

[1] "root_B vs root_C"

Welch Two Sample t-test

data: yy[, i + 1] and yy[, i + 2]

t = -34.869, df = 8.2623, p-value = 2.91e-10

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-16.75204 -14.68449

sample estimates:

mean of x mean of y

5.449867 21.168129

[1] "leaf_A vs leaf_B"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 1]

t = 8.6339, df = 10.48, p-value = 4.377e-06

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

4.055983 6.854194

sample estimates:

mean of x mean of y

9.580255 4.125167

[1] "leaf_A vs leaf_C"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 2]

t = -53.64, df = 9.654, p-value = 2.81e-13

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-22.73581 -20.91380

sample estimates:

mean of x mean of y

9.580255 31.405058

[1] "leaf_B vs leaf_C"

Welch Two Sample t-test

data: yy[, i + 1] and yy[, i + 2]

t = -48.407, df = 7.7859, p-value = 6.147e-11

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-28.58570 -25.97409

sample estimates:

mean of x mean of y

4.125167 31.405058

[1] "stem_A vs stem_B"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 1]

t = -0.60654, df = 10.336, p-value = 0.5572

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.6806534 0.9589377

sample estimates:

mean of x mean of y

7.891292 8.252150

[1] "stem_A vs stem_C"

Welch Two Sample t-test

data: yy[, i] and yy[, i + 2]

t = -0.66898, df = 10.349, p-value = 0.5182

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.1235095 0.6028203

sample estimates:

mean of x mean of y

7.891292 8.151637

[1] "stem_B vs stem_C"

Welch Two Sample t-test

data: yy[, i + 1] and yy[, i + 2]

t = 0.18554, df = 8.1292, p-value = 0.8573

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.145265 1.346291

sample estimates:

mean of x mean of y

8.252150 8.151637
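The p-values above still have to be turned into star labels by hand. A minimal sketch of doing that in code follows; `p_to_stars` is a hypothetical helper of my own, the cut-offs (0.001, 0.01, 0.05) are the common convention rather than something this post prescribes, and the demo data are regenerated with an assumed seed.

```r
## Hypothetical helper, not part of the original script: map p-values to star labels.
## cut() uses right-closed intervals, so p = 0.05 still gets one star.
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", "ns")))
}

## Two p-values copied from the output above (leaf_A vs leaf_B, stem_A vs stem_B)
p_to_stars(c(4.377e-06, 0.5572))

## The same idea for all pairwise comparisons within one tissue
set.seed(1)  # assumption: fixed seed so the sketch is repeatable
vals  <- list(A = rnorm(7, 9), B = rnorm(7, 4), C = rnorm(7, 32))
pairs <- combn(names(vals), 2)  # columns: A-B, A-C, B-C
res <- data.frame(
  group1 = pairs[1, ],
  group2 = pairs[2, ],
  p = apply(pairs, 2, function(g) t.test(vals[[g[1]]], vals[[g[2]]])$p.value)
)
res$label <- p_to_stars(res$p)
res
```

Like the loop above, this applies no multiple-comparison correction; base R's p.adjust could be applied to `res$p` before labelling if one is wanted.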

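Now for the plot. Convert the results into plotting data.

First draw the main body of the grouped bar chart.

Then add layers on top to put in the bracket lines and the marks.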

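## Compute the mean and standard deviation per group so error bars can be added
## (the column is named se, but it holds the standard deviation)
plotdata = data.frame(tissue = rep(c("root","leaf","stem"), each = 3),
                      sample = rep(c("A","B","C"), times = 3),
                      mean = apply(yy, 2, mean),
                      se = apply(yy, 2, sd))

ggplot(data = plotdata, aes(x = tissue, y = mean, fill = sample)) +
  geom_bar(stat = "identity", position = "dodge") +
  ## to hide the lower half of the error bar, set ymin = mean
  ## (geom_errorbar requires ymin, so it cannot simply be dropped)
  geom_errorbar(aes(ymax = mean + se, ymin = mean - se),
                position = position_dodge(0.9), width = 0.15) +
  ylim(0, 40) +
  ## bracket: x = 0.7 and 1.3 are the first and third bars of the first group
  ## on the x axis; the y values are hard-coded heights
  geom_segment(x = 0.7, y = 8.817302 + 3, xend = 0.7, yend = 32.302470 + 3) +
  geom_segment(x = 0.7, y = 32.302470 + 3, xend = 1.3, yend = 32.302470 + 3) +
  geom_segment(x = 1.3, y = 32.302470 + 3, xend = 1.3, yend = 32.302470 + 2) +
  annotate("text", x = 1, y = 32.302470 + 4, label = "***")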

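For lack of time I only drew one bracket, but the code should make the idea clear: geom_segment draws the lines and annotate draws the asterisks. It does not have to be asterisks either; follow Y叔's blog and you can draw memes as well.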
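The hard-coded coordinates can also be computed. Below is a minimal sketch, assuming position_dodge(0.9) with equally wide bars; `bar_x` and `bracket` are hypothetical helpers of my own, the means are rounded from the t-test output above, and sd = 1 is a placeholder (it is the sd that rnorm uses by default), not a measured value.

```r
library(ggplot2)

## Illustrative summary table: means rounded from the t-test output, sd is a placeholder
plotdata <- data.frame(
  tissue = rep(c("root", "leaf", "stem"), each = 3),
  sample = rep(c("A", "B", "C"), times = 3),
  mean   = c(10.3, 5.4, 21.2, 9.6, 4.1, 31.4, 7.9, 8.3, 8.2),
  sd     = 1
)

## Hypothetical helper: x centre of bar k (of n) in the group at integer position group_index
bar_x <- function(group_index, k, n, dodge = 0.9) {
  group_index + (k - (n + 1) / 2) * dodge / n
}

## Hypothetical helper: three segments plus a label, returned as a list of layers
bracket <- function(x1, x2, y1, y2, top, label) {
  list(
    annotate("segment", x = x1, xend = x1, y = y1,  yend = top),
    annotate("segment", x = x1, xend = x2, y = top, yend = top),
    annotate("segment", x = x2, xend = x2, y = top, yend = y2),
    annotate("text", x = (x1 + x2) / 2, y = top + 1, label = label)
  )
}

## tissue is a character column, so the x axis is alphabetical: leaf = 1, root = 2, stem = 3
leaf <- plotdata[plotdata$tissue == "leaf", ]
tops <- leaf$mean + leaf$sd   # top of each error bar in the leaf group
top  <- max(tops) + 3         # height of the horizontal part of the bracket

p <- ggplot(plotdata, aes(x = tissue, y = mean, fill = sample)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = position_dodge(0.9), width = 0.15) +
  ylim(0, 40) +
  bracket(bar_x(1, 1, 3), bar_x(1, 3, 3), tops[1] + 1, tops[3] + 1, top, "***")
p
```

Using annotate("segment", ...) rather than geom_segment keeps each line to a single drawn segment, because annotate does not inherit the nine-row plot data.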

Of course, you can also try the one-step method from a more experienced author: R语言学习笔记--ggplot2一步到位绘制误差线及p-value(或显著性标记) (www.jianshu.com)
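As a rough idea of what a one-step route can look like, here is a minimal sketch using ggpubr, which this post already loads. It is my own assumption of one possible approach, not necessarily the method of the linked article, and it uses freshly generated data for a single tissue with an assumed seed.

```r
library(ggpubr)

## Sketch data: one tissue, three samples, raw values rather than summaries
set.seed(1)  # assumption: fixed seed so the sketch is repeatable
leaf <- data.frame(
  Type  = rep(c("A", "B", "C"), each = 7),
  value = c(rnorm(7, 9), rnorm(7, 4), rnorm(7, 32))
)

## Bars show the group mean with sd error bars; stat_compare_means runs the
## pairwise tests and draws the brackets with significance symbols
p <- ggbarplot(leaf, x = "Type", y = "value", add = "mean_sd", fill = "Type") +
  stat_compare_means(
    comparisons = list(c("A", "B"), c("A", "C"), c("B", "C")),
    method = "t.test",
    label = "p.signif"
  )
p
```

The trade-off is that the tests run inside the plotting call, so it works from the raw values rather than from a precomputed table of means.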
