禁止商业或二改转载,仅供自学使用,侵权必究,如需截取部分内容请后台联系作者!
介绍
有人问我能不能复现它:
但实际上只能复现出来这样的:
介绍
这段代码是一个用于生成复杂数据可视化的R脚本,主要结合了circos
环形图和ggplot2
条形图/箱线图,以展示模拟的受试者信息。它通过模拟数据生成了一个包含年龄、BMI、性别、种族和IRIS状态的可视化图表,旨在清晰地展示每个受试者的关键信息。
加载必要包
代码首先加载了一系列R包,包括tidyverse
用于数据处理和可视化,circlize
用于生成环形图,gghalves
用于绘制半小提琴图,readxl
和openxlsx
用于读取和写入Excel文件,cowplot
用于组合多个图形,以及magick
用于处理图像。
生成模拟数据
代码通过设置随机种子确保结果可重复,并定义了100个受试者的信息。每个受试者有以下属性:
subject_id_random
:随机生成的受试者ID。adjusted_age
:调整后的年龄,通过正态分布随机生成,均值为55,标准差为10。BMI
:体重指数,通过正态分布随机生成,均值为27,标准差为4。Gender
:性别,随机生成男性或女性。Ethnicity
:种族,随机生成亚洲人、黑人、白人、西班牙裔或未知,概率分别为0.1、0.1、0.5、0.2和0.1。IRIS
:IRIS状态,随机生成IS、IR或未知,概率分别为0.4、0.4和0.2。
设置颜色映射
为了在图表中区分不同的变量和类别,代码定义了各种颜色映射。例如,年龄和BMI分别使用了紫色和橙色,性别使用了粉色和黄色,种族使用了不同的颜色,IRIS状态也分别指定了颜色。
数据整理
代码对模拟数据进行了整理,按调整后的年龄对受试者进行排序,并添加了用于绘图的辅助列。
定义绘图函数
代码定义了多个绘图函数,用于生成环形图和条形图/箱线图:
plot_circos_track
:用于绘制连续变量(如年龄和BMI)的环形轨道。plot_categorical_track
:用于绘制分类变量(如性别、种族和IRIS状态)的环形轨道。plot_box_dot
:用于绘制箱线图和点图,展示连续变量的分布。plot_bar
:用于绘制条形图,展示分类变量的分布。
创建 circos plot 并导出为 PDF
代码使用circos
包创建了一个环形图,展示了每个受试者的年龄、BMI、性别、种族和IRIS状态。每个变量都被绘制在不同的环形轨道上,使用了之前定义的颜色映射。最终的环形图被保存为PDF文件。
创建 ggplot 图形
代码使用ggplot2
包创建了多个条形图和箱线图,分别展示了年龄、BMI、性别、种族和IRIS状态的分布。这些图形使用了与环形图一致的颜色映射。
拼图合成并导出最终图像
代码使用magick
包读取了之前保存的环形图PDF文件,并将其与条形图/箱线图组合在一起。最终的组合图形被保存为PDF文件。
加载R包
library(tidyverse)
library(circlize)
library(gghalves)
library(readxl)
library(openxlsx)
library(cowplot)
library(magick)
模拟数据
set.seed(123)
n <- 100
subject_info <- tibble(
subject_id_random = paste0("Z", stringr::str_pad(1:n, 5, pad = "0")),
adjusted_age = round(rnorm(n, mean = 55, sd = 10)),
BMI = round(rnorm(n, mean = 27, sd = 4), 1),
Gender = sample(c("Male", "Female"), n, replace = TRUE),
Ethnicity = sample(c("Asian", "Black", "Caucasian", "Hispanics", "Unknown"),
n, replace = TRUE, prob = c(0.1, 0.1, 0.5, 0.2, 0.1)),
IRIS = sample(c("IS", "IR", "Unknown"), n, replace = TRUE, prob = c(0.4, 0.4, 0.2))
)
设置颜色映射
age_color <- '#631879FF'
bmi_color <- '#FF410DFF'
sex_color <- c("Female" = '#F2300F', "Male" = '#E1BD6D')
ethnicity_color <- c("Caucasian" = '#ECCBAE', "Asian" = '#0B775E', "Hispanics" = '#D69C4E',
"Black" = '#81A88D', "Unknown" = 'grey')
iris_color <- c("IS" = '#F21A00', "IR" = '#74A089', "Unknown" = 'grey')
数据整理
df <- subject_info %>%
arrange(adjusted_age) %>%
mutate(
x = 1, y = 1,
factors = factor(subject_id_random, levels = subject_id_random)
)
定义绘图函数
# 连续变量环形轨道
plot_circos_track <- function(temp_value, color, ylab, ylim_mult = c(0.8, 1.1), track_height = 0.2) {
circos.track(
factors = df$factors,
y = temp_value,
ylim = range(temp_value, na.rm = TRUE) * ylim_mult,
bg.border = "black",
track.height = track_height,
panel.fun = function(x, y) {
name <- get.cell.meta.data("sector.index")
i <- get.cell.meta.data("sector.numeric.index")
xlim <- get.cell.meta.data("xlim")
ylim <- get.cell.meta.data("ylim")
if (i == 1) {
circos.yaxis(side = "left",
at = c(
ceiling(0.8 * min(temp_value, na.rm = TRUE)),
round(mean(range(temp_value, na.rm = TRUE)), 0),
round(max(temp_value, na.rm = TRUE), 0)),
sector.index = get.all.sector.index()[1],
labels.cex = 0.7)
}
circos.lines(x = mean(xlim), y = temp_value[i], type = "h", col = color, lwd = 2)
if (ylab == "Age") {
circos.text(x = 1, y = max(temp_value, na.rm = TRUE) * 1.1,
labels = name, facing = "clockwise", niceFacing = TRUE, cex = 0.5)
}
circos.points(x = mean(xlim), y = temp_value[i], pch = 16, cex = 0.8, col = color)
}
)
}
# 分类变量轨道
plot_categorical_track <- function(temp_var, colors, track_height = 0.1) {
temp_var[is.na(temp_var)] <- "Unknown"
mapped_colors <- colors[temp_var]
circos.track(
factors = df$factors,
y = df$y,
ylim = c(0, 1),
bg.border = "black",
track.height = track_height,
panel.fun = function(x, y) {
i <- get.cell.meta.data("sector.numeric.index")
xlim <- get.cell.meta.data("xlim")
ylim <- get.cell.meta.data("ylim")
circos.rect(xleft = xlim[1], ybottom = ylim[1],
xright = xlim[2], ytop = ylim[2],
col = mapped_colors[i], bg.border = "black")
}
)
}
# 左侧:箱线图 + 点图
plot_box_dot <- function(data, color, binwidth) {
ggplot(data.frame(class = "class", value = data), aes(x = class, y = value)) +
geom_boxplot(outlier.shape = NA) +
geom_dotplot(binaxis = "y", color = color, fill = color,
shape = 16, binwidth = binwidth, stackdir = "center") +
theme_bw() +
labs(x = "", y = "") +
scale_x_discrete(expand = expansion(mult = c(0, 0))) +
theme(panel.grid = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
}
# 左侧:条形图
plot_bar <- function(data, colors, levels_order) {
ggplot(data.frame(class = "class", value = data), aes(x = class)) +
geom_bar(aes(fill = factor(value, levels = levels_order)),
color = "black", position = "stack", show.legend = FALSE, width = 2) +
scale_fill_manual(values = colors) +
theme_bw() +
labs(x = "", y = "") +
scale_y_continuous(expand = expansion(mult = c(0, 0))) +
theme(panel.grid = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
}
创建 circos plot 并导出为 PDF
pdf("circos_plot.pdf", width = 8, height = 8)
circos.clear()
circos.par(start.degree = 90, clock.wise = TRUE, gap.after = c(rep(0, nrow(df)-1), 90))
circos.initialize(factors = df$factors, x = df$x, xlim = c(0.5, 1.5))
plot_circos_track(df$adjusted_age, age_color, "Age")
plot_circos_track(df$BMI, bmi_color, "BMI")
plot_categorical_track(df$Gender, sex_color)
plot_categorical_track(df$Ethnicity, ethnicity_color)
plot_categorical_track(df$IRIS, iris_color)
dev.off()
创建 ggplot 图形
p_age <- plot_box_dot(df$adjusted_age, age_color, 1)
p_bmi <- plot_box_dot(df$BMI, bmi_color, 0.6)
p_sex <- plot_bar(df$Gender, sex_color, c("Female", "Male"))
p_ethnicity <- plot_bar(df$Ethnicity, ethnicity_color, c("Asian", "Black", "Caucasian", "Hispanics"))
p_iris <- plot_bar(df$IRIS, iris_color, c("IR", "IS", "Unknown"))
拼图合成并导出最终图像
circos_img <- image_read_pdf("circos_plot.pdf")
final_plot <- ggdraw() +
draw_image(circos_img, x = 0.35, y = 0, width = 0.65, height = 1) +
draw_plot(plot_grid(p_age, p_bmi, p_sex, p_ethnicity, p_iris,
ncol = 5, align = "h"),
x = 0, y = 0, width = 0.35, height = 1)
ggsave("combined_circos_ggplots.pdf", final_plot, width = 12, height = 8)
总结
通过模拟数据生成了一个复杂的可视化图表,结合了环形图和条形图/箱线图,展示了受试者的年龄、BMI、性别、种族和IRIS状态。通过精心设计的数据处理和图形样式定义,代码成功地将受试者的关键信息以一种清晰、直观的方式呈现出来。环形图和条形图/箱线图的结合使用,使得读者能够从不同的角度理解数据。这种可视化的图表对于医学研究和临床实践中的数据展示具有重要意义。
系统信息
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] magick_2.8.6 cowplot_1.1.3 openxlsx_4.2.8 readxl_1.4.5 gghalves_0.1.4 circlize_0.4.16 lubridate_1.9.4 forcats_1.0.0
[9] stringr_1.5.1 purrr_1.0.4 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0 dplyr_1.1.4 ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] generics_0.1.3 shape_1.4.6.1 stringi_1.8.4 hms_1.1.3 magrittr_2.0.3 timechange_0.3.0
[7] cellranger_1.1.0 zip_2.3.2 GlobalOptions_0.1.2 scales_1.3.0 textshaping_1.0.0 cli_3.6.4
[13] pdftools_3.5.0 rlang_1.1.5 munsell_0.5.1 withr_3.0.2 tools_4.4.3 tzdb_0.5.0
[19] colorspace_2.1-1 vctrs_0.6.5 R6_2.6.1 lifecycle_1.0.4 ragg_1.3.3 pkgconfig_2.0.3
[25] pillar_1.10.1 gtable_0.3.6 glue_1.8.0 Rcpp_1.0.14 systemfonts_1.2.1 tidyselect_1.2.1
[31] rstudioapi_0.17.1 farver_2.1.2 labeling_0.4.3 qpdf_1.3.5 compiler_4.4.3 askpass_1.2.1