I am trying to create a forest plot to compare estimates from different models. The goal is to plot all estimates of the same predictor across the models so that they are directly comparable.

For this I have written the following plotting function:


library(ggplot2)

# Create the combined_data data frame
# My data looks something like this
combined_data <- data.frame(
  Parameter = c("x1", "x2", "x1", "x2", "x3"),
  Coefficient = c(1.78358873, -1.14376132, 1.84360374, -1.22307433, -0.08670885),
  SE = c(0.1070057, 0.1010112, 0.1620157, 0.1982852, 0.1845809),
  CI = c(0.95, 0.95, 0.95, 0.95, 0.95),
  CI_low = c(1.5712121, -1.3442406, 1.5220889, -1.6166156, -0.4530508),
  CI_high = c(1.9959654, -0.9432820, 2.1651185, -0.8295330, 0.2796331),
  p = c(3.151569e-30, 1.923809e-19, 1.269702e-19, 1.591506e-08, 6.395803e-01),
  Model = c("Model 1", "Model 1", "Model 2", "Model 3", "Model 3")
)

# Display the dataframe
print(combined_data)

# Plotting function
plot_model_coefficients <- function(combined_data, plot_title = "Coefficient Plot", 
                                    y_axis_label = "Parameters", x_axis_label = "Coefficient",
                                    dodge = position_dodge(width = 0.6)) {
  # Choose colors based on the number of models
  num_models <- length(unique(combined_data$Model))  # Count of unique models
  point_colors <- RColorBrewer::brewer.pal(n = num_models, name = "Set2")
  
  # Create the plot
  p <- ggplot(combined_data, aes(x = Parameter, y = Coefficient, color = Model, group = Model)) +
    geom_point(size = 3, position = dodge) +  # Points for the Coefficient
    geom_errorbar(aes(ymin = CI_low, ymax = CI_high, color = Model), 
                  width = 0.2, position = dodge) +  # Error bars for CI
    geom_hline(yintercept = 0, color = "black", linetype = "dashed", linewidth = 1) +  # Reference line at 0 (linewidth replaces the deprecated size for lines)
    theme_minimal() +  # Minimal theme
    labs(title = plot_title, x = x_axis_label, y = y_axis_label) +
    scale_color_manual(values = point_colors) +  # Custom color for points
    coord_flip() +  # Flip coordinates for better visibility
    theme(legend.title = element_blank())
  
  # Print the plot
  print(p)
}

plot_model_coefficients(combined_data, 
                        plot_title = "Stacked Coefficients Plot", 
                        y_axis_label = "Estimate",
                        x_axis_label = "Predictors")

This creates a plot that looks like this:

plot created by my function

However, I'd like to create a plot where I can add some information, e.g. the CI for every estimate, like in this plot:

plot created with forestploter

I tried working with geom_text(), but I was only able to add the information directly over the estimates, and I don't know how to create this table-like structure next to the estimates. I also looked into the forestploter package in R; however, I couldn't figure out how to group the estimates when the same predictor was used in multiple models.

  • geom_text recognises the x and y aesthetics, so just pass it the coordinates at which you want the text to appear, probably with hjust as well. You'll probably need to do that with a data set other than combined_data. Also, I suggest returning the plot from your function rather than printing it inside the function. That makes the function more general. Commented Nov 25 at 11:14
  • This question is similar to: Assigning geom_text from a different dataframe labels to a graph. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Nov 25 at 11:16
  • In "Assigning geom_text from a different dataframe labels to a graph" they again use geom_text() to add text over the individual elements. I don't think you can use it to create a separate column to the right of the plot, aligned with the estimates, unless you are willing to edit the position of every single CI label separately. Commented Nov 25 at 12:53
  • Yes. That is correct. Customisation takes effort. That said, it shouldn't be too difficult. You have control of the y co-ordinate of the labels. The x co-ordinate should be constant. As always, @allancameron's contribution is accurate and informative. Commented Nov 25 at 12:58
  • Yes, you're right. I was hoping for an easy fix... I'll have a go with the geom_text(). Thanks! Commented Nov 25 at 13:09
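
The suggestion in these comments (labels at a constant coordinate, plot returned rather than printed) could be sketched roughly as follows. This is only an illustration: format_ci_label(), plot_coefficients_with_labels() and the label_pos argument are hypothetical names, and the value 2.6 is an assumed position that merely suits the example data. Reusing the same position_dodge() for the text keeps each label on the row of its estimate, so the predictors stay clustered.

```r
library(ggplot2)

# Same example data as in the question (only the columns used here)
combined_data <- data.frame(
  Parameter   = c("x1", "x2", "x1", "x2", "x3"),
  Coefficient = c(1.78358873, -1.14376132, 1.84360374, -1.22307433, -0.08670885),
  CI_low      = c(1.5712121, -1.3442406, 1.5220889, -1.6166156, -0.4530508),
  CI_high     = c(1.9959654, -0.9432820, 2.1651185, -0.8295330, 0.2796331),
  Model       = c("Model 1", "Model 1", "Model 2", "Model 3", "Model 3")
)

# Hypothetical helper: "estimate (low to high)" with two decimals
format_ci_label <- function(est, low, high) {
  sprintf("%.2f (%.2f to %.2f)", est, low, high)
}

# Hypothetical variant of the question's function. label_pos is an assumed
# constant position on the coefficient axis for the text column.
plot_coefficients_with_labels <- function(data, label_pos = 2.6, dodge_width = 0.6) {
  data$label <- format_ci_label(data$Coefficient, data$CI_low, data$CI_high)
  dodge <- position_dodge(width = dodge_width)

  ggplot(data, aes(x = Parameter, y = Coefficient, color = Model, group = Model)) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_errorbar(aes(ymin = CI_low, ymax = CI_high), width = 0.2, position = dodge) +
    geom_point(size = 3, position = dodge) +
    # Same dodge as the points, so each label lines up with its estimate
    geom_text(aes(y = label_pos, label = label), position = dodge,
              hjust = 0, show.legend = FALSE) +
    # Extend the axis so the text column is not clipped
    scale_y_continuous(limits = c(NA, label_pos + 2.5)) +
    coord_flip() +
    theme_minimal() +
    theme(legend.title = element_blank())
}

# The function returns the plot; printing happens at the call site
p <- plot_coefficients_with_labels(combined_data)
p
```

The labels take the colour of their model here; setting a fixed colour inside geom_text() would make them read more like a table column.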

1 Answer

It takes quite a lot of placing of graphical elements to get you from a basic ggplot to a labelled forest plot. Your two-level grouping (model and parameter) also means that whatever output you are looking for, it can't be quite the same as the output in the desired image, unless you want a separate row for each combination.

If that's the case (as your question seems to imply), then you could do something like:

library(tidyverse)

combined_data %>%
  mutate(label = paste0(format(round(Coefficient, 2), nsmall = 2),  " (",
                        format(round(CI_low, 2), nsmall = 2), " to ", 
                        format(round(CI_high, 2), nsmall = 2), ")")) %>%
  mutate(ypos = seq(n())) %>%
  ggplot(aes(Coefficient, ypos)) +
  annotate("rect", xmin = -Inf, xmax = Inf, 
           ymin = seq(nrow(combined_data)) - 0.5,
           ymax = seq(nrow(combined_data)) + 0.5,
           fill = rep(c("white", "gray90"), length.out = nrow(combined_data))) +
  annotate("rect", xmin = c(-Inf, 2.5), xmax = c(-2.5, Inf), 
           ymin = -Inf, ymax = Inf, fill = "white") +
  geom_errorbar(aes(xmin = CI_low, xmax = CI_high), width = 0.1) +
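  # xend/yend are given explicitly so the segments also work on ggplot2
  # versions that require both endpoints
  annotate("segment", x = 0, xend = 0, y = -Inf, yend = 5.5, linetype = 2) +
  annotate("segment", y = 0.5, yend = 0.5, x = -2.5, xend = 2.5, linewidth = 1) +
  geom_text(aes(x = -5, label = paste0(Model, " (parameter ", Parameter, ")")), 
            hjust = 0) +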
  geom_text(aes(x = 4, label = label), hjust = 0.5) +
  geom_point() +
  annotate("text", label = c("Model", "Estimate (95% confidence)"), 
           x = c(-5, 4), y = nrow(combined_data) + 1, hjust = c(0, 0.5),
           fontface = "bold") +
  scale_x_continuous("Risk Ratio", limits = c(-5, 5),
                     breaks = seq(-2, 2, 1)) +
  scale_y_continuous(NULL, expand = c(0, 0), limits = c(0.5, 6.5)) +
  theme_bw(14) +
  theme(axis.ticks.length.y = unit(0, "mm"),
        axis.text.y = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank())

2 Comments

Uff, no wonder I was struggling. Thanks for the contribution! What I liked about the plot I was using before was that I could cluster the predictors when they were used in multiple models, but I think that conflicts with this "segment" approach.
@Linus The thing about ggplot is that ultimately it just allows shortcuts to drawing graphic primitives (lines, text, polygons etc.). You can draw anything. So you can have the predictors clustered together if you want. The problem is that you don't seem to have decided how that should look. If you want the zebra stripes, would each predictor get its own stripe, or would all be on the same stripe? If all predictors are on the same stripe, some stripes would need multiple confidence labels. Would those stripes be wider than others? The difficulties are in design - the plotting is straightforward.
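
As one possible answer to that design question (an assumption about the desired look, not something settled in this thread), each predictor could get a single stripe that spans all of its model rows. A minimal sketch, in which the names ordered, stripes, row_min and row_max are illustrative only:

```r
library(ggplot2)

# Same example data as in the question (only the columns used here)
combined_data <- data.frame(
  Parameter   = c("x1", "x2", "x1", "x2", "x3"),
  Coefficient = c(1.78358873, -1.14376132, 1.84360374, -1.22307433, -0.08670885),
  CI_low      = c(1.5712121, -1.3442406, 1.5220889, -1.6166156, -0.4530508),
  CI_high     = c(1.9959654, -0.9432820, 2.1651185, -0.8295330, 0.2796331),
  Model       = c("Model 1", "Model 1", "Model 2", "Model 3", "Model 3")
)

# Put estimates of the same predictor on adjacent rows, first row at the top
ordered <- combined_data[order(combined_data$Parameter, combined_data$Model), ]
ordered$ypos <- rev(seq_len(nrow(ordered)))

# One stripe per predictor, spanning all rows of that predictor
row_min <- tapply(ordered$ypos, ordered$Parameter, min)
row_max <- tapply(ordered$ypos, ordered$Parameter, max)
stripes <- data.frame(
  Parameter = names(row_min),
  ymin = as.numeric(row_min) - 0.5,
  ymax = as.numeric(row_max) + 0.5
)
stripes$fill <- rep(c("gray90", "white"), length.out = nrow(stripes))

p <- ggplot(ordered, aes(Coefficient, ypos)) +
  geom_rect(data = stripes, inherit.aes = FALSE,
            aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax, fill = fill)) +
  scale_fill_identity() +  # use the colour names in `fill` as they are
  geom_vline(xintercept = 0, linetype = 2) +
  geom_errorbar(aes(xmin = CI_low, xmax = CI_high, color = Model), width = 0.2) +
  geom_point(aes(color = Model), size = 3) +
  scale_y_continuous(NULL, breaks = ordered$ypos,
                     labels = paste(ordered$Parameter, ordered$Model, sep = ", "),
                     expand = c(0, 0)) +
  theme_bw() +
  theme(panel.grid = element_blank(), legend.title = element_blank())
p
```

Stripes are then as tall as the number of models that include the predictor, which is one way to resolve the "wider stripes" question raised above; text columns like those in the answer can be added on top with geom_text() at constant x positions.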
