
Combining time trend plot with timeline

I want to create a plot (preferably using ggplot2) that visualizes a timeline together with a time-trend plot.

To give a practical example, I have aggregated unemployment rates for each year. I also have a data set of important legislative changes related to the labor market. I want to show these changes as a timeline on the same x-axis (time) as the unemployment rate.

I have generated some toy data; see the code below:

set.seed(2110)
year <- c(1950:2020)
unemployment <- rnorm(length(year), 0.05, 0.005)
un_emp <- data.frame(cbind(year, unemployment))


year <- c( 1957, 1961, 1975, 1976, 1983, 1985, 1995, 1999, 2011, 2018)
events <- c("Implemented unemployment benefit", 
            "Pre-school became free", 
            "Five-day workweek were introduced", 
            "Labor law reform 1976", 
            "Unemployment benefit were cut in half", 
            "Apprenticeship Act allows on-the-job training",
            "Changes in discrimination law",
            "Equal Pay for Equal Work was", 
            "9 weeks vacation were introduced",
            "Unemployment benefit were removed")

imp_event  <- data.frame(year, events)

I can easily plot the time-trend across the years:

library(tidyverse)
                      
ggplot(data = un_emp, aes(x = year, y = unemployment)) + 
  geom_line(color = "#FC4E07", size = 0.5) +
  theme_bw()

Time trend.

But how do I include the events (found in imp_event) in the plot in a nice and efficient way?

My aim is to make a timeline looking like the one from here but to combine it with the time-trend plot shown above. How can I do this?


I have tried to use geom_vline, but I could not add the event labels.

Thanks!

ecl asked Jun 17 '21 15:06



3 Answers

I think this should do the trick:

First, I created the axis with geom_hline, using the mean you set for the data (0.05) as the y-intercept. Then I added a variable "height" to the events data frame, which takes the axis value and adds a value drawn from a normal distribution. I used this to draw the segments that run from the axis to each point. Finally, I flipped the y position of the year label so it is always on the opposite side of the axis from the segment.

library(tidyverse)

set.seed(2110)
year <- c(1950:2020)
unemployment <- rnorm(length(year), 0.05, 0.005)
un_emp <- data.frame(cbind(year, unemployment))

year <- c( 1957, 1961, 1975, 1976, 1983, 1985, 1995, 1999, 2011, 2018)
events <- c("Implemented unemployment benefit", 
            "Pre-school became free", 
            "Five-day workweek were introduced", 
            "Labor law reform 1976", 
            "Unemployment benefit were cut in half", 
            "Apprenticeship Act allows on-the-job training",
            "Changes in discrimination law",
            "Equal Pay for Equal Work was", 
            "9 weeks vacation were introduced",
            "Unemployment benefit were removed")

imp_event  <- data.frame(year, events) %>% 
  mutate(height = mean(unemployment) + rnorm(n(), 0, 0.02))

ggplot(un_emp) +
  
  geom_hline(yintercept = 0.05) +
  
  geom_line(aes(x = year,
                y = unemployment),
            color = "red",
            alpha = 0.3,
            size = 1) +
  
  geom_segment(data = imp_event,
               aes(x = year,
                   xend = year,
                   y = 0.05,
                   yend = height)) +
  
  geom_text(data = imp_event,
            aes(label = year, 
                x = year,
                y = 0.05 + 0.002 * sign(0.05 - height)), 
            angle = 90, 
            size = 3.5, 
            fontface = "bold",
            check_overlap = TRUE) +
  
  geom_point(data = imp_event,
             aes(x = year,
                 y = height,
                 fill = as.factor(events)),
             shape = 21,
             size = 4) +
  
  scale_x_continuous(name = NULL, 
                     labels = NULL) +
  
  scale_fill_discrete(name = "Event") +
  
  scale_y_continuous(name = "Unemployment Rate") +
  
  theme_bw() + 
  
  theme(panel.border = element_blank(),
        axis.line.y  = element_line(),
        axis.ticks.x = element_blank(),
        panel.grid = element_blank(),
        legend.position="bottom")
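
Because the heights above are drawn at random, two neighboring events can end up at almost the same height, and the layout changes with the seed. As a minimal sketch of a deterministic alternative, the heights can instead cycle through a fixed set of offsets around the axis. The names `baseline` and `offsets` and the offset values are illustrative assumptions, not part of the answer above.

```r
library(dplyr)

# Same event years as in the question; labels omitted for brevity.
imp_event <- data.frame(
  year = c(1957, 1961, 1975, 1976, 1983, 1985, 1995, 1999, 2011, 2018)
)

baseline <- 0.05                             # y position of the timeline axis
offsets  <- c(0.010, -0.010, 0.020, -0.020)  # hypothetical offsets, cycled over events

# Consecutive events alternate between above and below the axis.
imp_event <- imp_event %>%
  mutate(height = baseline + rep_len(offsets, n()))

imp_event
```

The resulting `height` column can be used in the geom_segment and geom_point layers exactly as before.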


Eduardo answered Oct 19 '22 10:10


I worked with Jon Spring's solution but replaced geom_segment with geom_vline, which gave a result close to what I wanted. The final code looked like this:


joined_data <- un_emp %>% left_join(imp_event, by = "year")

ggplot(data = joined_data, aes(x = year, y = unemployment)) + 
  geom_line(color = "red", size = 0.5) +
  theme_classic() +
  labs(y = "Unemployment rate", 
       x = "Years", 
       caption = "Data from XXXX") +
  geom_vline(data = joined_data %>% filter(!is.na(events)),
             aes(xintercept = year),
             color = "gray70",
             linetype = "dashed") +
  ggrepel::geom_text_repel(data = joined_data,
                           aes(x = year,
                               y = unemployment - 0.03,
                               label = str_wrap(events, 10)),
                           color = "gray70",
                           direction = "y",
                           size = 2.5,
                           lineheight = 0.7,
                           point.padding = 0.8)

This produces a plot close to what I wanted.
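
One possible refinement, sketched here under the assumption that the unlabeled years are not needed by the label layer: filter the joined data once and reuse the result for both the vertical lines and the repelled labels, so the text layer never sees rows with a missing label. The name `event_rows` is illustrative, and only three of the events are reproduced to keep the example short.

```r
library(dplyr)
library(ggplot2)

# Toy data as in the question; only three of the events are reproduced here.
set.seed(2110)
un_emp <- data.frame(year = 1950:2020,
                     unemployment = rnorm(71, 0.05, 0.005))
imp_event <- data.frame(
  year   = c(1957, 1976, 2011),
  events = c("Implemented unemployment benefit",
             "Labor law reform 1976",
             "9 weeks vacation were introduced")
)

joined_data <- left_join(un_emp, imp_event, by = "year")

# Keep only the years that actually carry an event label.
event_rows <- filter(joined_data, !is.na(events))

p <- ggplot(joined_data, aes(x = year, y = unemployment)) +
  geom_line(color = "red") +
  geom_vline(data = event_rows, aes(xintercept = year),
             color = "gray70", linetype = "dashed") +
  ggrepel::geom_text_repel(data = event_rows,
                           aes(y = unemployment - 0.03,
                               label = stringr::str_wrap(events, 10)),
                           color = "gray70", direction = "y", size = 2.5) +
  theme_classic()

p
```

The label layer inherits x = year from the main plot and only overrides y and label.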

I want to award @Jon Spring the bounty, but I'm not sure how to reward a comment.

ecl answered Oct 19 '22 08:10


You can achieve this by overlaying a geom_text() layer. That layer inherits y = unemployment from the main plot, though, and imp_event has no unemployment column, so you can't just feed it imp_event as a new data frame and overlay that.

Instead, you can achieve what you want by doing a left_join from un_emp to imp_event on year. Because imp_event has at most one row per year, most rows of the joined data frame will have a missing value for events, which is perfect, as I suspect you only want each event to appear as a label once.

For example:

joined_data <- un_emp %>% left_join(imp_event, by = "year")

ggplot(data = joined_data, aes(x = year, y = unemployment)) + 
  geom_line(color = "#FC4E07", size = 0.5) +
  # size is a fixed text size, so it goes outside aes()
  geom_text(data = joined_data, aes(x = year, y = unemployment, label = events), size = 3) +
  theme_bw() 

This places each event's label on the trend line at the year it occurred.

You can have a look at the available options and play around with geom_text() here.
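
To see what the join leaves you with, here is a small sketch that counts the labeled and unlabeled rows. The event labels are placeholders (an assumption made for brevity); the years are the ones from the question.

```r
library(dplyr)

set.seed(2110)
un_emp <- data.frame(year = 1950:2020,
                     unemployment = rnorm(71, 0.05, 0.005))

# Event years from the question; the labels are placeholders for brevity.
event_years <- c(1957, 1961, 1975, 1976, 1983, 1985, 1995, 1999, 2011, 2018)
imp_event <- data.frame(year = event_years,
                        events = paste("Event", seq_along(event_years)))

joined_data <- left_join(un_emp, imp_event, by = "year")

nrow(joined_data)                # one row per year, as in un_emp
sum(!is.na(joined_data$events))  # rows that carry a label
sum(is.na(joined_data$events))   # rows with no event
```

geom_text() should drop the unlabeled rows with a warning; passing na.rm = TRUE to the layer is one way to silence it.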

C.Robin answered Oct 19 '22 09:10