 

ggplot: bizarre issue adding labels in time-series data

Tags:

r

ggplot2

I was working with panel data for states and localities and ran into a strange issue when plotting the time series. I was trying to plot each state's data individually in light grey, highlight key states using specific colors, and add a colored label at the end of each highlighted state's line. I also wanted to include a line for the mean across states. For some reason, the scale of the plotted variable throws off the labeling.

I generated some clunky data below that demonstrates the problem. With some variables, the "AVG" label for the mean line goes haywire: it works with the logged population (lnpop) but not with the raw population. Apart from the y variable and its axis label, the two blocks of code are identical. Any help would be really useful; I'm mainly curious why the code works fine with one variable and not the other.


library(tidyverse)

#Creating state labels
state<-c(rep("A",21), rep("B",21), rep("C",21), rep("D",21))

#Creating years for each state
year<-rep(2000:2020, 4)

#Generating each state's population
population_a<-5000:5020
population_b<-population_a+10
population_c<-population_a+20
population_d<-population_a+30
population<-c(population_a, population_b, population_c, population_d)


#Consolidating the data
mydata<-data.frame(state, year, population)

mydata$lnpop<-log(mydata$population)

#PLOTTING TIME-SERIES FOR EACH STATE

#THIS WORKS:

ggplot(data=mydata, aes(year, lnpop)) + 
  geom_line(aes(group=state), colour="gray")+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="A"),
            aes(x = year+0.3, label=state), colour="purple", hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="B"),
            aes(x = year+0.3, label=state), colour="red",hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="D"),
            aes(x = year+0.3, label=state), colour="blue",hjust=0)+
  guides(colour = "none") +
  expand_limits(x = max(mydata$year) + 0.3)+
  geom_line(data=subset(mydata, state == "A"), colour="purple")+
  geom_line(data=subset(mydata, state == "B"), colour="red")+
  geom_line(data=subset(mydata, state == "D"), colour="blue")+
  stat_summary(fun = mean, geom = "line") +
  stat_summary(data=subset(mydata, year==max(year)), fun = mean, geom = "text", show.legend = FALSE, hjust=0, aes(x=year+0.05,label="AVG")) +
  xlab("Year")+
  ylab("Population (Logged)")

#BUT THIS DOES NOT:

ggplot(data=mydata, aes(year, population)) + 
  geom_line(aes(group=state), colour="gray")+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="A"),
            aes(x = year+0.3, label=state), colour="purple", hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="B"),
            aes(x = year+0.3, label=state), colour="red",hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="D"),
            aes(x = year+0.3, label=state), colour="blue",hjust=0)+
  guides(colour = "none") +
  expand_limits(x = max(mydata$year) + 0.3)+
  geom_line(data=subset(mydata, state == "A"), colour="purple")+
  geom_line(data=subset(mydata, state == "B"), colour="red")+
  geom_line(data=subset(mydata, state == "D"), colour="blue")+
  stat_summary(fun = mean, geom = "line") +
  stat_summary(data=subset(mydata, year==max(year)), fun = mean, geom = "text", show.legend = FALSE, hjust=0, aes(x=year+0.05,label="AVG")) +
  xlab("Year")+
  ylab("Population")

[Image: plot output captioned "This works"]

EDIT: Spaced out the lines in the plots a bit.
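For reference, the grey-background-plus-highlights part of the plot can also be written with the highlighted states defined once, letting scale_colour_manual() supply the colours instead of three separate geom_text() and geom_line() layers. This is only a sketch under stated assumptions: the named vector highlight is a hypothetical helper that is not in the question, the data are the question's toy panel with the edited offsets of 10, and the mean line and AVG label (the part at issue) are deliberately left out.

```r
library(ggplot2)
library(dplyr)

# Question's toy panel after the edit: four states, 2000-2020, offsets of 10
mydata <- data.frame(
  state = rep(c("A", "B", "C", "D"), each = 21),
  year = rep(2000:2020, 4),
  population = c(5000:5020, 5010:5030, 5020:5040, 5030:5050)
)

# Hypothetical helper: highlighted states and their colours, defined once
highlight <- c(A = "purple", B = "red", D = "blue")

# Only the rows for the highlighted states
hl_data <- filter(mydata, state %in% names(highlight))

# Last observation of each highlighted state, used for the end-of-line labels
end_labels <- hl_data %>%
  group_by(state) %>%
  filter(year == max(year)) %>%
  ungroup()

p <- ggplot(mydata, aes(year, population)) +
  # Every state in light grey
  geom_line(aes(group = state), colour = "gray") +
  # Highlighted states drawn on top, coloured via the scale below
  geom_line(data = hl_data, aes(colour = state)) +
  geom_text(data = end_labels,
            aes(x = year + 0.3, label = state, colour = state),
            hjust = 0) +
  # Named vector maps each state to its colour; no legend needed
  scale_colour_manual(values = highlight, guide = "none") +
  expand_limits(x = max(mydata$year) + 0.3) +
  labs(x = "Year", y = "Population")
```

Adding another highlighted state then only means adding one entry to highlight.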

asked Jan 02 '26 02:01 by sjillani


1 Answer

One workaround is to place the AVG label with annotate(), computing its position directly from the final-year rows:

library(ggplot2)
library(dplyr)

state<-c(rep("A",21), rep("B",21), rep("C",21), rep("D",21))

#Creating years for each state
year<-rep(2000:2020, 4)

#Generating each state's population
population_a<-5000:5020
population_b<-population_a+2
population_c<-population_a+3
population_d<-population_a+5
population<-c(population_a, population_b, population_c, population_d)


#Consolidating the data
mydata<-data.frame(state, year, population)
# Rows for the final year; their mean sets the y position of the AVG label
sub_dat <- subset(mydata, year == max(year))
ggplot(data=mydata, aes(year, population)) + 
  geom_line(aes(group=state), colour="gray")+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="A"),
            aes(x = year+0.3, label=state), colour="purple", hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="B"),
            aes(x = year+0.3, label=state), colour="red",hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="D"),
            aes(x = year+0.3, label=state), colour="blue",hjust=0)+
  guides(colour = "none") +
  expand_limits(x = max(mydata$year) + 0.3)+
  geom_line(data=subset(mydata, state == "A"), colour="purple")+
  geom_line(data=subset(mydata, state == "B"), colour="red")+
  geom_line(data=subset(mydata, state == "D"), colour="blue")+
  stat_summary(fun = mean, geom = "line") +
  annotate("text", 
           x = max(sub_dat$year) + 0.05, y = mean(sub_dat$population), 
           label = "AVG", hjust = 0) +
  xlab("Year")+
  ylab("Population")

Created on 2020-04-16 by the reprex package (v0.3.0)
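Along the same lines as annotate(), the mean series can be computed up front with dplyr and its final-year row handed to geom_text(), so the label's position no longer depends on how stat_summary() interprets a single-year subset. A minimal sketch, assuming the answer's toy data (offsets of 2, 3 and 5); avg_by_year and avg_label are illustrative names, not part of the original code, and the highlighted-state layers are omitted for brevity:

```r
library(ggplot2)
library(dplyr)

# Toy panel from the answer above: offsets of 2, 3 and 5
mydata <- data.frame(
  state = rep(c("A", "B", "C", "D"), each = 21),
  year = rep(2000:2020, 4),
  population = c(5000:5020, 5002:5022, 5003:5023, 5005:5025)
)

# Mean across states for every year; drives both the line and the label
avg_by_year <- mydata %>%
  group_by(year) %>%
  summarise(population = mean(population), .groups = "drop")

# Single row for the label: the last point of the mean series
avg_label <- filter(avg_by_year, year == max(year))

p <- ggplot(mydata, aes(year, population)) +
  geom_line(aes(group = state), colour = "gray") +
  # Mean line drawn from the precomputed summary (inherits x = year, y = population)
  geom_line(data = avg_by_year) +
  # Constant label set as a parameter, nudged right of the last point
  geom_text(data = avg_label, aes(x = year + 0.05), label = "AVG", hjust = 0) +
  labs(x = "Year", y = "Population")
```

Because the mean line and its label come from the same avg_by_year table, they cannot drift apart.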

Alternatively, set the orientation argument of stat_summary() explicitly to "x". From the ggplot2 documentation on orientation:

This geom treats each axis differently and, thus, can have two orientations. Often the orientation is easy to deduce from a combination of the given mappings and the types of positional scales in use. Thus, ggplot2 will by default try to guess which orientation the layer should have. Under rare circumstances, the orientation is ambiguous and guessing may fail. In that case the orientation can be specified directly using the orientation parameter, which can be either "x" or "y". The value gives the axis that the geom should run along, "x" being the default orientation you would expect for the geom.

ggplot(data=mydata, aes(year, population)) + 
  geom_line(aes(group=state), colour="gray")+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="A"),
            aes(x = year+0.3, label=state), colour="purple", hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="B"),
            aes(x = year+0.3, label=state), colour="red",hjust=0)+
  geom_text(data=mydata %>% group_by(state) %>% 
              arrange(desc(year)) %>% 
              slice(1) %>% 
              filter(state=="D"),
            aes(x = year+0.3, label=state), colour="blue",hjust=0)+
  guides(colour = "none") +
  expand_limits(x = max(mydata$year) + 0.3)+
  geom_line(data=subset(mydata, state == "A"), colour="purple")+
  geom_line(data=subset(mydata, state == "B"), colour="red")+
  geom_line(data=subset(mydata, state == "D"), colour="blue")+
  stat_summary(fun = mean, geom = "line") +
  stat_summary(data=subset(mydata, year==max(year)), fun = mean, geom = "text", show.legend = FALSE, hjust=0, aes(x=year+0.05,label="AVG"), orientation = "x") +
  xlab("Year")+
  ylab("Population")

Created on 2020-04-16 by the reprex package (v0.3.0)
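To check what the labelling layer actually computed, layer_data() returns a layer's data after its stat has run. The sketch below rebuilds the question's toy data (offsets of 10) and keeps only the AVG layer with orientation = "x"; with that setting the layer should produce a single row whose y is the mean of the four final-year values. Running the same layer without orientation and comparing the output may help show what the automatic guess did in the failing case.

```r
library(ggplot2)

# Question's toy panel after the edit: four states, 2000-2020, offsets of 10
mydata <- data.frame(
  state = rep(c("A", "B", "C", "D"), each = 21),
  year = rep(2000:2020, 4),
  population = c(5000:5020, 5010:5030, 5020:5040, 5030:5050)
)

# Only the AVG label layer, with the orientation fixed to the x axis
p <- ggplot(mydata, aes(year, population)) +
  stat_summary(
    data = subset(mydata, year == max(year)),
    fun = mean, geom = "text", hjust = 0,
    aes(x = year + 0.05, label = "AVG"),
    orientation = "x"
  )

# Data produced by layer 1 after stat_summary() has summarised it
ld <- layer_data(p, 1)
print(ld[, c("x", "y")])
```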

answered Jan 03 '26 14:01 by yang


