 

ggplot line graph with NA values

Tags: r, na, ggplot2

I'm having trouble with ggplot when trying to plot two incomplete time series on the same graph. The two y series are not observed in the same years (the x-axis), so NAs are present for certain years:

test<-structure(list(YEAR = c(1937, 1938, 1942, 1943, 1947, 1948, 1952, 
1953, 1957, 1958, 1962, 1963, 1967, 1968, 1972, 1973, 1977, 1978, 
1982, 1983, 1986.5, 1987, 1993.5), A1 = c(NA, 24, NA, 32, 32, 
NA, 34, NA, NA, 18, 12, NA, 10, NA, 11, NA, 15, NA, 24, NA, NA, 
25, 26), A2 = c(40, NA, 38, NA, 25, NA, 26, NA, 20, NA, 17, 
17, 17, NA, 16, 18, 21, 18, 17, 25, NA, NA, 26)), .Names = c("YEAR", "A1", 
"A2"), row.names = c(NA, -23L), class = "data.frame")

The following code I tried produces a disjointed mess:

library(ggplot2)

ggplot(test, aes(x=YEAR)) + 
  geom_line(aes(y = A1), size=0.43, colour="red") +  
  geom_line(aes(y = A2), size=0.43, colour="green") +
  xlab("Year") + ylab("Percent") +
  scale_x_continuous(limits=c(1935, 1995), breaks = seq(1935, 1995, 5),
                     expand = c(0, 0)) + 
  scale_y_continuous(limits=c(0,50), breaks=seq(0, 50, 10), expand = c(0, 0))
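A quick base-R check shows why the plot is so fragmented. geom_line() breaks the line at every interior NA, so a segment is drawn only between two consecutive years in which a series is observed, and an observation with missing years on both sides draws nothing at all. The sketch below counts those drawable segments per series; `drawable_segments()` is a hypothetical helper, and it assumes the rows are already sorted by YEAR, as they are in this data.

```r
# The question's data, written as a plain data.frame() call
test <- data.frame(
  YEAR = c(1937, 1938, 1942, 1943, 1947, 1948, 1952, 1953, 1957, 1958, 1962,
           1963, 1967, 1968, 1972, 1973, 1977, 1978, 1982, 1983, 1986.5,
           1987, 1993.5),
  A1 = c(NA, 24, NA, 32, 32, NA, 34, NA, NA, 18, 12, NA, 10, NA, 11, NA,
         15, NA, 24, NA, NA, 25, 26),
  A2 = c(40, NA, 38, NA, 25, NA, 26, NA, 20, NA, 17, 17, 17, NA, 16, 18,
         21, 18, 17, 25, NA, NA, 26)
)

# Hypothetical helper: a segment needs two consecutive rows that are both
# non-missing (rows are assumed to be sorted by YEAR already)
drawable_segments <- function(x) {
  ok <- !is.na(x)
  sum(ok[-1] & ok[-length(ok)])
}

observed <- sapply(test[c("A1", "A2")], function(x) sum(!is.na(x)))
segments <- sapply(test[c("A1", "A2")], drawable_segments)

observed  # A1 = 12, A2 = 15 non-missing values
segments  # A1 = 3,  A2 = 7 segments actually drawn by geom_line()
```

So out of 12 observed A1 values, geom_line() can only connect them into 3 short segments, which is the disjointed result in the question.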


How can I get each series drawn as a continuous line that skips over the missing years?

asked Feb 27 '15 at 21:02 by user2568648



2 Answers

My preferred solution is to reshape the data to long format. Then you only need one geom_line call, which is tidier, especially if you have many series. The result is the same as LyzandeR's second chart.

library(ggplot2)
library(reshape2)

# Reshape to long format: one row per YEAR/series pair
test2 <- melt(test, id.vars = 'YEAR')
# Drop the NA rows so each series is drawn as one connected line
test2 <- na.omit(test2)

ggplot(test2, aes(x=YEAR, y=value, color=variable)) + 
  geom_line() +
  scale_color_manual(values=c('red', 'green')) +
  xlab("Year") + ylab("Percent") +
  scale_x_continuous(limits=c(1935, 1995), breaks = seq(1935, 1995, 5),
                     expand = c(0, 0)) + 
  scale_y_continuous(limits=c(0,50), breaks=seq(0, 50, 10), expand = c(0, 0))
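reshape2 has since been retired in favour of tidyr. The sketch below does the same long-format reshaping with tidyr::pivot_longer() (assuming tidyr >= 1.0.0), where values_drop_na = TRUE takes the place of the separate na.omit() step. The name `test_long` is illustrative.

```r
library(ggplot2)
library(tidyr)

# The question's data, written as a plain data.frame() call
test <- data.frame(
  YEAR = c(1937, 1938, 1942, 1943, 1947, 1948, 1952, 1953, 1957, 1958, 1962,
           1963, 1967, 1968, 1972, 1973, 1977, 1978, 1982, 1983, 1986.5,
           1987, 1993.5),
  A1 = c(NA, 24, NA, 32, 32, NA, 34, NA, NA, 18, 12, NA, 10, NA, 11, NA,
         15, NA, 24, NA, NA, 25, 26),
  A2 = c(40, NA, 38, NA, 25, NA, 26, NA, 20, NA, 17, 17, 17, NA, 16, 18,
         21, 18, 17, 25, NA, NA, 26)
)

# Long format; rows whose value is NA are dropped during the reshape
test_long <- pivot_longer(test, cols = c(A1, A2), names_to = "variable",
                          values_to = "value", values_drop_na = TRUE)

p <- ggplot(test_long, aes(x = YEAR, y = value, color = variable)) +
  geom_line() +
  scale_color_manual(values = c("red", "green")) +
  xlab("Year") + ylab("Percent") +
  scale_x_continuous(limits = c(1935, 1995), breaks = seq(1935, 1995, 5),
                     expand = c(0, 0)) +
  scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, 10),
                     expand = c(0, 0))
p  # draw the chart
```

The long table ends up with 27 rows (12 A1 values plus 15 A2 values), the same rows the melt()/na.omit() pair produces.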


You might consider adding a geom_point() call in addition to the line, so it's clear which points are actual observations and where the line is only bridging missing years. Another advantage of the long format is that each additional geom takes just one call, rather than one call per series.
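A minimal sketch of that suggestion, reusing the melt()/na.omit() reshaping from the answer above (the axis scale settings are left out for brevity). A single geom_point() call adds markers for both series:

```r
library(ggplot2)
library(reshape2)

# The question's data, written as a plain data.frame() call
test <- data.frame(
  YEAR = c(1937, 1938, 1942, 1943, 1947, 1948, 1952, 1953, 1957, 1958, 1962,
           1963, 1967, 1968, 1972, 1973, 1977, 1978, 1982, 1983, 1986.5,
           1987, 1993.5),
  A1 = c(NA, 24, NA, 32, 32, NA, 34, NA, NA, 18, 12, NA, 10, NA, 11, NA,
         15, NA, 24, NA, NA, 25, 26),
  A2 = c(40, NA, 38, NA, 25, NA, 26, NA, 20, NA, 17, 17, 17, NA, 16, 18,
         21, 18, 17, 25, NA, NA, 26)
)

# Long format without the NA rows
test_long <- na.omit(melt(test, id.vars = "YEAR"))

# One geom_line() and one geom_point() cover both series: the points mark
# the observed years, the line connects them across the gaps
p <- ggplot(test_long, aes(x = YEAR, y = value, color = variable)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = c("red", "green")) +
  xlab("Year") + ylab("Percent")
p  # draw the chart
```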


answered Sep 27 '22 at 17:09 by arvi1000



You can remove the rows containing NAs with na.omit:

library(ggplot2)
# na.omit() drops every row in which A1 or A2 is NA before plotting
ggplot(na.omit(test), aes(x=YEAR)) + 
  geom_line(aes(y = A1), size=0.43, colour="red") +  
  geom_line(aes(y = A2), size=0.43, colour="green") +
  xlab("Year") + ylab("Percent") +
  scale_x_continuous(limits=c(1935, 1995), breaks = seq(1935, 1995, 5),
                     expand = c(0, 0)) + 
  scale_y_continuous(limits=c(0,50), breaks=seq(0, 50, 10), expand = c(0, 0))
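One caveat with this version: na.omit() on a data frame drops a row when any of its columns is NA, so only the years in which both A1 and A2 were recorded survive. A quick base-R check on the question's data (`complete_rows` is an illustrative name):

```r
# The question's data, written as a plain data.frame() call
test <- data.frame(
  YEAR = c(1937, 1938, 1942, 1943, 1947, 1948, 1952, 1953, 1957, 1958, 1962,
           1963, 1967, 1968, 1972, 1973, 1977, 1978, 1982, 1983, 1986.5,
           1987, 1993.5),
  A1 = c(NA, 24, NA, 32, 32, NA, 34, NA, NA, 18, 12, NA, 10, NA, 11, NA,
         15, NA, 24, NA, NA, 25, 26),
  A2 = c(40, NA, 38, NA, 25, NA, 26, NA, 20, NA, 17, 17, 17, NA, 16, 18,
         21, 18, 17, 25, NA, NA, 26)
)

# Rows kept by na.omit() on the whole data frame
complete_rows <- na.omit(test)
nrow(complete_rows)                    # 8 years with both series observed

# Values each series actually has
colSums(!is.na(test[c("A1", "A2")]))  # A1 = 12, A2 = 15
```

So this chart shows 8 of A1's 12 values and 8 of A2's 15; filtering each series on its own keeps them all.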


EDIT

To keep every observation of each series, use two separate data.frames and apply na.omit to each one:

# Both layers inherit aes(y = A1, x = YEAR) from ggplot(),
# so test1 and test2 need to have the same column names
test1 <- test[1:2]
test2 <- test[c(1,3)]
colnames(test2) <- c('YEAR','A1')

library(ggplot2)
ggplot(NULL, aes(y = A1, x = YEAR)) + 
  geom_line(data = na.omit(test1), size=0.43, colour="red") +  
  geom_line(data = na.omit(test2), size=0.43, colour="green") +
  xlab("Year") + ylab("Percent") +
  scale_x_continuous(limits=c(1935, 1995), breaks = seq(1935, 1995, 5),
                     expand = c(0, 0)) + 
  scale_y_continuous(limits=c(0,50), breaks=seq(0, 50, 10), expand = c(0, 0))
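A variant that skips the column renaming: pass each geom_line() its own subset of `test`, keeping only the rows where that layer's series is observed, and map y per layer. This is a sketch rather than LyzandeR's code; it relies only on the per-layer `data` and `aes` arguments, the same mechanism the EDIT uses.

```r
library(ggplot2)

# The question's data, written as a plain data.frame() call
test <- data.frame(
  YEAR = c(1937, 1938, 1942, 1943, 1947, 1948, 1952, 1953, 1957, 1958, 1962,
           1963, 1967, 1968, 1972, 1973, 1977, 1978, 1982, 1983, 1986.5,
           1987, 1993.5),
  A1 = c(NA, 24, NA, 32, 32, NA, 34, NA, NA, 18, 12, NA, 10, NA, 11, NA,
         15, NA, 24, NA, NA, 25, 26),
  A2 = c(40, NA, 38, NA, 25, NA, 26, NA, 20, NA, 17, 17, 17, NA, 16, 18,
         21, 18, 17, 25, NA, NA, 26)
)

# Each layer sees only the rows where its own series is non-missing,
# so A1's gaps do not remove any A2 values (and vice versa)
p <- ggplot(test, aes(x = YEAR)) +
  geom_line(data = test[!is.na(test$A1), ], aes(y = A1), colour = "red") +
  geom_line(data = test[!is.na(test$A2), ], aes(y = A2), colour = "green") +
  xlab("Year") + ylab("Percent")
p  # draw the chart
```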


answered Sep 27 '22 at 17:09 by LyzandeR
