Identify and plot datapoints surrounded by NAs

Tags: r, na, ggplot2

I am using ggplot2 and geom_line() to make a line plot of a large number of time series. The dataset has many missing values, and I am generally happy that lines are not drawn across missing segments, as this would look awkward.

My problem is that single non-NA data points surrounded by NAs (or points at the beginning or end of a series with an NA on the other side) are not plotted, since a line segment needs two adjacent non-missing values. A potential solution would be to add geom_point() for all observations, but this increases my file size tenfold and makes the plot harder to read.

Thus, I want to identify only those data points that geom_line() does not show and add points for those alone. Is there a straightforward way to identify these points?

My data is currently in long format, and the following MWE can serve as an illustration. I want to identify rows 1 and 7 so that I can plot them:

library(ggplot2)
set.seed(1)
dat <- data.frame(time=rep(1:5,2),country=rep(1:2,each=5),value=rnorm(10))
dat[c(2,6,8),3] <- NA
ggplot(dat) + geom_line(aes(time,value,group=country))

> dat
   time country      value
1     1       1 -0.6264538
2     2       1         NA
3     3       1 -0.8356286
4     4       1  1.5952808
5     5       1  0.3295078
6     1       2         NA
7     2       2  0.4874291
8     3       2         NA
9     4       2  0.5757814
10    5       2 -0.3053884
asked Dec 27 '25 22:12 by notfound



1 Answer

You can use the zoo::rollapply() function to build a new column that contains only the values surrounded by NAs, and then plot just those points. For example:

library(zoo)
library(ggplot2)

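foo <- data.frame(time = 1:11, value = c(1, NA, 3, 4, 5, NA, 2, NA, 4, 5, NA))

# Perform sliding-window processing.
# Pad both ends with NAs so that the first and last observations are treated
# as having a missing neighbour.
val <- c(NA, NA, foo$value, NA, NA)
# Each window holds (previous, current, next); keep the current value only
# when it is not NA and both of its neighbours are NA.
val <- rollapply(val, width = 3, FUN = function(x) {
    if (all(is.na(x) == c(TRUE, FALSE, TRUE))) {
        return(x[2])
    } else {
        return(NA)
    }
})

# rollapply() returns two fewer values than its input; drop the first and
# last so that val_clean lines up with foo$value
foo$val_clean <- val[c(-1, -length(val))]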

foo$val_clean

ggplot(foo) + geom_line(aes(time, value)) + geom_point(aes(time, val_clean))

[Image: resulting ggplot]
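The rollapply() example works on a single series. In the question's long-format `dat`, a window can straddle the boundary between two countries, so the test should be applied within each group. Below is a minimal base-R sketch of the same rule (a non-NA value whose previous and next observations are both NA or absent), assuming each series should be ordered by `time`. The helper `isolated()` and the column `is_isolated` are hypothetical names introduced here, and `ave()` is swapped in for `rollapply()` to do the per-group application:

```r
library(ggplot2)

# Hypothetical helper: TRUE where x[i] is not NA but both neighbours are NA.
# A missing neighbour at either end of the vector counts as NA, so a lone
# first or last value is flagged too.
isolated <- function(x) {
  prev_na <- c(TRUE, is.na(head(x, -1)))  # is the previous value missing?
  next_na <- c(is.na(tail(x, -1)), TRUE)  # is the next value missing?
  !is.na(x) & prev_na & next_na
}

# Single series: reproduces val_clean from the rollapply() version
foo <- data.frame(time = 1:11, value = c(1, NA, 3, 4, 5, NA, 2, NA, 4, 5, NA))
foo$val_clean <- ifelse(isolated(foo$value), foo$value, NA)

# Long format: the question's example data
set.seed(1)
dat <- data.frame(time = rep(1:5, 2), country = rep(1:2, each = 5), value = rnorm(10))
dat[c(2, 6, 8), 3] <- NA

# Order each series by time, then apply isolated() within each country.
# ave() returns a vector of the same type as its first argument (numeric
# here), so the 0/1 result is converted back to logical.
dat <- dat[order(dat$country, dat$time), ]
dat$is_isolated <- as.logical(ave(dat$value, dat$country, FUN = isolated))

rownames(dat)[dat$is_isolated]  # returns "1" "7"

# Lines for every series, plus points only where a line cannot be drawn
p <- ggplot(dat, aes(time, value, group = country)) +
  geom_line() +
  geom_point(data = subset(dat, is_isolated))
print(p)
```

On the question's data this flags rows 1 and 7, and on `foo` it yields the same `val_clean` as the rollapply() version, so only the handful of isolated observations get an extra point layer.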

answered Dec 30 '25 12:12 by Istrel



