Interpolating NA's by group using dplyr on multiple columns

I have a data frame like this:

> head(df1)
  iso year var1 var2 var3
1 XXX 2005  165   29 2151
2 XXX 2006  160   21 2139
3 XXX 2007   NA   NA   NA
4 XXX 2008  184    9 3640
5 XXX 2009   NA   NA   NA
6 YYY 2005  206  461 8049

I want to replace the NAs in intermediate years by interpolating from the surrounding years, and fill the NAs at the beginning and end of each range by carrying the outermost non-NA observation backward and forward.

My code to do this for one column is:

df1 %>% 
 group_by(iso) %>%
 mutate(var1 = na.approx(var1, na.rm = FALSE, rule = 1)) %>%
 mutate(var1 = na.locf(var1, na.rm = FALSE)) %>%
 mutate(var1 = na.locf(var1, na.rm = FALSE, fromLast = TRUE))

This works, so now I want to do this for all columns in one go (there are more than 3, and they are not numbered as in my example). I pieced the following together from the answers to this question, omitting the two calls to na.locf.

columnnames <- c("var1", "var2", "var3")
df1 %>%
 group_by(iso) %>%
 mutate_at(.vars = vars(columnnames), .funs = funs(na.approx(., na.rm = FALSE, rule = 1)))

This throws an error and a warning:

Error in approx(x[!na], y[!na], xout, ...) : 
  need at least two non-NA values to interpolate
In addition: Warning message:
In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion

I think I understand the error, but I did not get it when I ran the first piece of code on var1. The warning I don't follow. How can I apply my code to all columns in my data frame? I also tried putting everything in a loop over columnnames, but that didn't work either (and it is probably not the best way to go about this).

asked Jul 13 '18 18:07

avs

2 Answers

Use na.approx with method = "constant" (which behaves like na.locf) and rule = 2 (which extends the nearest value to leading and trailing NAs). If you want the interior NAs to be linearly interpolated instead, remove the method = "constant" argument.

df1 %>%
  group_by(iso) %>%
  mutate_at(vars(-iso), funs(na.approx(., method = "constant", rule = 2))) %>%
  ungroup

giving:

# A tibble: 6 x 5
  iso    year  var1  var2  var3
  <fct> <dbl> <dbl> <dbl> <dbl>
1 XXX    2005   165    29  2151
2 XXX    2006   160    21  2139
3 XXX    2007   160    21  2139
4 XXX    2008   184     9  3640
5 XXX    2009   184     9  3640
6 YYY    2005   206   461  8049
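For readers on dplyr 1.0.0 or later, where funs() is deprecated and mutate_at() is superseded by across(), the same idea can be sketched as below. This is only an illustration, not the answerer's code: it assumes the columns to fill are listed in a character vector (here called columnnames, as in the question), it rebuilds the example data with numeric columns, and it passes na.rm = FALSE so that the vector length is never changed.

```r
library(dplyr)
library(zoo)

# Example data from the question, rebuilt with numeric columns
df1 <- data.frame(
  iso  = c("XXX", "XXX", "XXX", "XXX", "XXX", "YYY"),
  year = c(2005, 2006, 2007, 2008, 2009, 2005),
  var1 = c(165, 160, NA, 184, NA, 206),
  var2 = c(29, 21, NA, 9, NA, 461),
  var3 = c(2151, 2139, NA, 3640, NA, 8049)
)

# Columns to fill; listed by name because real column names may not share a prefix
columnnames <- c("var1", "var2", "var3")

res <- df1 %>%
  group_by(iso) %>%
  # method = "constant" carries the last value forward; rule = 2 extends to the ends
  mutate(across(all_of(columnnames),
                ~ na.approx(.x, method = "constant", rule = 2, na.rm = FALSE))) %>%
  ungroup()

print(res)
```

Dropping method = "constant" from the call switches the interior gaps to linear interpolation, as described in the answer.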

Note

df1 in reproducible form is:

df1 <- 
structure(list(iso = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("XXX", 
"YYY"), class = "factor"), year = c(2005L, 2006L, 2007L, 2008L, 
2009L, 2005L), var1 = c(165L, 160L, NA, 184L, NA, 206L), var2 = c(29L, 
21L, NA, 9L, NA, 461L), var3 = c(2151L, 2139L, NA, 3640L, NA, 
8049L)), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6"))

answered Oct 06 '22 01:10

G. Grothendieck

You can rewrite your code with mutate_at so that all the columns are converted in one go:

library(dplyr)
library(zoo)


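df %>% 
  group_by(iso) %>%
  mutate_at(vars(starts_with("var")), 
            # interpolate interior NAs, then carry values forward and backward;
            # na.rm = FALSE keeps the vector length unchanged at every step
            funs(na.locf(na.locf(na.approx(., na.rm = FALSE, rule = 1), na.rm = FALSE),
                         na.rm = FALSE, fromLast = TRUE)))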


# # A tibble: 6 x 5
# # Groups: iso [2]
# iso    year  var1   var2  var3
# <chr> <int> <dbl>  <dbl> <dbl>
# 1 XXX    2005   165  29.0   2151
# 2 XXX    2006   160  21.0   2139
# 3 XXX    2007   172  15.0   2890
# 4 XXX    2008   184   9.00  3640
# 5 XXX    2009   184   9.00  3640
# 6 YYY    2005   206 461     8049
#
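The nested calls above interpolate interior gaps linearly and carry the outermost values to both ends. The same behaviour can be sketched in base R with approx() and rule = 2, adding an explicit guard for groups that have fewer than two non-NA values (the situation named in the error message in the question). The helper name fill_gaps is hypothetical, and the example data is rebuilt here with numeric columns; treat this as a sketch under those assumptions rather than a drop-in replacement.

```r
# Hypothetical helper: linear interpolation inside, nearest value at the ends
fill_gaps <- function(v) {
  v  <- as.numeric(v)
  ok <- !is.na(v)
  if (sum(ok) == 0) return(v)                       # all NA: nothing to anchor on
  if (sum(ok) == 1) return(rep(v[ok], length(v)))   # one value: carry it everywhere
  # rule = 2 makes approx() return the closest data extreme outside the range
  approx(x = which(ok), y = v[ok], xout = seq_along(v), rule = 2)$y
}

df <- data.frame(
  iso  = c("XXX", "XXX", "XXX", "XXX", "XXX", "YYY"),
  year = c(2005, 2006, 2007, 2008, 2009, 2005),
  var1 = c(165, 160, NA, 184, NA, 206),
  var2 = c(29, 21, NA, 9, NA, 461),
  var3 = c(2151, 2139, NA, 3640, NA, 8049)
)

cols <- c("var1", "var2", "var3")

# ave() applies fill_gaps within each iso group and keeps the row order
df[cols] <- lapply(df[cols], function(col) ave(col, df$iso, FUN = fill_gaps))

print(df)
```

The helper could presumably also be used inside a grouped mutate() in place of the nested zoo calls. Note that this version assumes the rows within each group are already ordered by year and evenly spaced, since it interpolates over row positions rather than over the year values.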

Data:

df <- read.table(text=
"iso year var1 var2 var3
1 XXX 2005  165   29 2151
2 XXX 2006  160   21 2139
3 XXX 2007   NA   NA   NA
4 XXX 2008  184    9 3640
5 XXX 2009   NA   NA   NA
6 YYY 2005  206  461 8049",
header = TRUE, stringsAsFactors = FALSE)

answered Oct 06 '22 01:10

MKR

Tags: r, na, dplyr, zoo
