R: Interpolation of NAs by group

I would like to linearly interpolate a variable in a data frame, taking into account: 1) the time difference between the two points, 2) the time at which each measurement was taken, and 3) the individual on which the variable was measured.

For example, given the following data frame:

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value=c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))
df

I would like to obtain:

result <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                     Individuals=c(1,1,1,1,1,1,1,2,2,2),
                     Value=c(1, 2, 3, 4, 5, 6, 7, 5, 5.5, 6))
result

I cannot simply apply the function na.approx from the zoo package to the whole column, because the observations do not form one consecutive series: some belong to one individual and others belong to another. If the first observation of the second individual were NA and I used na.approx alone, I would be using information from individual 1 to interpolate the NA of individual 2. For example, the following data frame would have this problem:

df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                   Individuals=c(1,1,1,1,1,1,1,2,2,2),
                   Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))
df_2
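To make the leak concrete, here is a minimal sketch, assuming the zoo package is installed. It applies na.approx to the whole Value column of df_2 and ignores Individuals; the names `value` and `leaky` are only for illustration.

```r
library(zoo)

# Value column of df_2: rows 1-7 are individual 1, rows 8-10 are individual 2
value <- c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7)

# Ungrouped interpolation treats all ten rows as a single series
leaky <- na.approx(value, na.rm = FALSE)

# Row 8 (first observation of individual 2) is filled from row 7,
# which belongs to individual 1
leaky[8]
```

Row 8 comes out as 6, the midpoint of 7 (the last value of individual 1) and 5 (the second value of individual 2), which is exactly the cross-individual contamination described above.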

I have tried using the packages zoo and dplyr:

library(dplyr)
library(zoo)
proof <- df %>%
  group_by(Individuals) %>%
  na.approx(df$Value)

But this does not work, because I cannot use group_by with a zoo function in this way.

Do you know how to interpolate NA values in one variable by group?

Thanks in advance,

asked Nov 13 '15 15:11

Ruben

2 Answers

Create your data with data.frame rather than cbind: cbind returns a matrix, but dplyr needs a data frame. Then use na.approx inside mutate, after group_by(Individuals), so that each individual is interpolated separately. Setting na.rm=FALSE keeps leading and trailing NAs, so the result has the same length as the group, which mutate requires.

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
            Individuals=c(1,1,1,1,1,1,1,2,2,2),
            Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))

library(dplyr)
library(zoo)

df %>%
  group_by(Individuals) %>%
  mutate(ValueInterp = na.approx(Value, na.rm=FALSE))

   time Individuals Value ValueInterp
1     1           1    NA          NA
2     2           1     2           2
3     3           1     3           3
4     4           1    NA           4
5     5           1     5           5
6     6           1    NA           6
7     7           1     7           7
8     1           2     8           8
9     2           2    NA           9
10    3           2    10          10
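The call above interpolates by row position within each group, which matches the question's data because the times are evenly spaced. If the times were uneven, one option is to pass the time column through na.approx's `x` argument. The following is a sketch under that assumption; the `uneven` data frame is hypothetical and not from the question.

```r
library(dplyr)
library(zoo)

# Hypothetical data with unevenly spaced times per individual
uneven <- data.frame(time        = c(1, 2, 4, 1, 5, 9),
                     Individuals = c(1, 1, 1, 2, 2, 2),
                     Value       = c(10, NA, 40, 0, NA, 8))

by_time <- uneven %>%
  group_by(Individuals) %>%
  # x = time makes the interpolation proportional to elapsed time,
  # not to the number of rows between known values
  mutate(ValueInterp = na.approx(Value, x = time, na.rm = FALSE)) %>%
  ungroup()

by_time
```

For individual 1 the missing value at time 2 becomes 20 (one third of the way from 10 to 40), whereas position-based interpolation would give 25.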

Update: To interpolate multiple columns, we can use mutate_at. Here's an example with two value columns. We use mutate_at to run na.approx on all columns whose names include "Value". list(interp=na.approx) tells mutate_at to keep the original columns and to store the results in new columns named with interp as a suffix:

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)

    time Individuals Value1 Value2 Value1_interp Value2_interp
   <dbl>       <dbl>  <dbl>  <dbl>         <dbl>         <dbl>
 1     1           1     NA     NA            NA            NA
 2     2           1      2      4             2             4
 3     3           1      3      6             3             6
 4     4           1     NA     NA             4             8
 5     5           1      5     10             5            10
 6     6           1     NA     NA             6            12
 7     7           1      7     14             7            14
 8     1           2      8     16             8            16
 9     2           2     NA     NA             9            18
10     3           2     10     20            10            20

If you don't want to preserve the original, uninterpolated columns, you can do:

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
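In dplyr 1.0.0 and later, mutate_at is superseded by across(). A roughly equivalent sketch of the suffixed-column version, assuming a recent dplyr, could look like this (the result name `interp` is only for illustration):

```r
library(dplyr)
library(zoo)

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)

interp <- df %>%
  group_by(Individuals) %>%
  # across() selects the columns; .names adds the "_interp" suffix
  mutate(across(matches("Value"),
                ~ na.approx(.x, na.rm = FALSE),
                .names = "{.col}_interp")) %>%
  ungroup()

interp
```

Dropping the `.names` argument overwrites the original columns instead of adding new ones.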

answered Oct 23 '22 03:10

eipi10

We can use data.table:

library(data.table)
library(zoo)
# na.rm=FALSE keeps leading/trailing NAs so each group's result keeps its length
setDT(df)[, ValueInterp := na.approx(Value, na.rm=FALSE), by = Individuals]
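For comparison, the same per-group idea can be sketched in base R without any packages, using approx() on each individual's rows. This is only an illustration: the helper `interp_one` is hypothetical, and the data is the question's df_2.

```r
# Data from the question's df_2 (first observation of individual 2 is NA)
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))

# Hypothetical helper: interpolate one individual's values against its own times
interp_one <- function(value, time) {
  ok <- !is.na(value)
  if (sum(ok) < 2) return(value)   # approx() needs at least two known points
  # Default rule = 1 returns NA outside the range of known times
  approx(time[ok], value[ok], xout = time)$y
}

# Fill group by group, using row indices so the original row order is kept
df$ValueInterp <- df$Value
for (rows in split(seq_len(nrow(df)), df$Individuals)) {
  df$ValueInterp[rows] <- interp_one(df$Value[rows], df$time[rows])
}

df
```

Here the leading NA of individual 2 stays NA rather than being filled from individual 1, since each call only sees one individual's rows.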

answered Oct 23 '22 04:10

akrun

Tags: r, group-by, dplyr, zoo, linear-interpolation
