Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Transform NA values based on first registration and nearest values

Tags:

r

na

I already made a similar question but now I want just to restrict the new values of NA.

I have some data like this:

Date 1   Date 2    Date 3    Date 4    Date 5   Date 6
A  NA       0.1       0.2       NA        0.3    0.2
B  0.1      NA        NA        0.3       0.2    0.1
C  NA       NA        NA        NA        0.3    NA
D  0.1      0.2       0.3       NA        0.1    NA
E  NA       NA        0.1       0.2       0.1    0.3

I would like to change the NA values of my data based on the first date a value is registered. So for example for A, the first registration is Date 2. Then I want that before that registration the values of NA in A are 0, and after the first registration the values of NA become the mean of the nearest values (mean of date 3 and 5).

In case the last value is an NA, transform it into the last registered value (as in C and D). In the case of E all NA values will become 0.

Get something like this:

Date 1   Date 2    Date 3    Date 4    Date 5   Date 6 
A  0       0.1       0.2        0.25      0.3    0.2
B  0.1     0.2       0.2        0.3       0.2    0.1
C  0       0         0          0         0.3    0.3
D  0.1     0.2       0.3        0.2       0.1    0.1
E  0       0         0.1        0.2       0.1    0.3

Can you help me? I'm not sure how to do it in R.

like image 872
user195366 Avatar asked Jan 10 '19 14:01

user195366


People also ask

How do I replace specific values with NA in R?

Using R replace() function to update 0 with NA R has a built-in function called replace() that replaces values in a vector with another value, for example, zeros with NAs.

How do you know if a value is na?

To test if a value is NA, use is.na(). The function is.na(x) returns a logical vector of the same size as x with value TRUE if and only if the corresponding element in x is NA. NaN means Not A Number, and is for (IEEE) arithmetic purposes. Usually NaN comes from 0/0.

How do I fill a column with Na?

fillna() method is used to fill NaN/NA values on a specified column or on an entire DataaFrame with any given value. You can specify modify using inplace, or limit how many filling to perform or choose an axis whether to fill on rows/column etc.


2 Answers

Here is a way using na.approx from the zoo package and apply with MARGIN = 1 (so this is probably not very efficient but get's the job done).

library(zoo)
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))

This results in

df1
#   V1  V2  V3   V4  V5
#A  NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C  NA  NA  NA   NA 0.3
#E  NA  NA 0.1 0.20 0.1

Replace NAs and rename columns.

df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
#  Date_1 Date_2 Date_3 Date_4 Date_5
#A    0.0    0.1    0.2   0.25    0.3
#B    0.1    0.2    0.2   0.30    0.2
#C    0.0    0.0    0.0   0.00    0.3
#E    0.0    0.0    0.1   0.20    0.1

explanation

Given a vector

x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)

returns x with linear interpolated values

#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000

But OP asked for constant values so we need the argument method = "constant" from the approx function.

na.approx(x, method = "constant") 
# [1] 0.1 0.1 0.1 0.3 0.2

But this is still not what OP asked for because it carries the last observation forward while you want the mean for the closest non-NA values. Therefore we need the argument f (also from approx)

na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good

From ?approx

f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.

Lastly, if we don't want to replace the NAs at the beginning and end of each row we need na.rm = FALSE.

From ?na.approx

na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?

data

dat <- structure(list(Date_1 = c(NA, 0.1, NA, NA), Date_2 = c(0.1, NA, 
NA, NA), Date_3 = c(0.2, NA, NA, 0.1), Date_4 = c(NA, 0.3, NA, 
0.2), Date_5 = c(0.3, 0.2, 0.3, 0.1)), .Names = c("Date_1", "Date_2", 
"Date_3", "Date_4", "Date_5"), class = "data.frame", row.names = c("A", 
"B", "C", "E"))

EDIT

If there are NAs in the last column we can replace these with the last non-NAs before we apply na.approx as shown above.

dat$Date_6[is.na(dat$Date_6)] <- dat[cbind(1:nrow(dat),
                                           max.col(!is.na(dat), ties.method = "last"))][is.na(dat$Date_6)]
like image 187
markus Avatar answered Oct 20 '22 09:10

markus


This is another possible answer, using na.locf from the zoo package. Edit: apply is actually not required; This solution fills in the last observed value if this value is missing.

# create the dataframe
Date1 <- c(NA,.1,NA,NA)
Date2 <- c(.1, NA,NA,NA)
Date3 <- c(.2,NA,NA,.1)
Date4 <- c(NA,.3,NA,.2)
Date5 <- c(.3,.2,.3,.1)
Date6 <- c(.1,NA,NA,NA)
df <- as.data.frame(cbind(Date1,Date2,Date3,Date4,Date5,Date6))
rownames(df) <- c('A','B','C','D')

> df
  Date1 Date2 Date3 Date4 Date5 Date6
A    NA   0.1   0.2    NA   0.3   0.1
B   0.1    NA    NA   0.3   0.2    NA
C    NA    NA    NA    NA   0.3    NA
D    NA    NA   0.1   0.2   0.1    NA



# Load library
library(zoo)
df2 <- t(na.locf(t(df),na.rm = F)) # fill last observation carried forward
df3 <- t(na.locf(t(df),na.rm = F, fromLast = T)) # last obs carried backward

df4 <- (df2 + df3)/2 # mean of both dataframes

df4 <- t(na.locf(t(df4),na.rm = F)) # fill last observation carried forward
df4[is.na(df4)] <- 0 # NA values are 0

  Date1 Date2 Date3 Date4 Date5 Date6
A   0.0   0.1   0.2  0.25   0.3   0.1
B   0.1   0.2   0.2  0.30   0.2   0.2
C   0.0   0.0   0.0  0.00   0.3   0.3
D   0.0   0.0   0.1  0.20   0.1   0.1
like image 4
Niek Avatar answered Oct 20 '22 10:10

Niek