I already asked a similar question, but now I want to restrict how the NA values are replaced.
I have some data like this:
Date 1 Date 2 Date 3 Date 4 Date 5 Date 6
A NA 0.1 0.2 NA 0.3 0.2
B 0.1 NA NA 0.3 0.2 0.1
C NA NA NA NA 0.3 NA
D 0.1 0.2 0.3 NA 0.1 NA
E NA NA 0.1 0.2 0.1 0.3
I would like to replace the NA values in my data based on the first date on which a value is registered. For example, for A the first registration is on Date 2. Every NA before that registration should become 0, and every NA after it should become the mean of the nearest registered values on either side (for A's Date 4, the mean of Date 3 and Date 5).
If the last value is NA, it should become the last registered value (as in C and D). In the case of E, all NA values become 0, because they all come before the first registration.
The result should look something like this:
Date 1 Date 2 Date 3 Date 4 Date 5 Date 6
A 0 0.1 0.2 0.25 0.3 0.2
B 0.1 0.2 0.2 0.3 0.2 0.1
C 0 0 0 0 0.3 0.3
D 0.1 0.2 0.3 0.2 0.1 0.1
E 0 0 0.1 0.2 0.1 0.3
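To make the two tables reproducible, here is one way to enter them in R. The object names dat_q and expected, and the column names Date_1 to Date_6, are illustrative choices (the answers below work on a shorter version of this data).

```r
# Input table from the question (NA = no registration on that date)
dat_q <- data.frame(
  Date_1 = c(NA, 0.1, NA, 0.1, NA),
  Date_2 = c(0.1, NA, NA, 0.2, NA),
  Date_3 = c(0.2, NA, NA, 0.3, 0.1),
  Date_4 = c(NA, 0.3, NA, NA, 0.2),
  Date_5 = c(0.3, 0.2, 0.3, 0.1, 0.1),
  Date_6 = c(0.2, 0.1, NA, NA, 0.3),
  row.names = c("A", "B", "C", "D", "E")
)

# Desired output from the question, as a matrix for easy comparison
expected <- rbind(
  A = c(0,   0.1, 0.2, 0.25, 0.3, 0.2),
  B = c(0.1, 0.2, 0.2, 0.3,  0.2, 0.1),
  C = c(0,   0,   0,   0,    0.3, 0.3),
  D = c(0.1, 0.2, 0.3, 0.2,  0.1, 0.1),
  E = c(0,   0,   0.1, 0.2,  0.1, 0.3)
)
colnames(expected) <- names(dat_q)
```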
Can you help me? I'm not sure how to do it in R.
Here is a way using na.approx from the zoo package and apply with MARGIN = 1 (this is probably not very efficient, but it gets the job done).
library(zoo)
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))
This results in
df1
# V1 V2 V3 V4 V5
#A NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C NA NA NA NA 0.3
#E NA NA 0.1 0.20 0.1
Replace the NAs and rename the columns.
df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
# Date_1 Date_2 Date_3 Date_4 Date_5
#A 0.0 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C 0.0 0.0 0.0 0.00 0.3
#E 0.0 0.0 0.1 0.20 0.1
explanation
Given a vector
x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)
returns x with linearly interpolated values:
#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000
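na.approx builds on base R's approx(). As a rough sketch, assuming that the positions 1, 2, ... serve as the x-coordinates (which is what na.approx uses for a plain vector), the same linear interpolation can be reproduced with approx() alone:

```r
x  <- c(0.1, NA, NA, 0.3, 0.2)
i  <- seq_along(x)   # positions used as x-coordinates
ok <- !is.na(x)      # observed (non-NA) points
# Interpolate between the observed points, evaluated at every position
y_lin <- approx(i[ok], x[ok], xout = i)$y
y_lin
# [1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000
```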
But the OP asked for the same value throughout each gap, so we need the argument method = "constant" from the approx function.
na.approx(x, method = "constant")
# [1] 0.1 0.1 0.1 0.3 0.2
But this is still not what the OP asked for: it carries the last observation forward, whereas the OP wants the mean of the closest non-NA values on either side. Therefore we need the argument f, also from approx.
na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good
From ?approx
f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
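To make the role of f concrete, here is a small base R sketch that evaluates the constant step function for f = 0, 0.5 and 1 on the same vector (again assuming positions 1, 2, ... as x-coordinates). Inside the gap the left value is y0 = 0.1 and the right value is y1 = 0.3, so f = 0.5 gives 0.1 * 0.5 + 0.3 * 0.5 = 0.2, the mean the OP wants; observed points keep their own values.

```r
x  <- c(0.1, NA, NA, 0.3, 0.2)
i  <- seq_along(x)
ok <- !is.na(x)
# One column per value of f: 0 = take left value, 1 = take right value
steps <- sapply(c(0, 0.5, 1), function(f)
  approx(i[ok], x[ok], xout = i, method = "constant", f = f)$y)
colnames(steps) <- c("f=0", "f=0.5", "f=1")
steps
```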
Lastly, to keep the NAs at the beginning and end of each row (instead of dropping them, which would change the length of the row) we need na.rm = FALSE.
From ?na.approx
na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?
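A short sketch of the difference, assuming a zoo version in which na.rm = TRUE trims leading and trailing NAs that interpolation cannot fill. With the default, the result is shorter than the input; inside apply() that would give rows of different lengths (row C would shrink to a single value), which is why the answer sets na.rm = FALSE.

```r
library(zoo)

y <- c(NA, 0.1, NA, 0.3, NA)
# Default na.rm = TRUE: unfilled NAs at the ends are dropped
short <- na.approx(y)
# na.rm = FALSE: same length as the input, end NAs stay in place
full <- na.approx(y, na.rm = FALSE)
length(short)
length(full)
```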
data
dat <- structure(list(Date_1 = c(NA, 0.1, NA, NA), Date_2 = c(0.1, NA,
NA, NA), Date_3 = c(0.2, NA, NA, 0.1), Date_4 = c(NA, 0.3, NA,
0.2), Date_5 = c(0.3, 0.2, 0.3, 0.1)), .Names = c("Date_1", "Date_2",
"Date_3", "Date_4", "Date_5"), class = "data.frame", row.names = c("A",
"B", "C", "E"))
EDIT
If there are NAs in the last column, we can replace them with the last non-NA value of each row before applying na.approx as shown above.
# last non-NA value in each row
last_non_na <- dat[cbind(seq_len(nrow(dat)), max.col(!is.na(dat), ties.method = "last"))]
dat$Date_6[is.na(dat$Date_6)] <- last_non_na[is.na(dat$Date_6)]
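To see what the max.col() step does, here is a small base R sketch on three rows of the question's data (the matrix m is only an illustrative subset). With ties.method = "last", max.col() on the logical matrix !is.na(m) returns the last non-NA column of each row, and indexing with a two-column cbind() matrix pulls out one value per row.

```r
# Illustrative subset of the question's data (rows A, C and D)
m <- rbind(A = c(NA, 0.1, 0.2, NA, 0.3, 0.2),
           C = c(NA, NA, NA, NA, 0.3, NA),
           D = c(0.1, 0.2, 0.3, NA, 0.1, NA))
# Column index of the last non-NA value in each row
last_col <- max.col(!is.na(m), ties.method = "last")
# One value per row, selected by (row, column) pairs
last_val <- m[cbind(seq_len(nrow(m)), last_col)]
# Fill only the missing entries of the final column
na_last <- is.na(m[, ncol(m)])
m[na_last, ncol(m)] <- last_val[na_last]
m[, ncol(m)]
```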
This is another possible answer, using na.locf from the zoo package.
Edit: apply is actually not required. This solution also fills in the last observed value when the final value is missing.
# create the dataframe
Date1 <- c(NA,.1,NA,NA)
Date2 <- c(.1, NA,NA,NA)
Date3 <- c(.2,NA,NA,.1)
Date4 <- c(NA,.3,NA,.2)
Date5 <- c(.3,.2,.3,.1)
Date6 <- c(.1,NA,NA,NA)
df <- as.data.frame(cbind(Date1,Date2,Date3,Date4,Date5,Date6))
rownames(df) <- c('A','B','C','D')
> df
Date1 Date2 Date3 Date4 Date5 Date6
A NA 0.1 0.2 NA 0.3 0.1
B 0.1 NA NA 0.3 0.2 NA
C NA NA NA NA 0.3 NA
D NA NA 0.1 0.2 0.1 NA
# Load library
library(zoo)
df2 <- t(na.locf(t(df), na.rm = FALSE))                  # last observation carried forward
df3 <- t(na.locf(t(df), na.rm = FALSE, fromLast = TRUE)) # next observation carried backward
df4 <- (df2 + df3)/2                                     # mean of both matrices
df4 <- t(na.locf(t(df4), na.rm = FALSE))                 # trailing NAs: carry the last value forward
df4[is.na(df4)] <- 0                                     # remaining (leading) NAs become 0
> df4
Date1 Date2 Date3 Date4 Date5 Date6
A 0.0 0.1 0.2 0.25 0.3 0.1
B 0.1 0.2 0.2 0.30 0.2 0.2
C 0.0 0.0 0.0 0.00 0.3 0.3
D 0.0 0.0 0.1 0.20 0.1 0.1
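The na.locf answer averages last-observation-carried-forward and next-observation-carried-backward. For readers without zoo, here is a base R sketch of the same idea; locf, nocb and fill_row are hypothetical helper names, not zoo functions. Applied to the full six-date table from the question, it reproduces the desired output.

```r
# Last observation carried forward in base R; positions before the
# first observation stay NA
locf <- function(v) {
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))  # latest observed position so far
  out <- rep(NA_real_, length(v))
  out[idx > 0] <- v[idx[idx > 0]]
  out
}
# Next observation carried backward = LOCF on the reversed vector
nocb <- function(v) rev(locf(rev(v)))

fill_row <- function(v) {
  out <- (locf(v) + nocb(v)) / 2  # mean of the nearest left and right values
  out <- locf(out)                # trailing NAs: repeat the last value
  out[is.na(out)] <- 0            # leading NAs: before the first registration
  out
}

dat_q <- rbind(
  A = c(NA, 0.1, 0.2, NA, 0.3, 0.2),
  B = c(0.1, NA, NA, 0.3, 0.2, 0.1),
  C = c(NA, NA, NA, NA, 0.3, NA),
  D = c(0.1, 0.2, 0.3, NA, 0.1, NA),
  E = c(NA, NA, 0.1, 0.2, 0.1, 0.3)
)
filled <- t(apply(dat_q, 1, fill_row))
colnames(filled) <- paste0("Date_", 1:6)
filled
```

Averaging the two fills gives the mean of the neighbours inside a gap and leaves observed values unchanged, because the mean of a value with itself is the value.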