I have the below dataframe and I need to ignore the missing value for the last value for id 9.
firstlast <- data.frame(id = as.factor(c("01", "01", "01", "01", "01", "04", "04", "05", "05", "05", "05", "05", "09", "09", "09", "09", "09")),
var_a = c(13, 21, 32, 33, 44, 21, 33, 35, 17, 18, 21, 22, 17, 13, 33, 32, NA))
The df looks like:
id var_a
1 01 13
2 01 21
3 01 32
4 01 33
5 01 44
6 04 21
7 04 33
8 05 35
9 05 17
10 05 18
11 05 21
12 05 22
13 09 17
14 09 13
15 09 33
16 09 32
17 09 NA
My attempt thus far:
firstlast <- firstlast %>% group_by(id) %>%
mutate(var_first = first(var_a)) %>%
mutate(var_last = last(var_a)) %>%
mutate(change = var_last - var_first)
creates:
id var_a var_first var_last change
(fctr) (dbl) (dbl) (dbl) (dbl)
1 01 13 13 44 31
2 01 21 13 44 31
3 01 32 13 44 31
4 01 33 13 44 31
5 01 44 13 44 31
6 04 21 21 33 12
7 04 33 21 33 12
8 05 35 35 22 -13
9 05 17 35 22 -13
10 05 18 35 22 -13
11 05 21 35 22 -13
12 05 22 35 22 -13
13 09 17 17 NA NA
14 09 13 17 NA NA
15 09 33 17 NA NA
16 09 32 17 NA NA
17 09 NA 17 NA NA
However I need to ignore the missing value for the last NA observation for id 9 and instead use the second to last (or the first observed value). When I include mutate(var_last = last(var_a), na.rm=TRUE)
at that line, I then get a full column of na.rm = TRUE and not ignoring the NA value.
The end df I am hoping to achieve is...
id var_a var_first var_last change
(fctr) (dbl) (dbl) (dbl) (dbl)
1 01 13 13 44 31
2 01 21 13 44 31
3 01 32 13 44 31
4 01 33 13 44 31
5 01 44 13 44 31
6 04 21 21 33 12
7 04 33 21 33 12
8 05 35 35 22 -13
9 05 17 35 22 -13
10 05 18 35 22 -13
11 05 21 35 22 -13
12 05 22 35 22 -13
13 09 17 17 32 15
14 09 13 17 32 15
15 09 33 17 32 15
16 09 32 17 32 15
17 09 NA 17 32 15
Thank you!
To check which value in NA in an R data frame, we can use apply function along with is.na function. This will return the data frame in logical form with TRUE and FALSE.
In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing by zero) are represented by the symbol NaN (not a number). Unlike SAS, R uses the same symbol for character and numeric data.
mutate
doesn't have an na.rm
option, neither does first()
or last()
which would be more appropriate here. You can omit missing values yourself with na.omit
:
firstlast <- firstlast %>% group_by(id) %>%
mutate(
var_first = first(na.omit(var_a)),
var_last = last(na.omit(var_a)),
change = var_last - var_first
)
# Source: local data frame [17 x 5]
# Groups: id [4]
#
# id var_a var_first var_last change
# (fctr) (dbl) (dbl) (dbl) (dbl)
# 1 01 13 13 44 31
# 2 01 21 13 44 31
# 3 01 32 13 44 31
# 4 01 33 13 44 31
# 5 01 44 13 44 31
# 6 04 21 21 33 12
# 7 04 33 21 33 12
# 8 05 35 35 22 -13
# 9 05 17 35 22 -13
# 10 05 18 35 22 -13
# 11 05 21 35 22 -13
# 12 05 22 35 22 -13
# 13 09 17 17 32 15
# 14 09 13 17 32 15
# 15 09 33 17 32 15
# 16 09 32 17 32 15
# 17 09 NA 17 32 15
As a sidenote, giving mutate
an na.rm
argument doesn't make much sense. What NA
values would you omit? Any rows from any columns used in the calculation? Other inputs? Just the result? How would you fill the removed rows to still have the same values? It's much better to be explicit as above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With