I am looking for a way to extract the first and last non-NA value from each group. I am using dplyr::first() and dplyr::last(), but I can't work out how to make them pick the first or last non-NA value.
library(dplyr)
set.seed(123)
d <- data.frame(
  group = rep(1:3, each = 3),
  year = rep(seq(2000, 2002, 1), 3),
  value = sample(1:9, replace = TRUE))
# Introduce NA values in the first row of group 2 and the last row of group 3
d %>%
  mutate(
    value = case_when(
      group == 2 & year == 2000 ~ NA_integer_,
      group == 3 & year == 2002 ~ NA_integer_,
      TRUE ~ value)) %>%
  group_by(group) %>%
  mutate(
    first = dplyr::first(value),
    last = dplyr::last(value))
RESULT (with issue)
# A tibble: 9 x 5
# Groups:   group [3]
  group  year value first  last
  <int> <dbl> <int> <int> <int>
1     1  2000     3     3     4
2     1  2001     8     3     4
3     1  2002     4     3     4
4     2  2000    NA    NA     1
5     2  2001     9    NA     1
6     2  2002     1    NA     1
7     3  2000     5     5    NA
8     3  2001     9     5    NA
9     3  2002    NA     5    NA
Can you help me make the values in the "first" column for group 2 equal 9, and the values in the "last" column for group 3 equal 9?
I would much prefer a tidyverse solution, if one exists.
Use na.omit() to drop the NA values before calling first() or last(). Compare:
first(c(NA, 11, 22))
# [1] NA
first(na.omit(c(NA, 11, 22)))
# [1] 11
Using example data:
d %>%
  mutate(
    value = case_when(
      group == 2 & year == 2000 ~ NA_integer_,
      group == 3 & year == 2002 ~ NA_integer_,
      TRUE ~ value)) %>%
  group_by(group) %>%
  mutate(
    first = dplyr::first(na.omit(value)),
    last = dplyr::last(na.omit(value)))
# # A tibble: 9 x 5
# # Groups:   group [3]
#   group  year value first  last
#   <int> <dbl> <int> <int> <int>
# 1     1  2000     3     3     4
# 2     1  2001     8     3     4
# 3     1  2002     4     3     4
# 4     2  2000    NA     9     1
# 5     2  2001     9     9     1
# 6     2  2002     1     9     1
# 7     3  2000     5     5     9
# 8     3  2001     9     5     9
# 9     3  2002    NA     5     9
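As a variation on the same idea, here is a minimal sketch that drops the NAs with logical subsetting (value[!is.na(value)]) instead of na.omit(). The data frame d2 and the result name res are made up for illustration; the values are copied from the output shown earlier so the example does not depend on sample() or the R version.

```r
library(dplyr)

# Hypothetical data frame with fixed values copied from the printed output,
# so the result is reproducible without set.seed()/sample()
d2 <- data.frame(
  group = rep(1:3, each = 3),
  year  = rep(2000:2002, times = 3),
  value = c(3L, 8L, 4L, NA, 9L, 1L, 5L, 9L, NA)
)

res <- d2 %>%
  group_by(group) %>%
  mutate(
    # Keep only the non-missing values of the group, then take the ends
    first = dplyr::first(value[!is.na(value)]),
    last  = dplyr::last(value[!is.na(value)])
  ) %>%
  ungroup()

print(res)
```

If your dplyr is version 1.1.0 or later, first() and last() should also accept na_rm = TRUE, so dplyr::first(value, na_rm = TRUE) is likely the shortest option; check ?dplyr::first on your installation before relying on it.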