I have a dataset like below with columns of 1s and 0s. I would like to add a final column that identifies the column name of the final occurrence of 0 per row.
have = data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
b = c(1,0,1,1,1,0,1,1,0,0),
c = c(0,0,0,1,0,1,1,1,1,0),
d = c(1,0,1,1,0,0,0,1,0,1),
e = c(1,1,1,1,1,1,1,1,1,1))
> have
a b c d e
1 1 1 0 1 1
2 0 0 0 0 1
3 1 1 0 1 1
4 1 1 1 1 1
5 0 1 0 0 1
6 0 0 1 0 1
7 1 1 1 0 1
8 1 1 1 1 1
9 1 0 1 0 1
10 0 0 0 1 1
I would like the output to look like this where the final column specifies the column name of the last occurring 0 and if one does not exist return NA.
> want
a b c d e last_0
1 1 1 0 1 1 c
2 0 0 0 0 1 d
3 1 1 0 1 1 c
4 1 1 1 1 1 <NA>
5 0 1 0 0 1 d
6 0 0 1 0 1 d
7 1 1 1 0 1 d
8 1 1 1 1 1 <NA>
9 1 0 1 0 1 d
10 0 0 0 1 1 c
I've tried using max.col but it returns the last column name if a zero does not exist. Any other solutions? A dplyr solution is preferred.
> have$last_0 = names(have)[max.col(have == 0, ties.method = "last")]
> have
a b c d e last_0
1 1 1 0 1 1 c
2 0 0 0 0 1 d
3 1 1 0 1 1 c
4 1 1 1 1 1 e
5 0 1 0 0 1 d
6 0 0 1 0 1 d
7 1 1 1 0 1 d
8 1 1 1 1 1 e
9 1 0 1 0 1 d
10 0 0 0 1 1 c
To access a specific column of a data frame by name, use the $ operator in the form df$name, where df is the data frame and name is the column you are interested in. This returns the column as a vector.
To find the row names and column names of an R data frame that satisfy a condition, use the rownames and colnames functions and subset their results with that condition.
The last n rows of a data frame can be accessed with the built-in tail() function. If N is the total number of rows in the data frame, then the last n <= N rows can be extracted.
Here is an approach with purrr::pmap:
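library(dplyr)
library(purrr)
have %>%
  # pick(everything()) replaces cur_data(), which is deprecated as of dplyr 1.1.0
  mutate(want = pmap_chr(pick(everything()),
    # prepend NA so that rows without a 0 still return a value
    ~ tail(c(NA_character_, names(which(c(...) == 0))), 1)))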
a b c d e want
1 1 1 0 1 1 c
2 0 0 0 0 1 d
3 1 1 0 1 1 c
4 1 1 1 1 1 <NA>
5 0 1 0 0 1 d
6 0 0 1 0 1 d
7 1 1 1 0 1 d
8 1 1 1 1 1 <NA>
9 1 0 1 0 1 d
10 0 0 0 1 1 c
purrr::pmap is a very useful function because it works row-wise on a data frame, and it comes in typed variants (pmap_chr, pmap_dbl, pmap_int, pmap_lgl) so you can control the type of the result. Inside the lambda you can refer to the entire row of data with c(...).
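As a minimal sketch of how the typed variants differ, the snippet below runs two of them on a small toy data frame. The data frame df and the result names n_zero and has_zero are made up for illustration and are not part of the question.

```r
library(purrr)

# Hypothetical toy data, not taken from the question
df <- data.frame(a = c(1, 0, 1), b = c(0, 0, 1))

# pmap_int returns an integer vector: the number of zeros in each row
n_zero <- pmap_int(df, ~ sum(c(...) == 0))

# pmap_lgl returns a logical vector: whether each row contains any zero
has_zero <- pmap_lgl(df, ~ any(c(...) == 0))
```

Choosing the variant that matches the expected result type means a mismatch raises an error instead of silently producing a list.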
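If you wanted to apply the procedure to only a subset of columns, select them inside the call (pick() accepts the same column selection syntax as dplyr::select):
have %>%
  # restrict the row-wise scan to columns a, b and c
  mutate(want = pmap_chr(pick(a, b, c),
    ~ tail(c(NA_character_, names(which(c(...) == 0))), 1)))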
a b c d e want
1 1 1 0 1 1 c
2 0 0 0 0 1 c
3 1 1 0 1 1 c
4 1 1 1 1 1 <NA>
5 0 1 0 0 1 c
6 0 0 1 0 1 b
7 1 1 1 0 1 <NA>
8 1 1 1 1 1 <NA>
9 1 0 1 0 1 b
10 0 0 0 1 1 c
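If a base R route is acceptable, the max.col attempt from the question can probably be kept and patched rather than replaced. The sketch below is one way to do that, not the answer's method: it blanks out the rows that contain no 0 at all. The helper names is_zero and last_0 are introduced here for illustration.

```r
# Data from the question
have <- data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
                   b = c(1,0,1,1,1,0,1,1,0,0),
                   c = c(0,0,0,1,0,1,1,1,1,0),
                   d = c(1,0,1,1,0,0,0,1,0,1),
                   e = c(1,1,1,1,1,1,1,1,1,1))

# Logical matrix marking every 0 (computed before the new column is added)
is_zero <- have == 0

# Column name of the last TRUE per row; rows with no TRUE fall back to "e"
last_0 <- names(have)[max.col(is_zero, ties.method = "last")]

# Overwrite that fallback with NA for rows that contain no 0
last_0[rowSums(is_zero) == 0] <- NA

have$last_0 <- last_0
```

This avoids a row-wise lambda, which may matter on larger data, at the cost of not being a dplyr pipeline.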