
Identify the column name of the last occurrence of a value in R data frame

Tags:

r

dplyr

I have a dataset like the one below, with columns of 1s and 0s. I would like to add a final column that gives, for each row, the name of the column holding the last 0.

have = data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
                  b = c(1,0,1,1,1,0,1,1,0,0),
                  c = c(0,0,0,1,0,1,1,1,1,0),
                  d = c(1,0,1,1,0,0,0,1,0,1),
                  e = c(1,1,1,1,1,1,1,1,1,1))
> have
   a b c d e
1  1 1 0 1 1
2  0 0 0 0 1
3  1 1 0 1 1
4  1 1 1 1 1
5  0 1 0 0 1
6  0 0 1 0 1
7  1 1 1 0 1
8  1 1 1 1 1
9  1 0 1 0 1
10 0 0 0 1 1

I would like the output to look like this, where the final column gives the name of the column holding the last 0 in each row, or NA if the row contains no 0.

> want
   a b c d e last_0
1  1 1 0 1 1      c
2  0 0 0 0 1      d
3  1 1 0 1 1      c
4  1 1 1 1 1   <NA>
5  0 1 0 0 1      d
6  0 0 1 0 1      d
7  1 1 1 0 1      d
8  1 1 1 1 1   <NA>
9  1 0 1 0 1      d
10 0 0 0 1 1      c

I've tried using max.col, but when a row contains no 0 it returns the name of the last column (e) instead of NA. Are there any other solutions? A dplyr solution is preferred.

> have$last_0 = names(have)[max.col(have == 0, ties.method = "last")]
> have
   a b c d e last_0
1  1 1 0 1 1      c
2  0 0 0 0 1      d
3  1 1 0 1 1      c
4  1 1 1 1 1      e
5  0 1 0 0 1      d
6  0 0 1 0 1      d
7  1 1 1 0 1      d
8  1 1 1 1 1      e
9  1 0 1 0 1      d
10 0 0 0 1 1      c
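The max.col() attempt can be kept if rows without any 0 are masked afterwards: in an all-FALSE row every column ties, so ties.method = "last" falls back to the last column. A minimal base R sketch, assuming the `have` data frame from the question (recreated so the block runs on its own); `is_zero`, `last_idx` and `has_zero` are illustrative names, not part of the question:

```r
# Same example data as in the question
have <- data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
                   b = c(1,0,1,1,1,0,1,1,0,0),
                   c = c(0,0,0,1,0,1,1,1,1,0),
                   d = c(1,0,1,1,0,0,0,1,0,1),
                   e = c(1,1,1,1,1,1,1,1,1,1))

is_zero  <- have == 0                                  # logical matrix, TRUE where a cell is 0
last_idx <- max.col(is_zero, ties.method = "last")     # index of the last TRUE per row
has_zero <- rowSums(is_zero) > 0                       # FALSE for rows with no 0 at all

# Keep the max.col() answer only where the row really contains a 0
have$last_0 <- ifelse(has_zero, names(have)[last_idx], NA_character_)
have
```

The RHS is evaluated before the new column is added, so `names(have)` still refers to columns a to e only.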

Kate N asked Jun 09 '21 20:06




1 Answer

Here is an approach with purrr::pmap:

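library(dplyr); library(purrr)
have %>% 
   # pmap_chr() calls the lambda once per row; c(...) gathers that row into a named vector,
   # and tail(c(NA_character_, <names of the 0 cells>), 1) is the last such name, or NA.
   # Note: cur_data() was deprecated in dplyr 1.1.0 in favour of pick(everything()).
   mutate(want = pmap_chr(cur_data(), 
                          ~ tail(c(NA_character_, names(which(c(...) == 0))), 1)))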
   a b c d e want
1  1 1 0 1 1    c
2  0 0 0 0 1    d
3  1 1 0 1 1    c
4  1 1 1 1 1 <NA>
5  0 1 0 0 1    d
6  0 0 1 0 1    d
7  1 1 1 0 1    d
8  1 1 1 1 1 <NA>
9  1 0 1 0 1    d
10 0 0 0 1 1    c

purrr::pmap is a very useful function because it works row-wise on data, and it comes in typed variants such as pmap_chr(), pmap_dbl() and pmap_lgl(), so you can control the type of what it returns. Inside the lambda you can refer to the entire row of data with c(...).
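To make the lambda concrete, the same per-row logic can be written as an ordinary function and applied with base R's apply(), which also hands each row over as a named vector. This is a sketch rather than the answer's method; `last_zero_name()` is a hypothetical helper name, and the data is the question's `have`:

```r
# Same example data as in the question
have <- data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
                   b = c(1,0,1,1,1,0,1,1,0,0),
                   c = c(0,0,0,1,0,1,1,1,1,0),
                   d = c(1,0,1,1,0,0,0,1,0,1),
                   e = c(1,1,1,1,1,1,1,1,1,1))

# Hypothetical helper: given one row as a named vector, return the name of
# its last 0, or NA if the row has no 0 (same logic as the pmap_chr lambda)
last_zero_name <- function(row) {
  zero_names <- names(which(row == 0))     # names of the columns holding a 0
  tail(c(NA_character_, zero_names), 1)    # last name, or the NA placeholder
}

# One row as a named numeric vector, like c(...) inside the pmap lambda
last_zero_name(unlist(have[5, ]))          # returns "d"

# apply() over the rows of a numeric matrix keeps the column names on each row
last_0 <- apply(as.matrix(have), 1, last_zero_name)
last_0
```

The leading NA_character_ is what turns "no zeros" into NA: tail() of a length-one vector is that single NA.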


If you wanted to apply the procedure to only a subset of columns, you might use dplyr::select:

have %>% 
    # Same row-wise lambda, but pmap_chr() only sees columns a, b and c
    mutate(want = pmap_chr(cur_data() %>% select(a,b,c), 
                           ~ tail(c(NA_character_, names(which(c(...) == 0))), 1)))
   a b c d e want
1  1 1 0 1 1    c
2  0 0 0 0 1    c
3  1 1 0 1 1    c
4  1 1 1 1 1 <NA>
5  0 1 0 0 1    c
6  0 0 1 0 1    b
7  1 1 1 0 1 <NA>
8  1 1 1 1 1 <NA>
9  1 0 1 0 1    b
10 0 0 0 1 1    c
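A vectorized alternative avoids calling a function once per row: turn each column into its own name where the value is 0 (NA elsewhere), then let dplyr::coalesce() keep the first non-NA value while scanning the columns from right to left. This is a sketch assuming dplyr and purrr are installed; `last_match_name()` is a hypothetical helper, and passing a column subset reproduces the select(a, b, c) variant:

```r
library(dplyr)
library(purrr)

# Same example data as in the question
have <- data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
                   b = c(1,0,1,1,1,0,1,1,0,0),
                   c = c(0,0,0,1,0,1,1,1,1,0),
                   d = c(1,0,1,1,0,0,0,1,0,1),
                   e = c(1,1,1,1,1,1,1,1,1,1))

# Hypothetical helper: name of the last column of `df` equal to `value`, NA if none
last_match_name <- function(df, value = 0) {
  # imap() passes each column (.x) together with its name (.y); the result is
  # the column name where the cell matches and NA_character_ everywhere else
  flags <- imap(df, ~ if_else(.x == value, .y, NA_character_))
  # coalesce() keeps the first non-NA value per row, so feeding the columns
  # right-to-left returns the LAST matching column name
  do.call(coalesce, unname(rev(flags)))
}

out <- have %>%
  mutate(last_0 = last_match_name(have))
out

# Restricting the search to a subset of columns, as with select(a, b, c)
abc_only <- last_match_name(select(have, a, b, c))
abc_only
```

Because the work is done column by column, the cost grows with the number of columns rather than with one R function call per row.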
Ian Campbell answered Sep 28 '22 11:09