Identify the column name of the last occurrence of a value in an R data frame

Tags: r, dplyr

I have a dataset like the one below, with columns of 1s and 0s. I would like to add a final column that gives, for each row, the name of the column holding the last 0.

have = data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
                  b = c(1,0,1,1,1,0,1,1,0,0),
                  c = c(0,0,0,1,0,1,1,1,1,0),
                  d = c(1,0,1,1,0,0,0,1,0,1),
                  e = c(1,1,1,1,1,1,1,1,1,1))
> have
   a b c d e
1  1 1 0 1 1
2  0 0 0 0 1
3  1 1 0 1 1
4  1 1 1 1 1
5  0 1 0 0 1
6  0 0 1 0 1
7  1 1 1 0 1
8  1 1 1 1 1
9  1 0 1 0 1
10 0 0 0 1 1

I would like the output to look like this, where the final column gives the column name of the last occurring 0, or NA if the row contains no 0.

> want
   a b c d e last_0
1  1 1 0 1 1      c
2  0 0 0 0 1      d
3  1 1 0 1 1      c
4  1 1 1 1 1   <NA>
5  0 1 0 0 1      d
6  0 0 1 0 1      d
7  1 1 1 0 1      d
8  1 1 1 1 1   <NA>
9  1 0 1 0 1      d
10 0 0 0 1 1      c

I've tried using max.col, but when a row contains no 0 it returns the last column name instead of NA. Are there other solutions? A dplyr solution is preferred.

> have$last_0 = names(have)[max.col(have == 0, ties.method = "last")]
> have
   a b c d e last_0
1  1 1 0 1 1      c
2  0 0 0 0 1      d
3  1 1 0 1 1      c
4  1 1 1 1 1      e
5  0 1 0 0 1      d
6  0 0 1 0 1      d
7  1 1 1 0 1      d
8  1 1 1 1 1      e
9  1 0 1 0 1      d
10 0 0 0 1 1      c


asked Jun 09 '21 20:06

Kate N

1 Answer

Here is an approach with purrr::pmap:

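library(dplyr)
library(purrr)
# Per row: names of the columns equal to 0, with NA prepended so that
# tail(..., 1) yields NA when there is no 0.
# (cur_data() is deprecated since dplyr 1.1.0; pick(everything()) replaces it.)
have %>% 
   mutate(want = pmap_chr(cur_data(), 
                          ~ tail(c(NA, names(which(c(...) == 0))), 1)))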
   a b c d e want
1  1 1 0 1 1    c
2  0 0 0 0 1    d
3  1 1 0 1 1    c
4  1 1 1 1 1 <NA>
5  0 1 0 0 1    d
6  0 0 1 0 1    d
7  1 1 1 0 1    d
8  1 1 1 1 1 <NA>
9  1 0 1 0 1    d
10 0 0 0 1 1    c

purrr::pmap is useful here because it works row-wise over a data frame, and it comes in typed variants (pmap_chr, pmap_dbl, and so on) that let you control the type of the result. Inside the function, c(...) gives you the entire row as a named vector.
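To see why the expression inside pmap_chr works, it can help to run it on a single row outside the pipeline. The sketch below uses base R only; the vectors row_5 and row_4 are hand-written stand-ins for what c(...) would hold for rows 5 and 4 of have, and the variable names are chosen here for illustration.

```r
# Stand-in for c(...) on row 5 of `have`: a named numeric vector.
row_5 <- c(a = 0, b = 1, c = 0, d = 0, e = 1)

# which() keeps the names of the matching elements, in column order.
zero_cols <- names(which(row_5 == 0))   # "a" "c" "d"

# Prepending NA and taking the last element gives the last zero column.
last_zero_5 <- tail(c(NA, zero_cols), 1)

# Stand-in for row 4, which has no zeros: only the prepended NA remains.
row_4 <- c(a = 1, b = 1, c = 1, d = 1, e = 1)
last_zero_4 <- tail(c(NA, names(which(row_4 == 0))), 1)

print(last_zero_5)
print(last_zero_4)
```

The prepended NA is what lets a row without zeros fall through to NA instead of returning an empty result.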

If you wanted to apply the procedure to only a subset of columns, you might use dplyr::select:

have %>% 
    mutate(want = pmap_chr(cur_data() %>% select(a,b,c), 
                           ~ tail(c(NA,names(which(c(...)==0))),1)))
   a b c d e want
1  1 1 0 1 1    c
2  0 0 0 0 1    c
3  1 1 0 1 1    c
4  1 1 1 1 1 <NA>
5  0 1 0 0 1    c
6  0 0 1 0 1    b
7  1 1 1 0 1 <NA>
8  1 1 1 1 1 <NA>
9  1 0 1 0 1    b
10 0 0 0 1 1    c
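For comparison, the max.col attempt from the question can also be repaired in base R by setting the result to NA for rows that contain no 0. This is a sketch of an alternative, not part of the pmap approach above; the names is_zero and last_0 are chosen here for illustration.

```r
have <- data.frame(a = c(1,0,1,1,0,0,1,1,1,0),
                   b = c(1,0,1,1,1,0,1,1,0,0),
                   c = c(0,0,0,1,0,1,1,1,1,0),
                   d = c(1,0,1,1,0,0,0,1,0,1),
                   e = c(1,1,1,1,1,1,1,1,1,1))

# Logical matrix marking the zeros.
is_zero <- have == 0

# Position of the last TRUE per row; rows with no TRUE tie across all
# columns, so ties.method = "last" returns the final column for them.
last_0 <- names(have)[max.col(is_zero, ties.method = "last")]

# Overwrite those rows with NA.
last_0[rowSums(is_zero) == 0] <- NA

have$last_0 <- last_0
print(have)
```

Because this operates on the whole matrix at once rather than row by row, it may be preferable on large data frames, though that is not measured here.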


answered Sep 28 '22 11:09

Ian Campbell
