
dplyr: override all but the first occurrence of a value within a group

Tags:

r

dplyr

I have a grouped data_frame with a "tag" column taking on values "0" and "1". In each group, I need to find the first occurrence of "1" and change all the remaining occurrences to "0". Is there a way to achieve it in dplyr?

For example, let's take "iris" data and let's add the extra "tag" column:

library(dplyr)
data(iris)
set.seed(1)
iris$tag <- sample(c(0, 1), 150, replace = TRUE, prob = c(0.8, 0.2))
giris <- iris %>% group_by(Species)

In "giris", in the "setosa" group I need to keep only the first occurrence of "1" (i.e. the one in the 4th row) and set the remaining ones to "0". This seems a bit like applying a mask or something...

Is there a way to do it? I have been experimenting with "which" and "duplicated" but I did not succeed. I have been thinking about filtering the "1"s only, keeping them, then joining with the remaining set, but this seems awkward, especially for a 12GB data set.
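For reference, the `which()` idea does work on a single vector: find the positions of every 1 and zero out all of them except the first. A minimal base-R sketch on a made-up vector (hypothetical data, not taken from `iris`):

```r
# Made-up tag vector for one group (hypothetical data, not from iris)
tag <- c(0, 1, 0, 1, 1, 0)

ones <- which(tag == 1)   # positions of every 1: 2 4 5
tag[ones[-1]] <- 0        # zero all of them except the first; no-op if there are 0 or 1 ones

print(tag)                # 0 1 0 0 0 0
```

Doing this per group only needs a grouping step on top of it; no filtering or join is required.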


asked Mar 18 '16 08:03

rpl

2 Answers

A dplyr option:

mutate(giris, newcol = as.integer(tag & cumsum(tag) == 1))

Or:

mutate(giris, newcol = as.integer(tag & !duplicated(tag)))
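To see why both expressions mark only the first 1, here they are evaluated step by step on one made-up group (the vector is hypothetical, not taken from `iris`). Inside `mutate()` on the grouped `giris`, dplyr evaluates the same expression once per `Species` group.

```r
# One hypothetical group's tag values (made up for illustration)
tag <- c(0, 0, 1, 0, 1, 1, 0)

cs <- cumsum(tag)             # running count of 1s: 0 0 1 1 2 3 3
# `==` binds tighter than `&`, so this is tag & (cs == 1):
# TRUE only where tag is 1 and it is the first 1 seen so far
newcol <- as.integer(tag & cs == 1)
print(newcol)                 # 0 0 1 0 0 0 0

# duplicated() variant: only the first 0 and the first 1 are non-duplicates,
# and `tag &` then drops the first 0
newcol2 <- as.integer(tag & !duplicated(tag))
print(identical(newcol, newcol2))  # TRUE

# A group with no 1 at all stays all zero
none <- c(0, 0, 0)
print(as.integer(none & cumsum(none) == 1))  # 0 0 0
```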

Or using data.table, same approach, but modify by reference:

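library(data.table)
# convert to a data.table by reference (no copy)
setDT(giris)
# `:=` adds newcol in place; by = Species evaluates the expression per group
giris[, newcol := as.integer(tag & cumsum(tag) == 1), by = Species]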
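For readers without dplyr or data.table, the same `cumsum()` mask can be applied per group in base R with `ave()`. This is a minimal sketch assuming the `iris` + `tag` setup from the question; the column name `newcol` is only illustrative.

```r
# Base-R sketch: same cumsum() mask, applied per Species with ave()
data(iris)
set.seed(1)
iris$tag <- sample(c(0, 1), 150, replace = TRUE, prob = c(0.8, 0.2))

# ave() splits iris$tag by Species, applies FUN to each piece,
# and writes the results back in the original row order
iris$newcol <- ave(iris$tag, iris$Species,
                   FUN = function(x) as.numeric(x == 1 & cumsum(x) == 1))

# at most one 1 should remain per Species
table(iris$Species, iris$newcol)
```

`ave()` returns a vector of the same type as its input, so `newcol` here is numeric 0/1 rather than integer.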

answered Oct 17 '22 02:10

talat

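We can try:

res <- giris %>%
         group_by(Species) %>% 
         # the counter stays at 1 until the first 1 -> 0 drop, so rows after the
         # first run of 1s are set to 0 (a run of consecutive 1s is kept whole)
         mutate(tag1 = ifelse(cumsum(c(TRUE, diff(tag) < 0)) != 1, 0, tag))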

table(res[c("Species", "tag1")])
#            tag1
#Species      0  1
# setosa     49  1
# versicolor 49  1
# virginica  49  1
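One caveat worth checking: the `diff()` counter only advances after a 1 -> 0 drop, so if a group's first 1 is immediately followed by more 1s, that whole run is kept. The `iris` example above happens to give one 1 per species, but a made-up vector (hypothetical, for illustration only) shows the difference from the `cumsum(tag) == 1` mask:

```r
# Hypothetical group whose first run of 1s has length 2
tag <- c(0, 1, 1, 0, 1)

# Expression from the answer above: a counter that increments after each 1 -> 0 drop
run_id   <- cumsum(c(TRUE, diff(tag) < 0))   # 1 1 1 2 2
via_diff <- ifelse(run_id != 1, 0, tag)      # 0 1 1 0 0: both 1s of the first run survive

# The cumsum() mask keeps only the very first 1
via_cumsum <- as.integer(tag & cumsum(tag) == 1)  # 0 1 0 0 0

print(via_diff)
print(via_cumsum)
```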

answered Oct 17 '22 04:10

akrun
