Is there an "unfilter" in dplyr to merge changes with the original dataset?

Tags:

r

dplyr

Let's say I have two data.frames like so:

bad_ids = read.table(text="id n
123 3", header = T)

dat <- read.table(text="id n partner_id
123 3 555
123 3 345
123 3 092
245 1 438
888 1 333", header=T)

I want to identify all the rows in dat that match the id column in bad_ids. I then want to create a "flag" variable that is set to 1 for all but the first match. The resulting data.frame would look like:

dat <- read.table(text="id n partner_id flag 
123 3 555 0
123 3 345 1
123 3 092 1
245 1 438 0
888 1 333 0", header=T)

Notice that the first row of 123 has a flag of 0. I want to flag all but the first match.

My strategy for emulating this behavior was something like the following:

# Flag the Duplicate Rows
dat %>% 
  filter(id %in% bad_ids$id) %>%
  slice(-1) %>% # drop the first row
  mutate(flag = 1) %>% # set the flag on all but the first match
  unfilter() # the function I want: go back to the original, unfiltered dataset

I'm wondering if there's some equivalent of "unfilter" that allows me to re-merge with the original dataset?

asked Nov 12 '19 20:11

Parseltongue

1 Answer

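One option is to first create 'flag' as a logical vector by testing the 'id' column against the 'id' column of 'bad_ids' with %in%. Then, grouped by 'id', keep the flag only for rows after the first one in each group by adding a second condition with row_number(). The unary + converts the logical result to 0/1.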

library(dplyr)
dat %>% 
   mutate(flag = id %in% bad_ids$id) %>% 
   group_by(id) %>% 
   mutate(flag = +(row_number() > 1 & flag))
   #or use `duplicated`
   # mutate(flag = +(duplicated(flag) & flag))
# A tibble: 5 x 4
# Groups:   id [3]
#     id     n partner_id  flag
#  <int> <int>      <int> <int>
#1   123     3        555     0
#2   123     3        345     1
#3   123     3         92     1
#4   245     1        438     0
#5   888     1        333     0
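
The grouping step can arguably be skipped: duplicated(id) is already TRUE for every occurrence of an id after its first, so a single mutate() can express the same condition. This is a minimal sketch, assuming "first match" means the first row for each id in the data's current order; the name `result` is only for illustration.

```r
library(dplyr)

bad_ids <- read.table(text = "id n
123 3", header = TRUE)

dat <- read.table(text = "id n partner_id
123 3 555
123 3 345
123 3 092
245 1 438
888 1 333", header = TRUE)

# Flag a row when its id is in bad_ids AND it is not the first row with that id.
# The unary + turns TRUE/FALSE into 1/0.
result <- dat %>%
  mutate(flag = +(id %in% bad_ids$id & duplicated(id)))

result$flag
# 0 1 1 0 0
```

Because nothing is filtered out, there is nothing to "unfilter": the rows that should not change simply get a flag of 0.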

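Also, if we keep the approach from the OP's code, an option is to join the filtered result back to 'dat' and then replace the NA in 'flag' with 0 (replace_na() comes from tidyr). Note that slice(-1) drops only the first row of the whole filtered set, which is enough here because 'bad_ids' holds a single id.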

library(tidyr) # for replace_na()
dat %>% 
  filter(id %in% bad_ids$id) %>%
  slice(-1) %>%
  mutate(flag = 1) %>% 
  right_join(dat, by = c("id", "n", "partner_id")) %>% 
  mutate(flag = replace_na(flag, 0))
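
A closer analogue of the "unfilter" idea is to tag each row with a key, change only the filtered subset, and then write those rows back by key. The sketch below uses rows_update(), which is available in dplyr 1.0.0 and later. The helper column `row` and the names `keyed`, `changed`, and `result` are assumptions made for this illustration, and group_by(id) is added so that the first match is dropped per id rather than once overall.

```r
library(dplyr)

bad_ids <- read.table(text = "id n
123 3", header = TRUE)

dat <- read.table(text = "id n partner_id
123 3 555
123 3 345
123 3 092
245 1 438
888 1 333", header = TRUE)

# Tag every row with a temporary key and give all rows a default flag of 0.
keyed <- dat %>%
  mutate(row = row_number(), flag = 0)

# Work on the filtered subset only: drop the first match per id, flag the rest.
changed <- keyed %>%
  filter(id %in% bad_ids$id) %>%
  group_by(id) %>%
  slice(-1) %>%
  ungroup() %>%
  mutate(flag = 1)

# "Unfilter": write the changed rows back into the full data by key,
# then drop the temporary key column.
result <- keyed %>%
  rows_update(changed, by = "row") %>%
  select(-row)

result
```

Since the update is applied to the full keyed data, the rows stay in the same order as in dat, and no NA replacement is needed afterwards.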

answered Oct 11 '22 19:10

akrun
