Replace a subset of a data frame with dplyr join operations

Tags:

dplyr

Suppose that I need to clean up some values in one column of a data frame like this:

  id animal weight   height ...
  1    dog     23.0
  2    cat     NA
  3   duck     1.2
  4  fairy     0.2
  5  snake     BAD


df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"))

Suppose that the cleanup has to be done in a separate table, and that the result is the following data frame, which is a subset of the original:

  id animal weight
  2    cat    2.2
  5  snake    1.3

sub_df <- data.frame(id = c(2, 5),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"))

Now I want to put everything back together, so I use an operation like this:

> df %>%
   anti_join(sub_df, by = c("id", "animal")) %>%
   bind_rows(sub_df)

 id animal weight
 4  fairy    0.2
 1    dog   23.0
 3   duck    1.2
 2    cat    2.2
 5  snake    1.3

Is there a way to do this directly with join operations?

Also, if the subset contains only the key columns and the cleaned variable (id, animal, weight) rather than all the variables of the original data frame (id, animal, weight, height), how could I merge the subset back into the original?


asked Jul 05 '17 15:07

Cristóbal Alcázar

2 Answers

What you describe is a join operation in which you update some values in the original dataset. This is very easy to do with great performance using data.table because of its fast joins and update-by-reference concept (:=).

Here's an example for your toy data:

library(data.table)
setDT(df)             # convert to data.table without copy
setDT(sub_df)         # convert to data.table without copy

# join and update "df" by reference, i.e. without copy 
df[sub_df, on = c("id", "animal"), weight := i.weight]

The data is now updated:

#   id animal weight
#1:  1    dog   23.0
#2:  2    cat    2.2
#3:  3   duck    1.2
#4:  4  fairy    0.2
#5:  5  snake    1.3

You can use setDF() to convert back to an ordinary data.frame.
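For readers who would rather stay in dplyr, a similar in-place style update can be sketched with rows_update(), which is available in dplyr 1.0.0 and later. This is only an illustration, not part of the approach above: the height column and its values are hypothetical, added to mimic the case where the subset carries fewer columns than the original.

```r
library(dplyr)

# Toy data from the question, plus a hypothetical `height` column
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"),
                 height = c(60, 25, 30, 10, 5))  # made-up values

# Corrected rows: only the key columns and the cleaned variable
sub_df <- data.frame(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"))

# Overwrite `weight` in the rows of df whose keys match sub_df;
# columns absent from sub_df (here `height`) are left untouched
updated <- rows_update(df, sub_df, by = c("id", "animal"))
updated
```

By default rows_update() expects every key in sub_df to exist in df and the keys in sub_df to be unique, so it fits the "patch a few known rows" situation described in the question.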


answered Oct 27 '22 21:10

talat

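Remove the NAs first, then simply stack the data frames. Note that this only drops rows whose weight is NA, so in the toy data the original snake row ("BAD") remains alongside its replacement: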

 bind_rows(filter(df,!is.na(weight)),sub_df)
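If the goal is to express the replacement purely as a join, as the question asks, one possible sketch is a left join followed by coalesce(). The weight_new column name and the patched variable are illustrative choices, not something from the answers here.

```r
library(dplyr)

# Toy data from the question
df <- data.frame(id = 1:5,
                 animal = c("dog", "cat", "duck", "fairy", "snake"),
                 weight = c("23", NA, "1.2", "0.2", "BAD"))

sub_df <- data.frame(id = c(2L, 5L),
                     animal = c("cat", "snake"),
                     weight = c("2.2", "1.3"))

patched <- df %>%
  # bring the corrected values in as `weight_new`; unmatched rows get NA
  left_join(sub_df, by = c("id", "animal"), suffix = c("", "_new")) %>%
  # prefer the corrected value where one exists, else keep the original
  mutate(weight = coalesce(weight_new, weight)) %>%
  select(-weight_new)

patched
```

This keeps the original row order and replaces "BAD" as well as NA. One limitation of this sketch: a corrected value that is itself NA would not overwrite the original, because coalesce() falls back to the old value.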

answered Oct 27 '22 23:10

r.user.05apr
