I'm still working through the lessons on DataCamp for R, so please forgive me if this question seems naïve.
Consider the following (very contrived) sample:
library(dplyr)
library(tibble)
type <- c("Dog", "Cat", "Cat", "Cat")
name <- c("Ella", "Arrow", "Gabby", "Eddie")
pets = tibble(name, type)
name <- c("Ella", "Arrow", "Dog")
type <- c("Dog", "Cat", "Calvin")
favorites = tibble(name, type)
anti_join(favorites, pets, by = "name")
setdiff(favorites, pets, by = "name")
Both of these return exactly the same data:
> anti_join(favorites, pets, by = "name")
# A tibble: 1 × 2
name type
<chr> <chr>
1 Dog Calvin
> setdiff(favorites, pets, by = "name")
# A tibble: 1 × 2
name type
<chr> <chr>
1 Dog Calvin
The documentation for each of them seems to indicate only a subtle difference: that setdiff returns rows, but anti_join does not. From my testing, this doesn't appear to be the case.
Can someone explain to me the true differences between these two, and perhaps provide a better example that illustrates the differences more clearly? (This is an area where DataCamp hasn't been particularly helpful.)
Both subset the first parameter, but setdiff requires the columns to be the same:
library(dplyr)
setdiff(mtcars, mtcars[1:30, ])
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
setdiff(mtcars, mtcars[1:30, 1:6])
#> Error in setdiff_data_frame(x, y): not compatible: Cols in x but not y: `carb`, `gear`, `am`, `vs`, `qsec`.
whereas anti_join is a join, so it doesn't:
anti_join(mtcars, mtcars[1:30, 1:3])
#> Joining, by = c("mpg", "cyl", "disp")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
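To make the difference visible on data closer to the question's, here is a minimal sketch. It assumes a modified favorites table (a hypothetical variation, not the asker's original data) in which Ella's type disagrees with pets. Note that setdiff has no by argument: it always compares whole rows, while anti_join only compares the key columns.

```r
library(dplyr)
library(tibble)

pets <- tibble(
  name = c("Ella", "Arrow", "Gabby", "Eddie"),
  type = c("Dog", "Cat", "Cat", "Cat")
)

# Hypothetical variation on the question's data:
# Ella is listed as a "Cat" here, but as a "Dog" in pets.
favorites <- tibble(
  name = c("Ella", "Arrow", "Calvin"),
  type = c("Cat", "Cat", "Dog")
)

# Compares only the key column: Ella's name matches, so she is dropped.
by_key <- anti_join(favorites, pets, by = "name")

# Compares entire rows: (Ella, Cat) is not a row of pets, so she is kept.
by_row <- setdiff(favorites, pets)

by_key$name  # "Calvin"
by_row$name  # "Ella" "Calvin"
```

In the question's sample, every favorites row whose name appears in pets also has the same type, which is why the two calls happened to return the same result.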