Possible Duplicate:
Standard way to remove multiple elements from a dataframe
I know that in R, if you want the subset of rows whose id matches a vector of ids, you'd use something like
subset(df1, df1$id %in% idNums1)
My question is how to do the opposite, that is, choose the items NOT matching a vector of ids.
I tried using ! like this:
subset(df1, df1$id !%in% idNums1)
but I get an error message.
I think my backup is to do something like this:
matches <- subset(df1, df1$id %in% idNums1)
nonMatches <- df1[(-matches[,1]),]
but I'm hoping there's something a bit more efficient.
The expression df1$id %in% idNums1
produces a logical vector. To negate it, you need to negate the whole vector:
!(df1$id %in% idNums1)
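As a minimal sketch of how that fits into the original call, here is a small runnable example. The sample data below is made up purely for illustration; only the names df1 and idNums1 come from the question.

```r
# Hypothetical sample data (not from the question); only the names match.
df1 <- data.frame(id = c(1, 2, 3, 4, 5),
                  value = c("a", "b", "c", "d", "e"))
idNums1 <- c(2, 4)

# Keep rows whose id is NOT in idNums1 by negating the whole logical vector.
# Inside subset() the column can be referred to by name, without df1$.
nonMatches <- subset(df1, !(id %in% idNums1))

# The same selection with plain bracket indexing.
nonMatches2 <- df1[!(df1$id %in% idNums1), ]

print(nonMatches)
```

The parentheses are not strictly required, since %in% binds more tightly than ! in R, so !df1$id %in% idNums1 parses the same way. Writing them out just makes the intent obvious. This also avoids the two-step backup approach, which would only work if the id values happened to equal the row positions.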