Just wondering why duplicated behaves the way it does with NAs:
> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE TRUE TRUE FALSE FALSE TRUE
whereas in fact:
> NA == NA
[1] NA
Is there a way to make duplicated mark NAs as FALSE, like this?
> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE FALSE FALSE FALSE FALSE TRUE
The R function duplicated() returns a logical vector in which TRUE marks each element of a vector (or each row of a data frame) that repeats an earlier one.
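A minimal sketch of what that logical vector looks like. The vector x and data frame df are made-up example data, not from the original question:

```r
# duplicated() returns one logical per element: TRUE marks an element that
# already appeared earlier (scanning from the start by default)
x <- c("a", "b", "a", "c", "b")
dup_x <- duplicated(x)
print(dup_x)  # FALSE FALSE TRUE FALSE TRUE

# For a data frame, each row is compared as a whole
df <- data.frame(id = c(1, 2, 1, 3), grp = c("u", "v", "u", "v"))
dup_rows <- duplicated(df)
print(dup_rows)  # FALSE FALSE TRUE FALSE

# fromLast = TRUE scans from the end, so the last occurrence counts as the original
print(duplicated(x, fromLast = TRUE))  # TRUE TRUE FALSE FALSE FALSE
```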
To remove duplicates using R base functions, use duplicated() and unique(). With these two functions you can delete duplicate rows from a data frame (data.frame) by considering all columns, a single column, or a selected set of columns.
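Here is a rough sketch of the three cases, assuming a small hypothetical data frame with columns name, dept and score. Negating duplicated() keeps the first occurrence of each row or key:

```r
df <- data.frame(
  name  = c("Ann", "Bob", "Ann", "Cy", "Bob", "Ann"),
  dept  = c("IT",  "HR",  "IT",  "IT", "IT",  "IT"),
  score = c(10,    20,    10,    30,   40,    15)
)

# All columns: drop rows that exactly repeat an earlier row (row 3 here)
by_all <- df[!duplicated(df), ]
# unique() on a data frame keeps the same rows
by_unique <- unique(df)

# Single column: keep only the first row for each name
by_name <- df[!duplicated(df$name), ]

# Selected columns: keep the first row for each (name, dept) combination
by_name_dept <- df[!duplicated(df[, c("name", "dept")]), ]
```

The three results have 5, 3 and 4 rows respectively, which shows how the choice of columns changes what counts as a duplicate.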
We can find the rows with duplicated values in a particular column of an R data frame by calling duplicated() inside subset(). This returns only the repeated rows for the chosen column, which means the first occurrence of each value is not in the output.
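A short sketch using a hypothetical city/year data frame. subset() evaluates its condition inside the data frame, so the column can be referenced by name. The second call is an optional variant (not from the text above) that also keeps the first occurrence by combining a forward and a backward scan:

```r
df <- data.frame(
  city = c("Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo"),
  year = c(2019, 2019, 2020, 2020, 2021, 2021)
)

# Rows whose city already appeared earlier; the first Oslo and Rome rows are excluded
repeats <- subset(df, duplicated(city))

# Every row whose city occurs more than once, first occurrence included
all_dups <- subset(df, duplicated(city) | duplicated(city, fromLast = TRUE))
```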
You can use the incomparables argument of duplicated, like this:
> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE TRUE TRUE FALSE FALSE TRUE
> duplicated(c(NA,NA,NA,1,2,2),incomparables=NA)
[1] FALSE FALSE FALSE FALSE FALSE TRUE
incomparables specifies values that cannot be compared (in this case NA), and duplicated returns FALSE for those values. See also ?duplicated.
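The "why" part of the question comes from the fact that ?duplicated documents missing values as being regarded as equal, the same convention match() uses, while == propagates the NA. The sketch below reproduces the answer and adds an alternative base R idiom (masking with is.na()), which is my own suggestion rather than part of the answer. Note that incomparables is meant for vectors here; the data frame method does not support it.

```r
x <- c(NA, NA, NA, 1, 2, 2)

# Comparison with == yields NA, but matching treats NA as equal to NA
print(NA == NA)       # NA
print(match(NA, NA))  # 1

# Answer's approach: values listed in incomparables are never marked as duplicates
d1 <- duplicated(x, incomparables = NA)

# Alternative: mark duplicates as usual, then force every NA position to FALSE
d2 <- duplicated(x) & !is.na(x)

print(d1)  # FALSE FALSE FALSE FALSE FALSE TRUE
```

The two agree for this vector. They may differ if x also contains NaN, because is.na(NaN) is TRUE while ?duplicated treats NaN as distinct from NA.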