Return FALSE for duplicated NA values when using the function duplicated()

Just wondering why duplicated() behaves the way it does with NAs:

> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE  TRUE  TRUE FALSE FALSE  TRUE

whereas in fact

> NA == NA
[1] NA

Is there a way to make duplicated() mark NAs as FALSE? The desired result would be:

> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE FALSE FALSE FALSE FALSE  TRUE
jamborta asked Nov 27 '12 11:11

1 Answer

Use the incomparables argument of duplicated(), like this:

> duplicated(c(NA,NA,NA,1,2,2))
[1] FALSE  TRUE  TRUE FALSE FALSE  TRUE
> duplicated(c(NA,NA,NA,1,2,2),incomparables=NA)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE

The incomparables argument specifies values that cannot be compared (in this case NA); duplicated() returns FALSE for every element equal to one of those values. See also ?duplicated.
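
As a rough sketch of how this can be used in practice (the variable names x, flagged, manual and kept are made up for illustration and are not from the question), the following compares incomparables = NA with an explicit is.na() mask, then uses the result to drop duplicates while keeping every NA:

```r
# Illustrative vector, same values as in the question
x <- c(NA, NA, NA, 1, 2, 2)

# NA is declared incomparable, so no NA is ever flagged as a duplicate
flagged <- duplicated(x, incomparables = NA)

# One possible equivalent for plain vectors: mask out NA positions by hand
manual <- duplicated(x) & !is.na(x)

# Drop only the true duplicates; all three NAs are kept
kept <- x[!flagged]

print(flagged)
print(identical(flagged, manual))
print(kept)
```

For this vector both approaches flag only the second 2. The is.na() variant is simply an alternative way to express the same idea, not something the answer itself proposes.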

Joris Meys answered Nov 17 '22 22:11