Extracting unique rows from a data table in R [duplicate]

People also ask

How do I extract unique rows in R?

To extract the unique rows of a data frame in R, pass the data frame to unique(). It returns the data frame with duplicate rows removed.
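As a minimal sketch of this, assuming a small made-up data frame (the name `df` and its values are illustrative only):

```r
# Hypothetical example data: rows 1 and 3 are identical
df <- data.frame(x = c(1, 2, 1), y = c("a", "b", "a"))

# unique() keeps the first occurrence of each distinct row
u <- unique(df)
nrow(u)
## [1] 2
```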

How do I get unique values from a column with repeated values in R?

To find the unique values in a column of a data frame, call unique() on that column. This is useful in exploratory data analysis because the result contains each distinct value exactly once, with the duplicates dropped.
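For instance, with a hypothetical one-column data frame `df` (names and values are made up for illustration):

```r
# Hypothetical column with repeated values
df <- data.frame(x = c(3, 1, 3, 2, 1))

# Distinct values, in order of first appearance
vals <- unique(df$x)
vals
## [1] 3 1 2
```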

How do I find duplicate rows in a Dataframe in R?

To find the rows with duplicated values in a particular column of a data frame, use the duplicated() function inside subset(). This returns only the repeated rows for the chosen column, which means the first occurrence of each value is not in the output.
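A short sketch of that pattern, assuming an illustrative data frame `df` with an `id` column (both names are hypothetical). The second call shows one possible way to keep the first occurrences too, by also scanning from the end:

```r
# Hypothetical data: id 2 appears twice, id 3 three times
df <- data.frame(id = c(1, 2, 2, 3, 3, 3), val = letters[1:6])

# duplicated() flags every occurrence after the first
dups <- subset(df, duplicated(id))
dups$id
## [1] 2 3 3

# To include the first occurrences as well, also scan from the end
all_dups <- subset(df, duplicated(id) | duplicated(id, fromLast = TRUE))
nrow(all_dups)
## [1] 5
```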

Before data.table v1.9.8, the default behavior of the unique.data.table method was to use the key to determine the columns by which the unique combinations should be returned. If the key was NULL (the default), one would get the original data set back (as in the OP's situation).

As of data.table v1.9.8, the unique.data.table method uses all columns by default, which is consistent with unique.data.frame in base R. To have it use the key columns, explicitly pass by = key(DT) into unique (replacing DT in the call to key with the name of the data.table).

Hence, the old behavior would be something like

library(data.table) # v1.9.7-
set.seed(123)
a <- as.data.frame(matrix(sample(2, 120, replace = TRUE), ncol = 3))
b <- data.table(a, key = names(a))
## key(b)
## [1] "V1" "V2" "V3"
dim(unique(b)) 
## [1] 8 3

While for data.table v1.9.8+, just

b <- data.table(a) 
dim(unique(b)) 
## [1] 8 3
## or dim(unique(b, by = key(b))) # in case you have keys and want to use them

Or without a copy

setDT(a)
dim(unique(a))
## [1] 8 3
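To make the difference between the two defaults concrete, here is a small sketch for data.table v1.9.8 or later. The table `DT` and its values are made up for illustration and are not part of the original answer:

```r
library(data.table)

# Hypothetical table keyed on V1 only
DT <- data.table(V1 = c(1, 1, 2), V2 = c("a", "b", "b"), key = "V1")

# v1.9.8+ default: all columns are compared, so all 3 rows are distinct
n_all <- nrow(unique(DT))
n_all
## [1] 3

# Old keyed behavior, requested explicitly: one row per value of V1
n_key <- nrow(unique(DT, by = key(DT)))
n_key
## [1] 2
```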

As mentioned by Seth, the data.table package has evolved and now offers optimized functions for this.

For those who don't want to dig through the documentation, here is the fastest and most memory-efficient way to do what you want:

uniqueN(a)

And if you only want to consider a subset of columns, you can use the 'by' argument:

uniqueN(a, by = c('V1', 'V2'))

EDIT: As mentioned in the comments, this only gives the count of unique rows. To get the unique rows themselves, use unique instead:

unique(a)

And for a subset:

unique(a[, c('V1', 'V2')], by = c('V1', 'V2'))
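As a rough illustration of how these calls differ, the sketch below uses a made-up table `DT` (not from the original answer). It assumes a reasonably recent data.table in which uniqueN() and the `by` argument of unique() are available:

```r
library(data.table)

# Hypothetical data: two distinct (V1, V2) pairs, three distinct rows
DT <- data.table(V1 = c(1, 1, 2), V2 = c(1, 1, 1), V3 = c(1, 2, 3))

# Count of distinct (V1, V2) combinations
n_pairs <- uniqueN(DT, by = c("V1", "V2"))
n_pairs
## [1] 2

# by = keeps every column, taking the first row of each combination
full <- unique(DT, by = c("V1", "V2"))
ncol(full)
## [1] 3

# Selecting the columns first returns only V1 and V2
sub <- unique(DT[, .(V1, V2)])
ncol(sub)
## [1] 2
```

Whether to subset the columns before or after calling unique depends on whether the remaining columns of the first matching row are still needed.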

Tags:

r

data.table
