How can I remove all duplicates so that NONE are left in a data frame?

There is a similar question for PHP, but I'm working with R and am unable to translate the solution to my problem.

I have a data frame with 10 rows and 50 columns, where some of the rows are absolutely identical. If I use unique() on it, I get one row per, let's say, "type", but what I actually want is to keep only those rows that appear exactly once. Does anyone know how I can achieve this?

I can have a look at clusters and heatmaps to sort it out manually, but I have bigger data frames than the one mentioned above (with up to 100 rows) where this gets a bit tricky.

asked Dec 07 '12 12:12

Lilith-Elina

3 Answers

This will extract the rows which appear only once (assuming your data frame is named df):

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]

How it works: The function duplicated scans the rows from the first to the last and returns TRUE for every row that has already appeared earlier, so the first occurrence of each row is marked FALSE. With the argument fromLast = TRUE, the scan starts at the last row instead, so the last occurrence of each row is the one marked FALSE.

Both boolean results are combined with | (logical 'or') into a new vector that is TRUE for every row appearing more than once. This vector is then negated with !, which yields a boolean vector that is TRUE only for rows appearing exactly once.
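
To make the two passes concrete, here is a minimal sketch on a small made-up data frame. The column names and values are hypothetical and not taken from the question; only duplicated and its fromLast argument come from the answer above.

```r
# Hypothetical example data: rows 1, 3 and 6 are identical, as are rows 2 and 5
df <- data.frame(id    = c("a", "b", "a", "c", "b", "a"),
                 value = c(1, 2, 1, 3, 2, 1))

# Forward pass: TRUE from the second occurrence of a row onwards
dup_forward <- duplicated(df)

# Backward pass: TRUE for every occurrence of a row except its last one
dup_backward <- duplicated(df, fromLast = TRUE)

# TRUE for every row that occurs more than once anywhere in df
appears_more_than_once <- dup_forward | dup_backward

# Keep only the rows that occur exactly once
singletons <- df[!appears_more_than_once, ]
print(singletons)
```

In this sketch only the row with id "c" survives, because every other row has at least one identical copy.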

answered Oct 20 '22 08:10

Sven Hohenstein

A possibility involving dplyr (load it first with library(dplyr)) could be:

df %>%
 group_by_all() %>%
 filter(n() == 1)

Or:

df %>%
 group_by_all() %>%
 filter(!any(row_number() > 1))

Since dplyr 1.0.0, the preferable way would be:

df %>%
 group_by(across(everything())) %>%
 filter(n() == 1)
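
For reference, a self-contained version of this approach might look like the sketch below. It assumes dplyr 1.0.0 or later is installed; the example data frame is hypothetical, and the trailing ungroup() is an addition of this sketch (the result of filter is otherwise still grouped by every column).

```r
library(dplyr)

# Hypothetical example data: rows 1 and 3 are identical
df <- data.frame(id    = c("a", "b", "a", "c"),
                 value = c(1, 2, 1, 3))

singletons <- df %>%
  group_by(across(everything())) %>%  # one group per distinct row
  filter(n() == 1) %>%                # keep groups that contain a single row
  ungroup()                           # drop the grouping before further use

print(singletons)
```

Here the rows with id "b" and "c" are kept, since each of them occurs only once.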

answered Oct 20 '22 07:10

tmfmnk

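Try this (base R only, no packages needed):

DF1 <- data.frame(Part = c(1, 2, 3, 4, 5), Age = c(23, 34, 23, 25, 24), B.P = c(87, 76, 75, 75, 78))

DF2 <- data.frame(Part = c(3, 5), Age = c(23, 24), B.P = c(75, 78))

# Stack both data frames; the rows with Part 3 and Part 5 now occur twice
DF3 <- rbind(DF1, DF2)

# Drop every row that has a duplicate; the rows with Part 1, 2 and 4 remain
DF3 <- DF3[!(duplicated(DF3) | duplicated(DF3, fromLast = TRUE)), ]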

answered Oct 20 '22 08:10

Brutalroot

Tags:

r

r-faq

duplicates

unique