Remove rows with the same value across all columns

Tags: dataframe, r

Suppose I have a data frame (df) that looks like this:

options(stringsAsFactors = FALSE)

cars <- c("Car1", "Car2", "Car3", "Car4", "Car5", "Car6", "Car7", "Car8", "Car9")
test1 <- c(0,0,3,1,4,2,1,3,0)
test2 <- c(0,0,2,1,0,2,2,5,0)
test3 <- c(1,0,5,1,2,2,6,7,0)
test4 <- c(2,NA,2,1,2,2,1,1,0)
test5 <- c(0,0,1,1,0,2,1,3,0)
test6 <- c(1,0,1,1,1,2,3,4,0)
test7 <- c(3,0,2,1,0,2,1,1,0)

df <- data.frame(cars,test1,test2,test3,test4,test5,test6,test7)

# df
#   cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#2 Car2     0     0     0    NA     0     0     0
#3 Car3     3     2     5     2     1     1     2
#4 Car4     1     1     1     1     1     1     1
#5 Car5     4     0     2     2     0     1     0
#6 Car6     2     2     2     2     2     2     2
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1
#9 Car9     0     0     0     0     0     0     0

I want to remove any rows that have the same value throughout the entire row (in the example above, I would like to keep rows 1, 3, 5, 7, 8 and remove the rest).

I've figured out how to identify the rows that contain only zeros:

 df$sum <- rowSums(df[, 2:8], na.rm = TRUE)
 df.all0 <- df[which(df$sum == 0), ]

However, this doesn't work for rows that hold some other constant value (such as rows 4 and 6). Unlike other questions, this one is about values duplicated across the entire row, not just specific columns.

Any help would be greatly appreciated!


asked Jun 06 '17 19:06

Sheila

2 Answers

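# For each row, count the distinct non-NA values in the test columns;
# a row is kept unless that count is exactly 1.
keep <- apply(df[2:8], 1, function(x) length(unique(x[!is.na(x)])) != 1)
df[keep, ]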

  cars test1 test2 test3 test4 test5 test6 test7
1 Car1     0     0     1     2     0     1     3
3 Car3     3     2     5     2     1     1     2
5 Car5     4     0     2     2     0     1     0
7 Car7     1     2     6     1     1     3     1
8 Car8     3     5     7     1     3     4     1
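The same idea can be wrapped in a small reusable function. This is only a sketch: `drop_constant_rows` and its `cols` argument are hypothetical names, not part of the answer above, and the example data is a shortened version of the question's data frame. One deliberate difference: it keeps rows with more than one distinct non-NA value, so a row that is entirely NA is dropped here, whereas the `!= 1` test above would keep it.

```r
# Hypothetical helper (not from the answer above): keep only the rows whose
# selected columns contain more than one distinct non-NA value.
drop_constant_rows <- function(data, cols) {
  n_distinct_values <- apply(data[cols], 1, function(x) {
    length(unique(x[!is.na(x)]))
  })
  data[n_distinct_values > 1, , drop = FALSE]
}

# Shortened example data modelled on the question
df <- data.frame(
  cars  = c("Car1", "Car2", "Car3", "Car4"),
  test1 = c(0, 0, 3, 1),
  test2 = c(0, NA, 2, 1),
  test3 = c(1, 0, 5, 1)
)

result <- drop_constant_rows(df, c("test1", "test2", "test3"))
print(result)
```

Passing the column names explicitly avoids relying on column positions such as `2:8`, which silently break if columns are added or reordered.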


answered Oct 16 '22 12:10

JasonWang

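We can also use Map with Reduce: compare every test column with the first one (treating NA as "no difference"), add up the mismatches per row, and keep the rows where that sum is not zero.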

df[c(Reduce(`+`, Map(function(x,y) x != y & !is.na(x), df[-1], list(df[2]))) != 0),]
#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#3 Car3     3     2     5     2     1     1     2
#5 Car5     4     0     2     2     0     1     0
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1

Or using tidyverse

library(tidyverse)
df %>% 
    filter_at(vars(starts_with("test")), any_vars((. != test1)))
#   cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#2 Car3     3     2     5     2     1     1     2
#3 Car5     4     0     2     2     0     1     0
#4 Car7     1     2     6     1     1     3     1
#5 Car8     3     5     7     1     3     4     1
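In current dplyr the scoped verbs such as `filter_at()` are superseded, so the same filter can probably be written with `if_any()`. The sketch below assumes dplyr 1.0.4 or later and uses a shortened version of the question's data; like the version above, it compares every test column with `test1`, so it relies on `test1` itself not being NA.

```r
library(dplyr)

# Shortened example data modelled on the question
df <- data.frame(
  cars  = c("Car1", "Car2", "Car3", "Car4"),
  test1 = c(0, 0, 3, 1),
  test2 = c(0, NA, 2, 1),
  test3 = c(1, 0, 5, 1)
)

# Keep rows where at least one test column differs from test1.
# A comparison with NA yields NA, and filter() drops rows whose
# condition is NA, so Car2 (0, NA, 0) is removed as well.
kept <- df %>%
  filter(if_any(starts_with("test"), ~ .x != test1))

print(kept)
```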

answered Oct 16 '22 12:10

akrun
