Remove duplicate column pairs, sort rows based on 2 columns [duplicate]

Tags:

r

In the following data frame I want to keep only one row for each duplicate pair of Var1 and Var2, where order does not matter (1 4 and 4 1 count as the same pair). My idea was to sort Var1 and Var2 within each row and then remove duplicate rows based on both columns. However, I can't get to my desired result.

This is what my data looks like:

Var1 <- c(1,2,3,4,5,5)
Var2 <- c(4,3,2,1,5,5)
f <- c("blue","green","yellow","red","orange2","grey")
g <- c("blue","green","yellow","red","orange1","grey")
testdata <- data.frame(Var1,Var2,f,g)

I can sort within the rows, but the values of columns f and g should remain untouched. How do I do this?

testdata <- t(apply(testdata, 1, function(x) x[order(x)]))
testdata <- as.data.table(testdata)

Then I want to remove duplicate rows based on Var1 and Var2.

I want to get this as a result:

Var1 Var2 f       g
1    4    blue    blue
2    3    green   green
5    5    orange2 orange1

Thanks for your help!

asked Mar 20 '15 15:03

qg7el

2 Answers

In case people are interested in solving this using dplyr:

library(dplyr)
testdata %>% 
   rowwise() %>%
   mutate(key = paste(sort(c(Var1, Var2)), collapse="_")) %>%  # separator keeps keys unambiguous, e.g. (1, 12) vs (11, 2)
   distinct(key, .keep_all=TRUE) %>%
   select(-key)

# Source: local data frame [3 x 4]
# Groups: <by row>
# 
# # A tibble: 3 × 4
#    Var1  Var2       f       g
#   <dbl> <dbl>  <fctr>  <fctr>
# 1     1     4    blue    blue
# 2     2     3   green   green
# 3     5     5 orange2 orange1
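A vectorized variant of the same idea, as a rough sketch: instead of building a string key row by row, the smaller and larger value of each pair can be computed with pmin() and pmax() and passed to distinct() directly. The helper column names lo and hi and the name result are only illustrative, and the example rebuilds testdata from the question so that it runs on its own.

```r
library(dplyr)

# Data from the question, rebuilt here so the example is self-contained
testdata <- data.frame(
  Var1 = c(1, 2, 3, 4, 5, 5),
  Var2 = c(4, 3, 2, 1, 5, 5),
  f = c("blue", "green", "yellow", "red", "orange2", "grey"),
  g = c("blue", "green", "yellow", "red", "orange1", "grey")
)

result <- testdata %>%
  # lo/hi (illustrative names) hold the pair in a fixed order,
  # so (1, 4) and (4, 1) end up with identical lo/hi values
  mutate(lo = pmin(Var1, Var2), hi = pmax(Var1, Var2)) %>%
  # keep the first row of each unordered pair, with all other columns
  distinct(lo, hi, .keep_all = TRUE) %>%
  # drop the helper columns again
  select(-lo, -hi)

print(result)
```

Because pmin() and pmax() work on whole columns, this avoids rowwise() altogether, which may matter on larger data. It only covers the two-column case shown here.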

answered Sep 23 '22 17:09

sinQueso

Instead of sorting across the whole dataset, sort only the Var1 and Var2 columns within each row, and then use duplicated to remove the duplicate rows:

testdata[1:2] <- t( apply(testdata[1:2], 1, sort) )
testdata[!duplicated(testdata[1:2]),]
#   Var1 Var2       f       g
#1    1    4    blue    blue
#2    2    3   green   green
#5    5    5 orange2 orange1
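Note that the assignment above overwrites Var1 and Var2 with their sorted values. If the original column values should stay as they are, one possible variant (a sketch, not part of the original answer) is to build the sorted pair in separate vectors and use it only to decide which rows to keep. The names lo, hi, keep and deduped are illustrative.

```r
# Data from the question, rebuilt here so the example is self-contained
testdata <- data.frame(
  Var1 = c(1, 2, 3, 4, 5, 5),
  Var2 = c(4, 3, 2, 1, 5, 5),
  f = c("blue", "green", "yellow", "red", "orange2", "grey"),
  g = c("blue", "green", "yellow", "red", "orange1", "grey")
)

# Order each pair without touching testdata itself
lo <- pmin(testdata$Var1, testdata$Var2)
hi <- pmax(testdata$Var1, testdata$Var2)

# TRUE for the first occurrence of each unordered pair
keep <- !duplicated(data.frame(lo, hi))

# Subset the untouched data frame; Var1/Var2 keep their original values
deduped <- testdata[keep, ]
print(deduped)
```

With the question's data this keeps the same rows (1, 2 and 5) as the approach above; the difference would only show when the first occurrence of a pair is stored in descending order, e.g. 4 1.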

answered Sep 23 '22 17:09

akrun
