I have a data frame of integers that is a subset of all of the n choose 3 combinations of 1...n. E.g., for n=5, it is something like:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 3 4
[5,] 1 3 5
[6,] 1 4 5
[7,] 2 1 3
[8,] 2 1 4
[9,] 2 1 5
[10,] 2 3 4
[11,] 2 3 5
[12,] 2 4 5
[13,] 3 1 2
[14,] 3 1 4
[15,] 3 1 5
[16,] 3 2 4
[17,] 3 2 5
[18,] 3 4 5
[19,] 4 1 2
[20,] 4 1 3
[21,] 4 1 5
[22,] 4 2 3
[23,] 4 2 5
[24,] 4 3 5
[25,] 5 1 2
[26,] 5 1 3
[27,] 5 1 4
[28,] 5 2 3
[29,] 5 2 4
[30,] 5 3 4
What I'd like to do is remove any rows with duplicate combinations, irrespective of ordering. E.g., [1,] 1 2 3
is the same as [1,] 2 1 3
is the same as [1,] 3 1 2
.
unique
, duplicated
, &c. don't seem to take this into account. Also, I am working with quite a large amount of data (n is ~750), so it ought to be a pretty fast operation. Are there any base functions or packages that can do this?
Remove duplicate combinations – TutorialSelect your data. Click on Data > Remove Duplicates button. Make sure all columns are checked. Click ok and done!
In Excel, there are several ways to filter for unique values—or remove duplicate values: To filter for unique values, click Data > Sort & Filter > Advanced. To remove duplicate values, click Data > Data Tools > Remove Duplicates.
Sort within the rows first, then use duplicated, see below:
# example data
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items
dat[ !duplicated(apply(dat, 1, sort), MARGIN = 2), ]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 2 4
# [3,] 1 2 5
# [4,] 1 3 4
# [5,] 1 3 5
# [6,] 1 4 5
# [7,] 2 3 4
# [8,] 2 3 5
# [9,] 2 4 5
# [10,] 3 4 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With