How do I select all unique combinations of two columns in an R data frame?

Tags: r, dplyr, tidyr

I have a correlation matrix that I put in a dataframe like so:

row | var1 | var2 | cor
1   | A    | B    | 0.6
2   | B    | A    | 0.6
3   | A    | C    | 0.4
4   | C    | A    | 0.4

Each correlation appears in two rows, once for each ordering of "var1" and "var2". I only need one row per pair, preferably the one with the alphabetically lower variable first (e.g. rows 1 and 3).

I've been playing with dplyr for two hours and reading old threads, but not finding what I need.

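library(dplyr)   # select(), %>%
library(tibble)  # rownames_to_column()
library(tidyr)   # gather()

# get correlation of every concept versus every concept
data.cor <- data.jobs %>% 
  select(-y, -X) %>%                      # drop the non-concept columns
  as.matrix %>%
  cor %>%                                 # square correlation matrix
  as.data.frame %>%
  rownames_to_column(var = 'var1') %>%    # row names become the var1 column
  gather(var2, value, -var1)              # long format: one row per (var1, var2)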

I would like output to look like so:

row | var1 | var2 | cor
1   | A    | B    | 0.6
3   | A    | C    | 0.4

I am trying to do this without resorting to a loop.

Josh Pause asked Dec 10 '22 03:12


1 Answer

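Here's one way with the tidyverse. pmax() and pmin() build a key that is identical for both orderings of a pair, and duplicated() then flags every occurrence of that key after the first -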

library(dplyr)

# keep only the first row for each unordered (var1, var2) pair
dat2 <- dat %>% 
  filter(!duplicated(paste0(pmax(var1, var2), pmin(var1, var2))))


# A tibble: 2 x 3
  var1  var2    cor
  <chr> <chr> <dbl>
1 A     B     0.600
2 A     C     0.400
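
To make the intermediate step visible, here is a minimal sketch that materialises the key as a column before filtering. The column name pair_key and the "|" separator are my own illustrative choices, not part of the code above. The separator is there because a key pasted without one could in principle collide for multi-character names (for example, the pairs B/AA and BA/A would both give "BAA").

```r
library(dplyr)

# Same toy data as used in this answer
dat <- tibble(
  var1 = c("A", "B", "A", "C"),
  var2 = c("B", "A", "C", "A"),
  cor  = c(0.6, 0.6, 0.4, 0.4)
)

# pair_key is a hypothetical helper column, added only for illustration:
# the smaller name always goes first, so A/B and B/A get the same key
keyed <- dat %>%
  mutate(pair_key = paste(pmin(var1, var2), pmax(var1, var2), sep = "|"))

keyed$pair_key
# rows 1-2 share the key "A|B", rows 3-4 share the key "A|C"

# keep the first row per key, then drop the helper column
dedup <- keyed %>%
  filter(!duplicated(pair_key)) %>%
  select(-pair_key)
```

Like the one-liner above, this keeps whichever ordering of a pair appears first in the data.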

Data -

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
dat <- tibble(
  var1 = LETTERS[c(1,2,1,3)],
  var2 = LETTERS[c(2,1,3,1)],
  cor = c(0.6,0.6,0.4,0.4))

Note: cleaned up the logic thanks to @tmfmnk
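
Since the question prefers the row with the lower variable first, a different technique may also be worth trying: comparing the two columns directly instead of de-duplicating. This is only a sketch, and it assumes every pair appears in both orders, as it does when the data come from a full correlation matrix. It also drops self-pairs such as A/A, and string ordering with < follows the current locale.

```r
library(dplyr)

# Same toy data as used in this answer
dat <- tibble(
  var1 = c("A", "B", "A", "C"),
  var2 = c("B", "A", "C", "A"),
  cor  = c(0.6, 0.6, 0.4, 0.4)
)

# Keep only the rows where var1 sorts before var2.
# Assumption: each pair is present in both orders, so exactly one of them survives.
lower_first <- dat %>%
  filter(var1 < var2)

lower_first
```

Unlike the duplicated() approach, the result here does not depend on which ordering happens to come first in the data.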

Shree answered Jan 26 '23 00:01