I have a correlation matrix that I put in a dataframe like so:
row | var1 | var2 | cor
1 | A | B | 0.6
2 | B | A | 0.6
3 | A | C | 0.4
4 | C | A | 0.4
These results are duplicated into 2 rows each, with both combinations of "var1" and "var2". I only need one, preferably with the lower variable first (e.g. rows 1 and 3).
I've been playing with dplyr for two hours and reading old threads, but not finding what I need.
# get correlation of every concept versus every concept
data.cor <- data.jobs %>%
  select(-y, -X) %>%
  as.matrix %>%
  cor %>%
  as.data.frame %>%
  rownames_to_column(var = 'var1') %>%
  gather(var2, value, -var1)
I would like output to look like so:
row | var1 | var2 | cor
1 | A | B | 0.6
3 | A | C | 0.4
I am trying to do this without resorting to a loop.
Here's one way with tidyverse:
library(dplyr)
# build an order-independent key per pair, then keep its first occurrence
dat2 <- dat %>%
  filter(!duplicated(paste0(pmax(var1, var2), pmin(var1, var2))))
# A tibble: 2 x 3
  var1  var2    cor
  <chr> <chr> <dbl>
1 A     B     0.600
2 A     C     0.400
Data:
# tibble() replaces the deprecated data_frame()
dat <- tibble(
  var1 = LETTERS[c(1, 2, 1, 3)],
  var2 = LETTERS[c(2, 1, 3, 1)],
  cor = c(0.6, 0.6, 0.4, 0.4))
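To see why the filter works, here is a minimal base R sketch of the key it builds. pmin() and pmax() compare the two columns element-wise, so (A, B) and (B, A) produce the same key, and duplicated() then flags every repeat. The "|" separator is my own addition, not part of the answer: it is there on the assumption that real variable names may be longer than one character, where pasting without a separator could make different pairs collide.

```r
# Same pairs as in the question, as plain character vectors.
var1 <- c("A", "B", "A", "C")
var2 <- c("B", "A", "C", "A")

# Sort each pair within its row, then join the two halves into one key.
# The separator is an assumption added for safety with longer names,
# e.g. so that "AB" + "C" and "A" + "BC" do not produce the same key.
key <- paste(pmin(var1, var2), pmax(var1, var2), sep = "|")
key

# duplicated() marks the second and later occurrences of each key,
# so negating it keeps only the first row of every pair.
keep <- !duplicated(key)
keep
```

Note that this keeps whichever row of a pair comes first in the data, which is not necessarily the one with the lower variable in var1.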
Note: cleaned up the logic thanks to @tmfmnk
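Since the question asks for the lower variable first, a plain comparison may be enough when every pair appears in both orders, as it does for a full correlation matrix. This is a sketch of a different technique from the one above, not a cleanup of it. The extra A/A row is hypothetical, added to stand in for the diagonal that cor() would produce.

```r
library(dplyr)

# Sample data from the question, plus one hypothetical self-correlation row.
dat <- tibble(
  var1 = c("A", "B", "A", "C", "A"),
  var2 = c("B", "A", "C", "A", "A"),
  cor  = c(0.6, 0.6, 0.4, 0.4, 1.0)
)

# Keep only rows where var1 sorts strictly before var2.
# This drops the mirrored rows and the diagonal in one step.
dat_unique <- dat %>%
  filter(var1 < var2)

dat_unique
```

Unlike the duplicated() approach, this would silently drop a pair that only appears as (B, A), so it relies on the assumption that the input is symmetric.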