Join vectors into dataframe by matching values

I'm trying to compare multiple vectors to see which values they share. I'd like to combine the vectors into a table with one column per vector, where each row shows the same value in every column whose vector contains it and NA in the columns that don't.

For example:

list1 <- c("a", "b", "c", "d")
list2 <- c("a", "c", "d")
list3 <- c("a", "b", "c", "e", "f")  

Should become:

a   a   a
b   NA  b
c   c   c
d   d   NA
NA  NA  e
NA  NA  f

I've tried converting the vectors to data frames and using merge, the dplyr joins, cbind, and cbind.fill, but each of these either returns a single column or doesn't line up matching values on the same row.

What's the best way to get this result with R?

Evan asked Aug 29 '17 19:08


2 Answers

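A base R solution: give each vector a shared key column, then full-outer-join the data frames on that key:

# Each data frame gets a key column ("col") plus a copy named after the vector
df1 = data.frame(col = list1, list1)
df2 = data.frame(col = list2, list2)
df3 = data.frame(col = list3, list3)

# merge() joins on the shared column "col"; all=TRUE keeps unmatched rows and fills them with NA
Reduce(function(x, y) merge(x, y, all=TRUE), list(df1, df2, df3))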

#   col list1 list2 list3
# 1   a     a     a     a
# 2   b     b  <NA>     b
# 3   c     c     c     c
# 4   d     d     d  <NA>
# 5   e  <NA>  <NA>     e
# 6   f  <NA>  <NA>     f

Dropping the key column with [,-1] gives the requested result:

> Reduce(function(x, y) merge(x, y, all=TRUE), list(df1, df2, df3))[,-1]
  list1 list2 list3
1     a     a     a
2     b  <NA>     b
3     c     c     c
4     d     d  <NA>
5  <NA>  <NA>     e
6  <NA>  <NA>     f

Or, with dplyr + purrr:

library(dplyr)
library(purrr)

list(list1, list2, list3) %>%
  map(~ data.frame(col = ., ., stringsAsFactors = FALSE)) %>%
  reduce(full_join, by = "col") %>%
  select(-col) %>%
  setNames(paste0("list", 1:3))

Data:

list1 <- c("a", "b", "c", "d")
list2 <- c("a", "c", "d")
list3 <- c("a", "b", "c", "e", "f") 
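
If you have more than three vectors, the base R approach above can be wrapped in a function. This is only a sketch: align_vectors and the "key" column name are made up for illustration, and it assumes the input is a named list of vectors without repeated values (with duplicates, merge would return every combination of the repeated keys).

```r
# Hypothetical helper, not part of the original answer:
# align any number of named vectors by value.
align_vectors <- function(vectors) {
  # One two-column data frame per vector: a shared key plus a copy
  # of the values in a column named after the vector.
  frames <- Map(
    function(values, name) {
      out <- data.frame(key = values, values, stringsAsFactors = FALSE)
      names(out) <- c("key", name)
      out
    },
    vectors, names(vectors)
  )
  # Full outer join on "key"; all = TRUE fills unmatched rows with NA.
  merged <- Reduce(function(x, y) merge(x, y, by = "key", all = TRUE), frames)
  # Drop the key column, keeping a data frame even with a single vector.
  merged[, -1, drop = FALSE]
}

aligned <- align_vectors(list(
  list1 = c("a", "b", "c", "d"),
  list2 = c("a", "c", "d"),
  list3 = c("a", "b", "c", "e", "f")
))
aligned
```

Because merge sorts on the key by default, the rows come out ordered by value rather than by first appearance.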
acylam answered Nov 02 '22 07:11


You can use unlist and unique to get all possible values, then look each one up in every vector with match. Where a value is absent, match returns NA, and indexing a vector with NA yields NA, which is what you want:

list1 <- c("a", "b", "c", "d")
list2 <- c("a", "c", "d")
list3 <- c("a", "b", "c", "e", "f")
list_of_lists <- list(
  list1 = list1,
  list2 = list2,
  list3 = list3
)

all_values <- unique(unlist(list_of_lists))

fleshed_out <- vapply(
  list_of_lists,
  FUN.VALUE = all_values,
  FUN       = function(x) {
    x[match(all_values, x)]
  }
)

fleshed_out
#      list1 list2 list3
# [1,] "a"   "a"   "a"
# [2,] "b"   NA    "b"
# [3,] "c"   "c"   "c"
# [4,] "d"   "d"   NA
# [5,] NA    NA    "e"
# [6,] NA    NA    "f"
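
Note that vapply returns a character matrix here. If you need a data frame, as the question asks, one possible variation (a sketch, not part of the original answer) builds each column with %in% and ifelse instead of match. The name aligned is only illustrative.

```r
list_of_lists <- list(
  list1 = c("a", "b", "c", "d"),
  list2 = c("a", "c", "d"),
  list3 = c("a", "b", "c", "e", "f")
)

all_values <- unique(unlist(list_of_lists))

# For each vector, keep a value where the vector contains it and NA elsewhere.
# NA_character_ keeps the column character even if nothing matches.
aligned <- as.data.frame(
  lapply(list_of_lists, function(x) {
    ifelse(all_values %in% x, all_values, NA_character_)
  }),
  stringsAsFactors = FALSE
)
aligned
```

Rows appear in order of first appearance, the order produced by unique(unlist(...)), whereas a merge-based approach sorts them by value.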
Nathan Werth answered Nov 02 '22 06:11