I would like to create a binary matrix from a column of comma-separated strings:
dt = data.table(id = c('id1','id2','id3','id4','id5','id6'), sample = c("MER-1,MER-3,MER-4","MER-5","MER-2","MER-2,MER-3,MER-4,MER-5","MER_3","MER-5" ))
dt
    id                  sample
1: id1       MER-1,MER-3,MER-4
2: id2                   MER-5
3: id3                   MER-2
4: id4 MER-2,MER-3,MER-4,MER-5
5: id5                   MER_3
6: id6                   MER-5
It should result in something like:
m_count = matrix(c(1,0,1,1,0, 0,0,0,0,1, 0,1,0,0,0, 0,1,1,1,1, 0,0,1,0,0, 0,0,0,0,1), nrow = 6, ncol = 5, byrow = TRUE,
                 dimnames = list(paste0("id", 1:6), paste0("MER-", 1:5)))  # byrow = TRUE: values are listed one id at a time
m_count
    MER-1 MER-2 MER-3 MER-4 MER-5
id1     1     0     1     1     0
id2     0     0     0     0     1
id3     0     1     0     0     0
id4     0     1     1     1     1
id5     0     0     1     0     0
id6     0     0     0     0     1
I could loop over each element of the list and fill the matrix, but given the size of my table that would be really slow. Is there a quicker or more elegant way to do this, maybe with dplyr/tidyverse? Thanks!
Using dt from the Note at the end, which fixes the typo in the question (MER_3 should be MER-3), use separate_rows to expand the data so that each sample gets its own row, and then use table to compute the counts.
library(data.table)
library(dplyr)
library(tidyr)
dt %>%
  separate_rows(sample, sep = ",") %>%
  table
giving:
     sample
id    MER-1 MER-2 MER-3 MER-4 MER-5
  id1     1     0     1     1     0
  id2     0     0     0     0     1
  id3     0     1     0     0     0
  id4     0     1     1     1     1
  id5     0     0     1     0     0
  id6     0     0     0     0     1
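If you would rather avoid the extra packages, the same split-then-tabulate idea can be sketched in base R. This is only an illustration, not part of the answer above: the names df, parts, counts and m are my own, and it assumes the labels are separated by a plain comma with no surrounding spaces. It also turns the counts into a strict 0/1 matrix in case a label is repeated within one row.

```r
# Same input as in the Note, typo fixed, as a plain data.frame (assumed name: df)
df <- data.frame(
  id = c("id1", "id2", "id3", "id4", "id5", "id6"),
  sample = c("MER-1,MER-3,MER-4", "MER-5", "MER-2",
             "MER-2,MER-3,MER-4,MER-5", "MER-3", "MER-5"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into its individual labels
parts <- strsplit(df$sample, ",", fixed = TRUE)

# Repeat each id once per label, then cross-tabulate id against label
counts <- table(id = rep(df$id, lengths(parts)), sample = unlist(parts))

# Drop the "table" class and reduce counts to 0/1 (integer matrix, dimnames kept)
m <- (unclass(counts) > 0) * 1L
m
```

Here m is an ordinary integer matrix with the ids as row names and the labels as column names, so it can be indexed like m["id4", "MER-3"].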
Note
The input, with the typo in the question fixed (MER_3 changed to MER-3):
library(data.table)
dt <- data.table(id = c('id1','id2','id3','id4','id5','id6'),
  sample = c("MER-1,MER-3,MER-4","MER-5","MER-2","MER-2,MER-3,MER-4,MER-5","MER-3","MER-5"))