I have a data frame with the following structure:
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))
Now I want to create a data frame from this that contains a named column for each unique value in the test data frame. A value is a token terminated by the ';' character; the space that precedes it is not part of the value. Then, for each row, I want to fill these dummy columns with either a 1 or a 0, as shown below:
data.frame(a = c(1,1), ff = c(1,0), cc = c(1,1), rr = c(1,1), e = c(0,1))
  a ff cc rr e
1 1  1  1  1 0
2 1  0  1  1 1
I tried building a data frame using for loops and the unique values in the column, but it's getting too messy. I already have a vector containing the unique values of the column. The problem is how to create the ones and zeros. I tried mutate_all() with grep(), but this did not work.
I'd use the splitstackshape package and mtabulate from the qdapTools package to get this as a one-liner, i.e.
library(splitstackshape)
library(qdapTools)
mtabulate(as.data.frame(t(cSplit(test, 'col', sep = ';', 'wide'))))
#   a cc ff rr e
#V1 1  1  1  1 0
#V2 1  1  0  1 1
It can also be done entirely with splitstackshape, as @A5C1D2H2I1M1N2O1R2T1 mentions in the comments:
cSplit_e(test, "col", ";", mode = "binary", type = "character", fill = 0)
Here's a possible data.table implementation. First we split the rows into columns, melt them into a single column, and then spread it wide while counting the events for each row:
library(data.table)
test2 <- setDT(test)[, tstrsplit(col, "; |;")]
dcast(melt(test2, measure = names(test2)), rowid(variable) ~ value, length)
#    variable a cc e ff rr
# 1:        1 1  1 0  1  1
# 2:        2 1  1 1  0  1
Here's a base R approach:
x <- strsplit(as.character(test$col), ";\\s?") # split the strings
lvl <- unique(unlist(x)) # get unique elements
x <- lapply(x, factor, levels = lvl) # convert to factor
t(sapply(x, table)) # count elements and transpose
#     a ff cc rr e
#[1,] 1  1  1  1 0
#[2,] 1  0  1  1 1
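If a data frame is needed rather than a matrix, and strict 0/1 presence flags are preferred over counts, the same idea can be sketched with %in%. This is only a minimal sketch: the names tokens and dummies are made up for illustration, and it assumes every value is separated by ';' optionally followed by whitespace.

```r
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))

# split each string on ';' (optionally followed by whitespace)
tokens <- strsplit(as.character(test$col), ";\\s*")

# unique values, in order of first appearance
lvl <- unique(unlist(tokens))

# one row per input string: 1 if the value occurs in that row, else 0
dummies <- t(sapply(tokens, function(tk) as.integer(lvl %in% tk)))
colnames(dummies) <- lvl
dummies <- as.data.frame(dummies)
dummies
```

Unlike table(), which would report 2 for a value that appears twice in the same row, %in% only records whether the value is present, so the result stays binary.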