I am trying to convert a column of data into many binary columns, to eventually use for association rule mining. I have had some success using a for loop and a simple triplet matrix, but I am unsure how to then aggregate by the levels in the first column, similar to a GROUP BY statement in SQL. I have provided an example below with a much smaller data set. If successful, my actual data set will be 4,200 rows by 3,902 columns, so any solution needs to be scalable. Any suggestions or alternative approaches would be greatly appreciated!
> data <- data.frame(a=c('sally','george','andy','sue','sue','sally','george'), b=c('green','yellow','green','yellow','purple','brown','purple'), stringsAsFactors=TRUE) # factors are needed for levels() and as.numeric() below
> data
a b
1 sally green
2 george yellow
3 andy green
4 sue yellow
5 sue purple
6 sally brown
7 george purple
library(slam)  # provides simple_triplet_matrix()
x <- data[,1]
for(i in 2:ncol(data))
  x <- cbind(x, simple_triplet_matrix(i = 1:nrow(data), j = as.numeric(data[,i]),
    v = rep(1, nrow(data)), dimnames = list(NULL, levels(data[,i]))))
##Looks like this:
> as.matrix(x)
name brown green purple yellow
[1,] "sally" "0" "1" "0" "0"
[2,] "george" "0" "0" "0" "1"
[3,] "andy" "0" "1" "0" "0"
[4,] "sue" "0" "0" "0" "1"
[5,] "sue" "0" "0" "1" "0"
[6,] "sally" "1" "0" "0" "0"
[7,] "george" "0" "0" "1" "0" ##Need to aggregate by name
##Would like it to look like this:
name brown green purple yellow
[1,] "sally" "1" "1" "0" "0"
[2,] "george" "0" "0" "1" "1"
[3,] "andy" "0" "1" "0" "0"
[4,] "sue" "0" "0" "1" "1"
This should do the trick:
## Get a contingency table of counts
X <- with(data, table(a,b))
## Massage it into the format you're wanting
cbind(name = rownames(X), apply(X, 2, as.character))
# name brown green purple yellow
# [1,] "andy" "0" "1" "0" "0"
# [2,] "george" "0" "0" "1" "1"
# [3,] "sally" "1" "1" "0" "0"
# [4,] "sue" "0" "0" "1" "1"
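One caveat: table() returns counts, so a name/colour pair that appears more than once would show up as 2 rather than 1. If you need strictly binary columns, here is a minimal sketch that clamps the counts to 0/1 and keeps the result as integers rather than character strings. It uses base R only and the same toy data as the question; the names B and res are just mine for illustration.

```r
# Same toy data as in the question
data <- data.frame(
  a = c("sally", "george", "andy", "sue", "sue", "sally", "george"),
  b = c("green", "yellow", "green", "yellow", "purple", "brown", "purple")
)

# Counts per (name, colour) pair
X <- with(data, table(a, b))

# Clamp counts to 0/1 so a repeated pair still gives a binary indicator
B <- unclass(X) > 0
storage.mode(B) <- "integer"

# One row per name, one integer column per level of b
res <- data.frame(name = rownames(B), B, row.names = NULL, check.names = FALSE)
res
```

At 4,200 by 3,902 the dense table has about 16.4 million cells, which base R should be able to hold. If memory does become a concern, xtabs(~ a + b, data, sparse = TRUE) (which needs the Matrix package) is one alternative worth trying.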