Suppose I have a data.frame with several columns of categorical data, and one column of quantitative data. Here's an example:
my_data <- structure(list(A = c("f", "f", "f", "f", "t", "t", "t", "t"),
                          B = c("t", "t", "t", "t", "f", "f", "f", "f"),
                          C = c("f", "f", "t", "t", "f", "f", "t", "t"),
                          D = c("f", "t", "f", "t", "f", "t", "f", "t")),
                     .Names = c("A", "B", "C", "D"),
                     row.names = 1:8, class = "data.frame")
my_data$quantity <- 1:8
Now my_data looks like this:
A B C D quantity
1 f t f f 1
2 f t f t 2
3 f t t f 3
4 f t t t 4
5 t f f f 5
6 t f f t 6
7 t f t f 7
8 t f t t 8
What's the most elegant way to get a cross tab / sum of quantity where both values == 't'? That is, I'm looking for an output like this:
A B C D
A "?" "?" "?" "?"
B "?" "?" "?" "?"
C "?" "?" "?" "?"
D "?" "?" "?" "?"
...where the intersection of x/y is the sum of quantity where x == 't' and y == 't'. (I only care about half this table, really, since half is duplicated.)
So for example the value of A/C should be:
good_rows <- with(my_data, A=='t' & C=='t')
sum(my_data$quantity[good_rows])
15
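Written as a formula (the notation here is introduced only for illustration), with $q_k$ the quantity in row $k$ and $[\cdot]$ equal to 1 when the condition holds and 0 otherwise, the requested cell for columns $x$ and $y$ is:

$$
S_{xy} = \sum_{k=1}^{n} q_k \, [x_k = \text{t}] \, [y_k = \text{t}]
$$

For A/C only rows 7 and 8 satisfy both conditions, so $S_{AC} = 7 + 8 = 15$. Stacking the indicators into an $n \times 4$ matrix $X$ gives $S = X^\top \operatorname{diag}(q)\, X$, which is why the table is symmetric.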
Edit: What I already had was:
nodes <- names(my_data)[-ncol(my_data)]  # every column except quantity
sapply(nodes, function(rw) {
  sapply(nodes, function(cl) {
    # rows where both columns are 't'
    good_rows <- which(my_data[, rw] == 't' & my_data[, cl] == 't')
    sum(my_data[good_rows, 'quantity'])
  })
})
Which gives the desired result:
A B C D
A 26 0 15 14
B 0 10 7 6
C 15 7 22 12
D 14 6 12 20
I like this solution because, being very 'literal', it's fairly readable: two apply funcs (aka loops) go through rows * columns, compute each cell, and produce the matrix. It's also plenty fast on my actual data (tiny: 192 rows x 10 columns). What I don't like is that it seems like a lot of lines. Thank you for the answers so far! I will review and absorb.
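Since the main complaint is line count, here is a sketch of a more compact base-R variant of the same literal approach. It assumes the question's data (rebuilt below with data.frame()), and `pair_sum` is a hypothetical helper name introduced here. outer() passes whole vectors of column names, so Vectorize() is used to apply the helper one pair at a time. The last step blanks the duplicated lower triangle, because only half the table is needed.

```r
# Rebuild the example data from the question
my_data <- data.frame(
  A = c("f", "f", "f", "f", "t", "t", "t", "t"),
  B = c("t", "t", "t", "t", "f", "f", "f", "f"),
  C = c("f", "f", "t", "t", "f", "f", "t", "t"),
  D = c("f", "t", "f", "t", "f", "t", "f", "t"),
  quantity = 1:8,
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum of quantity over rows where columns x and y are both "t"
pair_sum <- function(x, y) {
  sum(my_data$quantity[my_data[[x]] == "t" & my_data[[y]] == "t"])
}

nodes <- setdiff(names(my_data), "quantity")

# outer() calls FUN on vectors of names; Vectorize() makes pair_sum elementwise
res <- outer(nodes, nodes, Vectorize(pair_sum))
dimnames(res) <- list(nodes, nodes)
res

# Optional: keep only the upper triangle, since the table is symmetric
upper <- res
upper[lower.tri(upper)] <- NA
upper
```

The trade-off is that Vectorize() still loops under the hood, so this shortens the code rather than speeding it up.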
Try using matrix multiplication
temp <- (my_data[1:4]=="t")*my_data$quantity
t(temp) %*% (my_data[1:4]=="t")
# A B C D
#A 26 0 15 14
#B 0 10 7 6
#C 15 7 22 12
#D 14 6 12 20
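(This is not a fluke: entry [i, j] of the product is the sum, over all rows, of quantity times (column i == "t") times (column j == "t"), which is exactly the requested cross-tab sum.)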
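The same product can be written with base R's crossprod(), which computes t(x) %*% y in one call. A minimal sketch, assuming the question's data (rebuilt here); `X` is a hypothetical name for the 0/1 indicator matrix with one column per category:

```r
my_data <- data.frame(
  A = c("f", "f", "f", "f", "t", "t", "t", "t"),
  B = c("t", "t", "t", "t", "f", "f", "f", "f"),
  C = c("f", "f", "t", "t", "f", "f", "t", "t"),
  D = c("f", "t", "f", "t", "f", "t", "f", "t"),
  quantity = 1:8,
  stringsAsFactors = FALSE
)
nodes <- setdiff(names(my_data), "quantity")

# 0/1 indicator matrix: 1 where the value is "t", 0 otherwise
X <- (as.matrix(my_data[nodes]) == "t") * 1

# Multiplying by quantity scales row k by quantity[k] (recycled down each column);
# res[i, j] = sum over rows of quantity * X[, i] * X[, j]
res <- crossprod(X * my_data$quantity, X)
res
```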
For each row of the output (A, B, C, D), you could build a data frame dat containing just the rows where that column equals "t". Then multiply the true/false values in this subset by each row's quantity (so each entry is 0 when false and the quantity value when true), and finally take the column sums.
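sapply(c("A", "B", "C", "D"), function(x) {
  # keep only the rows where column x is "t"
  dat <- my_data[my_data[,x] == "t",]
  # TRUE/FALSE times quantity gives quantity or 0; then sum each column
  colSums((dat[,-5] == "t") * dat[,5])
})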
# A B C D
# A 26 0 15 14
# B 0 10 7 6
# C 15 7 22 12
# D 14 6 12 20
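The answer above hardcodes the column position 5 and the names A to D. A sketch of the same idea that looks columns up by name instead, assuming (as in the question) that the numeric column is literally named quantity and every other column holds "t"/"f" values:

```r
my_data <- data.frame(
  A = c("f", "f", "f", "f", "t", "t", "t", "t"),
  B = c("t", "t", "t", "t", "f", "f", "f", "f"),
  C = c("f", "f", "t", "t", "f", "f", "t", "t"),
  D = c("f", "t", "f", "t", "f", "t", "f", "t"),
  quantity = 1:8,
  stringsAsFactors = FALSE
)
nodes <- setdiff(names(my_data), "quantity")  # every categorical column

res <- sapply(nodes, function(x) {
  # rows where column x is "t"
  dat <- my_data[my_data[[x]] == "t", ]
  # logical matrix times quantity: quantity where "t", 0 where "f"; sum per column
  colSums((dat[nodes] == "t") * dat$quantity)
})
res
```

With this form, extra categorical columns (such as the 10 in the real data) are picked up automatically.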