Compute matrix of sums

Suppose I have a data.frame with several columns of categorical data, and one column of quantitative data. Here's an example:

my_data <- structure(list(A = c("f", "f", "f", "f", "t", "t", "t", "t"), 
                          B = c("t", "t", "t", "t", "f", "f", "f", "f"), 
                          C = c("f","f", "t", "t", "f", "f", "t", "t"), 
                          D = c("f", "t", "f", "t", "f", "t", "f", "t")),
                     .Names = c("A", "B", "C", "D"), 
                     row.names = 1:8, class = "data.frame")
my_data$quantity <- 1:8

Now my_data looks like this:

  A B C D quantity
1 f t f f        1
2 f t f t        2
3 f t t f        3
4 f t t t        4
5 t f f f        5
6 t f f t        6
7 t f t f        7
8 t f t t        8

What's the most elegant way to get a cross tab / sum of quantity where both values are 't'? That is, I'm looking for an output like this:

   A   B   C   D  
A "?" "?" "?" "?"
B "?" "?" "?" "?"
C "?" "?" "?" "?"
D "?" "?" "?" "?"

...where the cell at row x, column y is the sum of quantity over the rows where x == 't' and y == 't'. (I really only care about half of this table, since it is symmetric and the other half is a duplicate.)

So for example the value of A/C should be:

good_rows <- with(my_data, A=='t' & C=='t')
sum(my_data$quantity[good_rows])

15

Edit: What I already had was:

nodes <- names(my_data)[-ncol(my_data)]
sapply(nodes, function(rw) {
  sapply(nodes, function(cl) {
    good_rows <- which(my_data[, rw]=='t' & my_data[, cl]=='t')
    sum(my_data[good_rows, 'quantity'])
  })
})

Which gives the desired result:

   A  B  C  D
A 26  0 15 14
B  0 10  7  6
C 15  7 22 12
D 14  6 12 20

I like this solution because, being very 'literal', it's fairly readable: two nested sapply calls (effectively loops) go through rows and columns, compute each cell, and produce the matrix. It's also plenty fast on my actual data (tiny: 192 rows x 10 columns). What I didn't like is that it seems like a lot of lines. Thank you for the answers so far! I will review and absorb.

asked Sep 30 '14 23:09

arvi1000

2 Answers

Try using matrix multiplication:

temp <- (my_data[1:4]=="t")*my_data$quantity

t(temp) %*% (my_data[1:4]=="t") 

#   A  B  C  D
#A 26  0 15 14
#B  0 10  7  6
#C 15  7 22 12
#D 14  6 12 20

(Although this might be a fluke)
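
The result is not a coincidence. If X is the 0/1 indicator matrix for "value is 't'" and q is the quantity column, then entry (i, j) of t(X * q) %*% X is the sum over rows k of q[k] * X[k, i] * X[k, j], which is exactly the sum of quantity over the rows where both columns are 't'. Here is a minimal sketch that checks this against a cell-by-cell computation. It rebuilds the question's my_data so that it runs on its own; the names X, q, by_matmul and by_loop are only illustrative.

```r
# Rebuild the example data from the question
my_data <- data.frame(
  A = c("f", "f", "f", "f", "t", "t", "t", "t"),
  B = c("t", "t", "t", "t", "f", "f", "f", "f"),
  C = c("f", "f", "t", "t", "f", "f", "t", "t"),
  D = c("f", "t", "f", "t", "f", "t", "f", "t"),
  stringsAsFactors = FALSE
)
my_data$quantity <- 1:8

X <- (my_data[1:4] == "t") + 0   # numeric 0/1 indicator matrix, 8 x 4
q <- my_data$quantity

# X * q recycles q down each column, so row k is scaled by q[k];
# crossprod(a, b) computes t(a) %*% b
by_matmul <- crossprod(X * q, X)

# Cell-by-cell version that spells out the sum for each (row, column) pair
nodes <- colnames(X)
by_loop <- sapply(nodes, function(rw) {
  sapply(nodes, function(cl) sum(q * X[, rw] * X[, cl]))
})

# Compare the two results, ignoring dimnames and other attributes
all.equal(by_matmul, by_loop, check.attributes = FALSE)
```

For the A/C cell, by_matmul["A", "C"] gives 15, matching the value worked out by hand in the question.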

answered Sep 22 '22 03:09

user20650

For each column name, you could build a data frame dat containing just the rows where that column equals "t". Then multiply the TRUE/FALSE values in this subset by each row's quantity (so a cell is 0 when false and the quantity when true), and finally take the column sums.

sapply(c("A", "B", "C", "D"), function(x) {
  dat <- my_data[my_data[,x] == "t",]
  colSums((dat[,-5] == "t") * dat[,5])
})
#    A  B  C  D
# A 26  0 15 14
# B  0 10  7  6
# C 15  7 22 12
# D 14  6 12 20
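
The positions 5 and -5 above are tied to this particular column layout. Below is a sketch of the same idea with the columns looked up by name. It assumes, as in the question, that quantity is the only non-categorical column; the helper name sum_where_both is hypothetical. Since the question only needs half of the symmetric table, the last lines blank out the lower triangle.

```r
# Rebuild the example data from the question
my_data <- data.frame(
  A = c("f", "f", "f", "f", "t", "t", "t", "t"),
  B = c("t", "t", "t", "t", "f", "f", "f", "f"),
  C = c("f", "f", "t", "t", "f", "f", "t", "t"),
  D = c("f", "t", "f", "t", "f", "t", "f", "t"),
  stringsAsFactors = FALSE
)
my_data$quantity <- 1:8

# Hypothetical helper: for every pair of flag columns, sum `value`
# over the rows where both flags equal `level`
sum_where_both <- function(df, value = "quantity", level = "t") {
  flags <- setdiff(names(df), value)   # every column except the value column
  sapply(flags, function(x) {
    dat <- df[df[[x]] == level, , drop = FALSE]   # rows where column x matches
    colSums((dat[flags] == level) * dat[[value]])
  })
}

res <- sum_where_both(my_data)
res[lower.tri(res)] <- NA   # keep only the diagonal and the upper triangle
res
```

The diagonal and upper triangle of res hold the same values as the table above (for example, res["A", "C"] is 15), while the duplicated lower half is NA.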

answered Sep 25 '22 03:09

josliber


Tags: r, data.table, reshape2
