Creating co-occurrence matrix

Tags: matrix

I'm trying to build a co-occurrence matrix. I have a data file of transactions and items, and I want a matrix giving the number of transactions in which each pair of items appears together.

I'm new to R and I'm having some fun discovering the shortcuts R offers instead of writing explicit loops (I used C years ago and now mostly stick to Excel macros and SPSS). I have checked the solutions here but haven't found one that works. The closest is the solution given in "Co-occurrence matrix using SAC?", but it produced an error message when I used projecting_tm; I suspect the cbind wasn't successful in my case.

Essentially I have a table containing the following:

    TrxID Items Quant
    Trx1  A     3
    Trx1  B     1
    Trx1  C     1
    Trx2  E     3
    Trx2  B     1
    Trx3  B     1
    Trx3  C     4
    Trx4  D     1
    Trx4  E     1
    Trx4  A     1
    Trx5  F     5
    Trx5  B     3
    Trx5  C     2
    Trx5  D     1
    etc.

I want to create something like:

      A B C D E F
    A 0 1 1 1 1 0
    B 1 0 3 1 1 1
    C 1 3 0 1 0 1
    D 1 1 1 0 1 1
    E 1 1 0 1 0 0
    F 0 1 1 1 0 0

What I did was (and you'd probably laugh at my rookie R approach):

    library(igraph)
    library(tnet)

    trx <- read.table("FileName.txt", header=TRUE)

    transID <- t(trx[1])
    items <- t(trx[2])

    id_item <- cbind(items, transID)
    item_item <- projecting_tm(id_item, method="sum")
    item_item <- tnet_igraph(item_item, type="weighted one-mode tnet")
    item_matrix <- get.adjacency(item_item, attr="weight")
    item_matrix

As mentioned above the cbind was probably unsuccessful, so the projecting_tm couldn't give me any result.

Any alternative approach or a correction to my method?

Your help would be much appreciated!


asked Nov 08 '12 01:11

jacatra

2 Answers

Using "dat" as read in by the other answer, try crossprod and table:

    V <- crossprod(table(dat[1:2]))
    diag(V) <- 0
    V
    #      Items
    # Items A B C D E F
    #     A 0 1 1 1 1 0
    #     B 1 0 3 1 1 1
    #     C 1 3 0 1 0 1
    #     D 1 1 1 0 1 1
    #     E 1 1 0 1 0 0
    #     F 0 1 1 1 0 0
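One caveat that may matter on real data: if an item can be listed more than once within the same transaction, the table holds counts above 1 and the cross product overcounts. Below is a minimal sketch of a guard, assuming the sample data from the question. The extra Trx1/A row and the names `dat_dup`, `inc` and `co` are made up for illustration only.

```r
# Sample data from the question
dat <- read.table(text = "TrxID Items Quant
Trx1 A 3
Trx1 B 1
Trx1 C 1
Trx2 E 3
Trx2 B 1
Trx3 B 1
Trx3 C 4
Trx4 D 1
Trx4 E 1
Trx4 A 1
Trx5 F 5
Trx5 B 3
Trx5 C 2
Trx5 D 1", header = TRUE)

# Hypothetical duplicate line: item A listed a second time in Trx1
dat_dup <- rbind(dat, data.frame(TrxID = "Trx1", Items = "A", Quant = 1))

# Transactions-by-items table, recoded to 0/1 so repeats count once
inc <- (table(dat_dup[1:2]) > 0) * 1

# t(inc) %*% inc: entry [i, j] is the number of transactions with both items
co <- crossprod(inc)
diag(co) <- 0
co
```

Without the recoding step, the A/B entry for `dat_dup` would come out as 2 even though A and B share only one transaction.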


answered Oct 08 '22 11:10

A5C1D2H2I1M1N2O1R2T1

I'd use a combination of the reshape2 package and matrix algebra:

    # read in your data
    dat <- read.table(text="TrxID Items Quant
    Trx1 A 3
    Trx1 B 1
    Trx1 C 1
    Trx2 E 3
    Trx2 B 1
    Trx3 B 1
    Trx3 C 4
    Trx4 D 1
    Trx4 E 1
    Trx4 A 1
    Trx5 F 5
    Trx5 B 3
    Trx5 C 2
    Trx5 D 1", header=TRUE)

    # making the boolean matrix
    library(reshape2)
    dat2 <- melt(dat, id.vars=c("TrxID", "Items"))       # long format, Quant becomes "value"
    w <- dcast(dat2, Items~TrxID, value.var="value")     # one row per item, one column per transaction
    x <- as.matrix(w[,-1])
    x[is.na(x)] <- 0
    x <- apply(x, 2, function(x) as.numeric(x > 0))      # recode as 0/1
    v <- x %*% t(x)                                      # the magic matrix
    diag(v) <- 0                                         # replace diagonal
    dimnames(v) <- list(w[, 1], w[, 1])                  # name the dimensions
    v
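Why the matrix product works: with x as a 0/1 items-by-transactions matrix, entry [i, j] of x %*% t(x) is the sum over transactions of x[i, k] * x[j, k], and each term is 1 only when both items are in transaction k. Here is a small check of that, with the 0/1 matrix typed in by hand from the sample data (the names `inc01` and `by_hand` are just for this illustration):

```r
# Items-by-transactions 0/1 matrix for the sample data, entered by hand
inc01 <- rbind(
  A = c(1, 0, 0, 1, 0),
  B = c(1, 1, 1, 0, 1),
  C = c(1, 0, 1, 0, 1),
  D = c(0, 0, 0, 1, 1),
  E = c(0, 1, 0, 1, 0),
  F = c(0, 0, 0, 0, 1)
)
colnames(inc01) <- paste0("Trx", 1:5)

# B and C by hand: count transactions where both rows are 1
by_hand <- sum(inc01["B", ] * inc01["C", ])

# The same number read off the full product
full <- inc01 %*% t(inc01)
full["B", "C"]

# tcrossprod(x) is base R's shorthand for x %*% t(x)
all(tcrossprod(inc01) == full)
```

The diagonal of the product holds the number of transactions each item appears in, which is why it is zeroed out afterwards.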

For the graphing maybe...

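    library(igraph)

    # build an undirected, weighted graph from the co-occurrence matrix
    # (newer igraph versions call this graph_from_adjacency_matrix())
    g <- graph.adjacency(v, weighted=TRUE, mode='undirected')
    g <- simplify(g)
    # set labels and degrees of vertices
    V(g)$label <- V(g)$name
    V(g)$degree <- degree(g)
    plot(g)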

answered Oct 08 '22 12:10

Tyler Rinker
