Creating co-occurrence matrix

Tags: r, matrix

I'm trying to build a co-occurrence matrix. I have a data file of transactions and items, and I want a matrix giving the number of transactions in which each pair of items appears together.

I'm new to R programming and am having some fun discovering the shortcuts R offers in place of explicit loops (I used C years ago and now stick to Excel macros and SPSS). I have checked the solutions here but haven't found one that works. The closest is the solution given in "Co-occurrence matrix using SAC?", but it produced an error message when I used projecting_tm; I suspect that the cbind wasn't successful in my case.

Essentially I have a table containing the following:

    TrxID Items Quant
    Trx1  A     3
    Trx1  B     1
    Trx1  C     1
    Trx2  E     3
    Trx2  B     1
    Trx3  B     1
    Trx3  C     4
    Trx4  D     1
    Trx4  E     1
    Trx4  A     1
    Trx5  F     5
    Trx5  B     3
    Trx5  C     2
    Trx5  D     1

etc.

I want to create something like:

       A B C D E F
    A  0 1 1 0 1 1
    B  1 0 3 1 1 0
    C  1 3 0 1 0 0
    D  1 1 1 0 1 1
    E  1 1 0 1 0 0
    F  0 1 1 1 0 0

What I did was (and you'd probably laugh at my rookie R approach):

    library(igraph)
    library(tnet)

    trx <- read.table("FileName.txt", header=TRUE)

    transID <- t(trx[1])
    items <- t(trx[2])

    id_item <- cbind(items,transID)
    item_item <- projecting_tm(id_item, method="sum")
    item_item <- tnet_igraph(item_item,type="weighted one-mode tnet")
    item_matrix <-get.adjacency(item_item,attr="weight")
    item_matrix

As mentioned above the cbind was probably unsuccessful, so the projecting_tm couldn't give me any result.

Any alternative approach or a correction to my method?

Your help would be much appreciated!

asked Nov 08 '12 01:11 by jacatra


2 Answers

Using "dat" as defined in the other answer (the question's table read in with read.table), try crossprod and table:

    V <- crossprod(table(dat[1:2]))
    diag(V) <- 0
    V
    #      Items
    # Items A B C D E F
    #     A 0 1 1 1 1 0
    #     B 1 0 3 1 1 1
    #     C 1 3 0 1 0 1
    #     D 1 1 1 0 1 1
    #     E 1 1 0 1 0 0
    #     F 0 1 1 1 0 0
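To see why this works, here is a minimal base R sketch that spells out the steps. The data is inlined from the question so it runs on its own. The names `inc` and `item_counts`, and the guard against repeated items, are illustrative assumptions and not part of the original answer.

```r
# Transaction data from the question, inlined so the sketch is self-contained
dat <- read.table(text = "TrxID Items Quant
Trx1 A 3
Trx1 B 1
Trx1 C 1
Trx2 E 3
Trx2 B 1
Trx3 B 1
Trx3 C 4
Trx4 D 1
Trx4 E 1
Trx4 A 1
Trx5 F 5
Trx5 B 3
Trx5 C 2
Trx5 D 1", header = TRUE)

# Transaction-by-item incidence table: rows are TrxID, columns are Items
inc <- table(dat[1:2])

# Assumed guard: if an item were listed twice in one transaction the cell
# would exceed 1, so clamp to 0/1 to count each transaction once per pair
inc[inc > 1] <- 1L

# crossprod(inc) is t(inc) %*% inc: entry [i, j] adds up, over transactions,
# the product of the 0/1 indicators for items i and j
V <- crossprod(inc)
stopifnot(all(V == t(inc) %*% inc))

# Before it is zeroed, the diagonal holds the number of transactions per item
item_counts <- diag(V)

diag(V) <- 0
V
```

The diagonal is zeroed only because an item trivially co-occurs with itself. If the per-item transaction counts are useful, keep a copy of the diagonal first, as `item_counts` does here.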
answered Oct 08 '22 11:10 by A5C1D2H2I1M1N2O1R2T1


I'd use a combination of the reshape2 package and matrix algebra:

    # read in your data
    dat <- read.table(text="TrxID Items Quant
    Trx1 A 3
    Trx1 B 1
    Trx1 C 1
    Trx2 E 3
    Trx2 B 1
    Trx3 B 1
    Trx3 C 4
    Trx4 D 1
    Trx4 E 1
    Trx4 A 1
    Trx5 F 5
    Trx5 B 3
    Trx5 C 2
    Trx5 D 1", header=TRUE)

    # make the boolean matrix
    library(reshape2)
    dat2 <- melt(dat)
    w <- dcast(dat2, Items~TrxID)
    x <- as.matrix(w[,-1])
    x[is.na(x)] <- 0
    x <- apply(x, 2, function(x) as.numeric(x > 0))  # recode as 0/1
    v <- x %*% t(x)                                  # the magic matrix
    diag(v) <- 0                                     # replace the diagonal
    dimnames(v) <- list(w[, 1], w[,1])               # name the dimensions
    v
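The 0/1 recoding step matters because the cast matrix holds quantities, not presence flags. The sketch below shows the difference in base R only. It uses `tapply` and `tcrossprod` in place of reshape2, which is a substitution for illustration and not what the answer above does. The names `q`, `b`, and `by_quantity` are made up for this example.

```r
# Transaction data from the question, inlined so the sketch is self-contained
dat <- read.table(text = "TrxID Items Quant
Trx1 A 3
Trx1 B 1
Trx1 C 1
Trx2 E 3
Trx2 B 1
Trx3 B 1
Trx3 C 4
Trx4 D 1
Trx4 E 1
Trx4 A 1
Trx5 F 5
Trx5 B 3
Trx5 C 2
Trx5 D 1", header = TRUE)

# Item-by-transaction matrix of quantities; absent combinations come back NA
q <- tapply(dat$Quant, list(Items = dat$Items, TrxID = dat$TrxID), sum)
q[is.na(q)] <- 0

# Recode to 0/1 presence indicators (dimnames are kept)
b <- (q > 0) * 1

# tcrossprod(b) is b %*% t(b): number of transactions shared by each item pair
v <- tcrossprod(b)
diag(v) <- 0

# Without the recoding, each cell sums products of quantities instead
by_quantity <- tcrossprod(q)

v["B", "C"]            # transactions containing both B and C
by_quantity["B", "C"]  # 1*1 + 1*4 + 3*2, a quantity-weighted figure
```

Whether the quantity-weighted version is ever wanted depends on the analysis. For the count of shared transactions asked about in the question, the 0/1 matrix is the one to use.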

For the graphing maybe...

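    library(igraph)

    # graph.adjacency() is the older igraph name; newer releases call it
    # graph_from_adjacency_matrix()
    g <- graph.adjacency(v, weighted=TRUE, mode ='undirected')
    g <- simplify(g)
    # set labels and degrees of vertices
    V(g)$label <- V(g)$name
    V(g)$degree <- degree(g)
    plot(g)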
answered Oct 08 '22 12:10 by Tyler Rinker