I'm trying to build a co-occurrence matrix. I have a data file of transactions and items, and I want to see a matrix of the number of transactions in which each pair of items appears together.
I'm a newbie in R programming and I'm having some fun finding out all the shortcuts that R has, rather than writing explicit loops (I used C years ago and only stick to Excel macros and SPSS now). I have checked the solutions here, but haven't found one that works. The closest is the solution given in "Co-occurrence matrix using SAC?", but it produced an error message when I used projecting_tm; I suspect that the cbind wasn't successful in my case.
Essentially I have a table containing the following:
    TrxID Items Quant
    Trx1  A     3
    Trx1  B     1
    Trx1  C     1
    Trx2  E     3
    Trx2  B     1
    Trx3  B     1
    Trx3  C     4
    Trx4  D     1
    Trx4  E     1
    Trx4  A     1
    Trx5  F     5
    Trx5  B     3
    Trx5  C     2
    Trx5  D     1
    etc.
I want to create something like:
      A B C D E F
    A 0 1 1 0 1 1
    B 1 0 3 1 1 0
    C 1 3 0 1 0 0
    D 1 1 1 0 1 1
    E 1 1 0 1 0 0
    F 0 1 1 1 0 0
What I did was (and you'd probably laugh at my rookie R approach):
    library(igraph)
    library(tnet)
    trx <- read.table("FileName.txt", header=TRUE)
    transID <- t(trx[1])
    items <- t(trx[2])
    id_item <- cbind(items,transID)
    item_item <- projecting_tm(id_item, method="sum")
    item_item <- tnet_igraph(item_item,type="weighted one-mode tnet")
    item_matrix <-get.adjacency(item_item,attr="weight")
    item_matrix
As mentioned above the cbind was probably unsuccessful, so the projecting_tm couldn't give me any result.
Any alternative approach or a correction to my method?
Your help would be much appreciated!
Co-occurrence analysis is simply the counting of paired data within a collection unit.
Using "dat" as read in by the other answer, try crossprod and table:
    V <- crossprod(table(dat[1:2]))
    diag(V) <- 0
    V
    #      Items
    # Items A B C D E F
    #     A 0 1 1 1 1 0
    #     B 1 0 3 1 1 1
    #     C 1 3 0 1 0 1
    #     D 1 1 1 0 1 1
    #     E 1 1 0 1 0 0
    #     F 0 1 1 1 0 0
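To see why this works, it may help to look at the intermediate step. table(dat[1:2]) builds a transaction-by-item count matrix M, and crossprod(M) is t(M) %*% M, so each off-diagonal cell sums, over all transactions, the product of two items' counts. Below is a minimal sketch of that reasoning. The data frame is rebuilt so the snippet runs on its own, and the binarizing line is my own addition as a guard, in case an item is listed more than once within a transaction:

```r
# Rebuild the example data so this snippet is self-contained
dat <- data.frame(
  TrxID = rep(paste0("Trx", 1:5), times = c(3, 2, 2, 3, 4)),
  Items = c("A", "B", "C", "E", "B", "B", "C", "D", "E", "A", "F", "B", "C", "D")
)

M <- table(dat[1:2])   # rows = transactions, columns = items
M[M > 0] <- 1          # guard: count a repeated item only once per transaction
V <- crossprod(M)      # same as t(M) %*% M, giving an item-by-item matrix
diag(V) <- 0           # the diagonal holds each item's own transaction count
V["B", "C"]            # number of transactions containing both B and C
```

With the sample data the guard changes nothing, since no item is repeated within a transaction.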
I'd use a combination of the reshape2 package and matrix algebra:
    # read in your data
    dat <- read.table(text="TrxID Items Quant
    Trx1 A 3
    Trx1 B 1
    Trx1 C 1
    Trx2 E 3
    Trx2 B 1
    Trx3 B 1
    Trx3 C 4
    Trx4 D 1
    Trx4 E 1
    Trx4 A 1
    Trx5 F 5
    Trx5 B 3
    Trx5 C 2
    Trx5 D 1", header=TRUE)

    # make the boolean item-by-transaction matrix
    library(reshape2)
    dat2 <- melt(dat, id.vars = c("TrxID", "Items"))
    w <- dcast(dat2, Items ~ TrxID, value.var = "value")
    x <- as.matrix(w[, -1])
    x[is.na(x)] <- 0
    x <- apply(x, 2, function(x) as.numeric(x > 0))  # recode as 0/1
    v <- x %*% t(x)                                  # the magic matrix
    diag(v) <- 0                                     # replace diagonal
    dimnames(v) <- list(w[, 1], w[, 1])              # name the dimensions
    v
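If the real file has many transactions, the dense matrix above might get large. One possible variation, which is my own suggestion rather than part of the approach above, is to build the incidence matrix sparsely with xtabs(..., sparse = TRUE). This sketch assumes the Matrix package (normally shipped with R as a recommended package) is installed, and that the item-by-item result is small enough to convert back to an ordinary matrix:

```r
library(Matrix)  # assumption: the Matrix package is available

# Rebuild the example data so this snippet is self-contained
dat <- data.frame(
  TrxID = rep(paste0("Trx", 1:5), times = c(3, 2, 2, 3, 4)),
  Items = c("A", "B", "C", "E", "B", "B", "C", "D", "E", "A", "F", "B", "C", "D"),
  stringsAsFactors = TRUE
)

# sparse transaction-by-item counts
M <- xtabs(~ TrxID + Items, data = dat, sparse = TRUE)
M <- (M > 0) * 1                 # recode as 0/1, still sparse
v <- as.matrix(crossprod(M))     # item-by-item counts as a regular matrix
diag(v) <- 0                     # drop each item's own transaction count
v
```

Only the transaction-by-item step stays sparse here; the final matrix is dense because it has just one row and column per distinct item.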
For the graphing maybe...
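    library(igraph)
    # graph.adjacency() is the older name; newer igraph releases call it graph_from_adjacency_matrix()
    g <- graph.adjacency(v, weighted=TRUE, mode='undirected')
    g <- simplify(g)
    # set labels and degrees of vertices
    V(g)$label <- V(g)$name
    V(g)$degree <- degree(g)
    plot(g)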