
How to convert a data.frame to transactions for arules

I read data from a CSV file. The data has three columns: a transaction ID, a product, and a product category. I need to convert this into transactions in order to use the apriori function in arules, but the conversion fails with an error:

dat <- read.csv("spss.csv",head=TRUE,sep="," , as.is = T)
dat[,2] <- factor(dat[,2])
dat[,3] <- factor(dat[,3])
spssdat <- dat[,c(1,2,3)]
str(spssdat)

'data.frame':   108919 obs. of  3 variables:
 $ Transaction_id: int  3000312 3000312 3001972 3003361 3003361 3003361 3003361 3003361 3003361 3004637 ...
 $ product_catalog : Factor w/ 9 levels "AIM","BA","IM",..: 1 1 5 7 7 7 7 7 7 1 ...
 $ product      : Factor w/ 332 levels "ACM","ACTG/AIM",..: 7 7 159 61 61 61 61 61 61 7 ...

trans4 <- as(spssdat, "transactions")

Error in as(spssdat, "transactions") :
  no method or default for coercing “data.frame” to “transactions”

If the data has only two columns, this works:

trans4 <- as(split(spssdat[,2], spssdat[,1]), "transactions") 

But I don't know how to do the conversion when I have three columns. There are usually additional columns, such as category attributes or customer attributes, so the data typically has more than two columns. I need to find rules across multiple columns.

dennis ding asked Jun 26 '13 06:06




2 Answers

I found some information on this website that worked for me. Here is the relevant paragraph:

The dataframe can be in either a normalized (single) form or a flat file (basket) form.
When the file is in basket form it means that each record represents a transaction where the items in the basket are represented by columns.
When the dataset is in single form it means that each record represents one single item and each item contains a transaction id.
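
To make the two layouts concrete, here is a minimal sketch with made-up data shaped like the question's three columns. The values are invented, and the "product=" / "catalog=" prefixes are just one possible labelling I chose for illustration, not something arules requires. It first turns the single-form product column into basket form with split, then stacks both columns as labelled items so that product and catalog end up in the same transactions:

```r
library(arules)

# Made-up single-form data: one row per purchased item (values are invented).
dat <- data.frame(
  Transaction_id  = c(1, 1, 2, 3, 3),
  product_catalog = c("AIM", "BA", "AIM", "IM", "IM"),
  product         = c("ACM", "ACTG", "ACM", "BIO", "CHEM"),
  stringsAsFactors = FALSE
)

# Basket form of the product column: one character vector per transaction.
baskets <- split(dat$product, dat$Transaction_id)

# Stack both columns as labelled items so they share a single item column.
# unique() drops repeated items within a transaction, which the list
# coercion would otherwise reject.
items <- unique(rbind(
  data.frame(id = dat$Transaction_id,
             item = paste0("product=", dat$product),
             stringsAsFactors = FALSE),
  data.frame(id = dat$Transaction_id,
             item = paste0("catalog=", dat$product_catalog),
             stringsAsFactors = FALSE)
))

# Coerce the per-transaction item lists into a transactions object.
trans <- as(split(items$item, items$id), "transactions")
inspect(trans)
```

The prefixes keep a product and a catalog with the same name from collapsing into one item, and they make it easy to see which column each side of a rule came from.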

To load transactions from a file, use read.transactions. In both your case and mine, the file is in single form.
I used the following code to load a .csv file as transactions:

trans = read.transactions("some_data.csv", format = "single", sep = ",", cols = c("transactionID", "productID")) 

To fully understand the command above, take a look at the read.transactions manual, available by typing ?read.transactions in the R console.
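
If you want to try this without your own file, here is a small self-contained sketch. The file contents and column names are made up for the example. I also pass header = TRUE, which recent arules versions document for files whose first line holds the column names; whether you need it may depend on your arules version, so treat that as an assumption.

```r
library(arules)

# Write a tiny made-up single-form CSV with a header row to a temp file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "transactionID,productID,category",
  "1,ACM,AIM",
  "1,ACTG,BA",
  "2,ACM,AIM",
  "3,BIO,IM"
), csv_path)

# Read only the two named columns; the category column is not used here.
trans <- read.transactions(
  csv_path,
  format = "single",
  header = TRUE,
  sep    = ",",
  cols   = c("transactionID", "productID")
)
inspect(trans)
```

Note that read.transactions takes exactly two columns in single format (transaction ID and item), so a third column such as the category has to be folded into the items some other way before it can appear in rules.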

Michał Rybak answered Sep 19 '22 15:09


I was attempting to do the same thing, and even after converting every column of my data.frame to a factor, I still could not coerce it into an itemMatrix of transactions. Then I realized I had never re-loaded the "arules" package in the session I was working in. A very stupid mistake, but I wanted to mention it in case anyone else runs into the same problem. Try the simple stuff first:

library("arules") 
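
A quick way to check this on your side is a sketch like the one below, using a made-up data.frame with factor columns only. Keep in mind that this coercion treats each row as one transaction, which may not be the grouping you want if several rows share a transaction ID.

```r
# Stop early if arules is missing, then attach it so that as() can find
# a coercion to "transactions".
if (!requireNamespace("arules", quietly = TRUE)) {
  stop("arules is not installed; try install.packages('arules')")
}
library(arules)

# Made-up data with factor columns only (values are invented).
df <- data.frame(
  product  = factor(c("ACM", "ACTG", "ACM")),
  category = factor(c("AIM", "BA", "AIM"))
)

# Each ROW becomes one transaction, with items labelled like "product=ACM".
trans <- as(df, "transactions")
inspect(trans)
```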
Charlie answered Sep 18 '22 15:09