I read data from a CSV file. The data has three columns: one is the transaction ID, and the other two are the product and the product category. I need to convert this into transactions in order to use the apriori
function in arules. I get an error when I convert it to transactions:
dat <- read.csv("spss.csv",head=TRUE,sep="," , as.is = T)
dat[,2] <- factor(dat[,2])
dat[,3] <- factor(dat[,3])
spssdat <- dat[,c(1,2,3)]
str(spssdat)
'data.frame': 108919 obs. of 3 variables:
 $ Transaction_id: int 3000312 3000312 3001972 3003361 3003361 3003361 3003361 3003361 3003361 3004637 ...
 $ product_catalog : Factor w/ 9 levels "AIM","BA","IM",..: 1 1 5 7 7 7 7 7 7 1 ...
 $ product : Factor w/ 332 levels "ACM","ACTG/AIM",..: 7 7 159 61 61 61 61 61 61 7 ...
trans4 <- as(spssdat, "transactions")
Error in as(spssdat, "transactions") :
  no method or default for coercing “data.frame” to “transactions”
If the data has only two columns, this works:
trans4 <- as(split(spssdat[,2], spssdat[,1]), "transactions")
But I don't know how to do the conversion when I have three columns. There are usually additional columns, such as category attributes or customer attributes, so the data usually has more than two columns. I need to find rules across multiple columns.
I found some information on this website that worked for me. Let me copy the relevant paragraphs:
The dataframe can be in either a normalized (single) form or a flat file (basket) form.
When the file is in basket form it means that each record represents a transaction where the items in the basket are represented by columns.
When the dataset is in single form it means that each record represents one single item and each item contains a transaction id.
To load transactions from a file, use read.transactions. In both your case and mine, the file is in single form.
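To make the two layouts concrete, here is a minimal sketch that writes one tiny file in each form and reads both with read.transactions. The item names and temporary files are made up for illustration, and the header argument assumes a reasonably recent arules version.

```r
library(arules)

# Basket form: one line per transaction, items separated by commas.
basket_file <- tempfile(fileext = ".csv")
writeLines(c("bread,milk", "bread,butter,milk", "beer"), basket_file)
basket <- read.transactions(basket_file, format = "basket", sep = ",")

# Single form: one line per (transaction id, item) pair, with a header row.
single_file <- tempfile(fileext = ".csv")
writeLines(c("transactionID,productID",
             "1,bread", "1,milk",
             "2,bread", "2,butter", "2,milk",
             "3,beer"), single_file)
single <- read.transactions(single_file, format = "single", sep = ",",
                            header = TRUE,
                            cols = c("transactionID", "productID"))

# Both objects should describe the same three transactions.
inspect(basket)
inspect(single)
```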
I used the following code to load a .csv file as transactions:
trans = read.transactions("some_data.csv", format = "single", sep = ",", cols = c("transactionID", "productID"))
To fully understand the above command, take a look at the read.transactions manual, available by typing ?read.transactions in the R console.
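Note that read.transactions in single form takes only one item column. For the three-column case in the question, one possible approach (a sketch, not the only way) is to treat the values of both columns as items, prefix each with its column name, and coerce the resulting list. The toy data frame below stands in for spssdat and its values are made up; the "catalog=" and "product=" prefixes are just an illustrative naming choice.

```r
library(arules)

# Toy stand-in for the asker's data frame (values are made up).
spssdat <- data.frame(
  Transaction_id  = c(1, 1, 2, 3, 3),
  product_catalog = c("AIM", "BA", "AIM", "IM", "IM"),
  product         = c("ACM", "P2", "ACM", "P3", "P4")
)

# Prefix each value with its column so labels from different columns cannot collide.
items <- c(paste0("catalog=", spssdat$product_catalog),
           paste0("product=", spssdat$product))
ids <- rep(spssdat$Transaction_id, times = 2)

# One character vector per transaction; unique() drops repeated items,
# which the list coercion may otherwise complain about.
item_list <- lapply(split(items, ids), unique)
trans <- as(item_list, "transactions")

inspect(trans)
```

The resulting object can be passed to apriori as usual, and rules can then relate categories to products within the same transaction.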
I was attempting to do the same thing, and even after I converted all the columns of the data.frame I was working with to factors, I still could not coerce it into an itemMatrix of transactions. Then I realized I had never re-loaded the "arules" package in the session I was working in. A very stupid mistake, but I wanted to mention it in case anyone else runs into the same problem. Try the simple stuff first:
library("arules")