 

Big Data convert to "transactions" from arules package

The arules package in R uses the class 'transactions', so in order to use the function apriori() I need to convert my existing data. I have a matrix with 2 columns and roughly 1.6 million rows, and I tried to convert the data like this:

transaction_data <- as(split(original_data[,"id"], original_data[,"type"]), "transactions")

where original_data is my data matrix. Because of the amount of data, I used the largest Amazon AWS machine with 64 GB of RAM. After a while I get:

resulting vector exceeds vector length limit in 'AnswerType'

The memory usage of the machine was still 'only' at 60%. Is this an R-based limitation? Is there any way to work around this other than sampling? When using only 1/4 of the data, the transformation worked fine.

Edit: As pointed out in the answer, one of the variables was a factor instead of a character. After converting it, the transformation was processed quickly and correctly.
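For anyone hitting the same error, the fix looks roughly like this (a sketch, assuming original_data is a data.frame with the "id" and "type" columns from the question):

```r
library(arules)

# Factors are stored internally as integer codes, which can blow up the
# indexing inside split()/coercion. Convert the offending columns to
# character first (column names assumed from the question).
original_data[, "id"]   <- as.character(original_data[, "id"])
original_data[, "type"] <- as.character(original_data[, "type"])

# Now the coercion to 'transactions' proceeds as expected
transaction_data <- as(split(original_data[, "id"], original_data[, "type"]),
                       "transactions")
```

str(original_data) is a quick way to spot which columns are factors before converting.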

asked Aug 30 '11 by Marco
1 Answer

I suspect that your problem arises because one of the functions uses integers (rather than, say, floats) to index values. In any case, the size isn't too big, so this is surprising. Maybe the data has some other issue, such as character columns stored as factors?

In general, though, I'd really recommend using memory mapped files, via bigmemory, which you can also split and process via bigsplit or mwhich. If offloading the data works for you, then you can also use a much smaller instance size and save $$. :)
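The bigmemory route might look like the sketch below (untested at this scale; the file name, column layout, and types are assumptions, and bigsplit() actually lives in the companion bigtabulate package):

```r
library(bigmemory)
library(bigtabulate)  # provides bigsplit()

# Memory-map the data from disk so R holds a pointer to a file-backed
# matrix rather than the full object in RAM (file layout assumed).
x <- read.big.matrix("original_data.csv", header = TRUE,
                     type = "integer",
                     backingfile = "original_data.bin",
                     descriptorfile = "original_data.desc")

# Split row indices by the "type" column without copying the data;
# splitcol = NA_real_ returns the row indices for each group.
groups <- bigsplit(x, "type", splitcol = NA_real_)
```

Because the matrix is file-backed, a later session (or a smaller, cheaper instance) can reattach it via attach.big.matrix() on the descriptor file instead of re-reading the CSV.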

answered Sep 18 '22 by Iterator