
Mahout: reading a custom input file

I was playing with Mahout and found that FileDataModel accepts data in the format

     userId,itemId,pref (long,long,double)

My data is in the format

     String,long,double

What is the best/easiest way to work with this dataset in Mahout?

learner asked Aug 26 '11 19:08


1 Answer

One way to do this is to extend FileDataModel. You'll need to override the readUserIDFromString(String value) method so that it uses some kind of resolver to convert the string ID to a long. You can use one of the implementations of IDMigrator, as Sean suggests.

For example, assuming you have an initialized MemoryIDMigrator, you could do this:

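@Override
protected long readUserIDFromString(String stringID) {
    // Convert the string ID to a long
    long result = memoryIDMigrator.toLongID(stringID);
    // Remember the pair so the original string can be looked up later
    memoryIDMigrator.storeMapping(result, stringID);
    return result;
}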

This way you can also use memoryIDMigrator for the reverse mapping, from the long ID back to the original string. If you don't need that, you can skip storing the mapping and just hash the string the way their implementation does (it's in AbstractIDMigrator).
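To make the hashing idea concrete, here is a minimal standalone sketch that needs no Mahout classes. It assumes the approach is an MD5 digest of the string with the first 8 bytes packed into a long, which is my understanding of what AbstractIDMigrator does; check the source of your Mahout version before relying on identical values. The class name StringIdMapper is hypothetical, and its method names only mirror the migrator methods used above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an in-memory ID migrator; not a Mahout class.
class StringIdMapper {
    // Reverse lookup table: long ID -> original string ID.
    private final Map<Long, String> longToString = new HashMap<>();

    // Hash the string with MD5 and pack the first 8 digest bytes into a long.
    // (Assumed to match Mahout's scheme; verify against your version.)
    long toLongID(String stringID) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Remember which string produced a given long ID.
    void storeMapping(long longID, String stringID) {
        longToString.put(longID, stringID);
    }

    // Returns the original string, or null if no mapping was stored.
    String toStringID(long longID) {
        return longToString.get(longID);
    }

    public static void main(String[] args) {
        StringIdMapper mapper = new StringIdMapper();
        long id = mapper.toLongID("user-abc");
        mapper.storeMapping(id, "user-abc");
        System.out.println(id + " -> " + mapper.toStringID(id));
    }
}
```

The hash is deterministic, so the same string always yields the same long across runs. The stored map is only needed if you want to turn recommendations back into the original string IDs.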

Eyal answered Sep 18 '22 20:09