
How to use Mahout in a Windows environment?

I am trying to use Mahout in an application running on Windows. I want to build clusters from a Lucene index using k-means.

As soon as I have to create sequence files (creating vectors from a Lucene index), I get a Hadoop exception, because Hadoop makes command-line calls to programs that do not exist in a Windows environment (e.g. chmod). Running in Cygwin is not an option, since I want to be able to run the application from Eclipse.

So my questions are:

  • Is there a way to avoid having to create sequence files to retrieve my vectors from a Lucene index?
  • Or is there a way to create sequence files in a Windows environment?
    user249210 asked Apr 29 '10 08:04


    2 Answers

    The only way you can run Hadoop on a Windows environment is to install Cygwin. For more info, see this blog post:

    http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/

    Cygwin will provide all the command-line utilities (like chmod) that Hadoop relies on. You can still run your Hadoop jobs from within Eclipse if you want.

    bajafresh4life answered Sep 21 '22 19:09


    Do you know the SequenceFile API? Have a look here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html

    You can try writing and reading the data yourself.
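To give an idea of what "writing and reading the data yourself" could look like, here is a minimal sketch in plain Java. Note that this is a stand-in and not the Hadoop SequenceFile format: it uses only `java.io`, so it never touches Hadoop's file system layer (and therefore never calls chmod), but Mahout's own drivers will not be able to read the resulting file. The class name `VectorRecordIo` and the record layout (key, length, values) are assumptions made up for this illustration, and it assumes the vectors have already been extracted from the index as `double[]` arrays keyed by document id.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hadoop or Mahout.
final class VectorRecordIo {

    // Writes each (document id, vector) pair as: UTF key, int length, then the doubles.
    static void write(Path file, Map<String, double[]> vectors) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (Map.Entry<String, double[]> entry : vectors.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().length);
                for (double value : entry.getValue()) {
                    out.writeDouble(value);
                }
            }
        }
    }

    // Reads records back in the order they were written, until end of file.
    static Map<String, double[]> read(Path file) throws IOException {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                String key;
                try {
                    key = in.readUTF();
                } catch (EOFException endOfFile) {
                    break; // no more records
                }
                double[] vector = new double[in.readInt()];
                for (int i = 0; i < vector.length; i++) {
                    vector[i] = in.readDouble();
                }
                vectors.put(key, vector);
            }
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vectors", ".bin");
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("doc1", new double[] {1.0, 0.0, 2.5});
        vectors.put("doc2", new double[] {0.0, 3.0, 0.0});
        write(file, vectors);
        System.out.println("Read back " + read(file).size() + " vectors from " + file);
    }
}
```

Something like this only helps if you also do the clustering outside of Mahout's Hadoop-based jobs. If you need real sequence files, the SequenceFile API linked above is the place to look.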

    I think you can run Mahout from Eclipse on Windows in stand-alone mode, but you will run into several shortcomings and obstacles. You will have to try it and see how far you get.

    In my opinion you shouldn't insist on running Mahout from Eclipse. ;-)

    Peter Wippermann answered Sep 23 '22 19:09