 

NLTK MEGAM Max Ent algorithms on Windows

I have been playing with NLTK in Python but am unable to use the MEGAM Max Ent algorithm, because there is no Windows 64-bit executable of any MEGAM version at or above 0.3 (NLTK needs the -nobias option, which was introduced in v0.3).

http://www.cs.utah.edu/~hal/megam/

The author recommends compiling your own executable, although getting O'Caml to work on Win64 is just another nightmare.

Does anyone out there have a Windows-compiled MEGAM executable that is version 0.4 or above? I would be eternally grateful!

ToOsIK asked Oct 07 '22 12:10



1 Answer

I was able to get the MegaM library working with Python NLTK on Windows 7 after a bit of effort, and the solution is fairly straightforward (in hindsight). My methodology is described below in detail, with links included. I hope you find it useful.

High level:

  1. Install OCaml Compiler (Special version: OCamlPro)
  2. Download the Source Code for MegaM
  3. Download and install the GnuWin32 Make utility
  4. Edit the MegaM Makefile in 2 places
  5. Run GnuWin32 Make to generate the megam.exe file
  6. Programmatically indicate the location of the megam.exe file to Python NLTK
  7. Run the nltk.MaxentClassifier.train command

Links:

  1. MEGAM site: http://www.cs.utah.edu/~hal/megam/
  2. Windows OCamlPro Download
  3. GnuWin32 Make for Windows

The Gory Details

There are some peculiarities of this process that can easily go south given the lack of documentation - I'd like to call attention to a few I found...

Windows OCamlPro

It's very important to get the OCamlPro build for Windows, which is self-contained and has no dependencies on anything else. The version I have listed is just that; it installs into a single directory of your choice. Be sure to add the path to its bin directory to the Windows system path.

MEGAM

Windows is a challenge for this library because of some SNAFUs on the developer's side, so you are forced to download the source and compile it yourself. This isn't as difficult as it first appears. Unpacking is straightforward: the download is a .tar.gz file, so you unarchive it twice (once for the gzip layer, once for the tar) to get to the source directory. The two most important goals are to (a) edit the Makefile properly and (b) add the directory that contains the resulting megam.exe file to the Windows system path.
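As a rough illustration of the unpacking step, the archive can also be extracted in a single pass with Python's standard library instead of unarchiving twice by hand. This is only a sketch: the function name `extract_source` is made up for this example, and the archive and destination paths you pass in are whatever you actually downloaded and chose.

```python
import tarfile
from pathlib import Path


def extract_source(archive_path, dest_dir):
    """Unpack a gzip-compressed tarball into dest_dir and return any Makefiles found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" decompresses and untars in one step, so no second pass is needed.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    # Report where the Makefile ended up, since that is the file to edit next.
    return sorted(dest.rglob("Makefile"))
```

Calling `extract_source` with the downloaded archive and a target directory returns the path(s) of the extracted Makefile, which tells you which directory to run the build from.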

GnuWin32 Make

This is a straightforward install; just make sure to add the directory containing the GnuWin32 make executable to your Windows system path afterwards.
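Because every install step here depends on the system path being set correctly, a quick check before building can save some head-scratching. The sketch below assumes the executables are named `ocamlc`, `ocamldep` and `make` (the names that appear in the build output); `missing_tools` is a helper invented for this example.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools from tool_names that cannot be found on the current PATH."""
    # shutil.which follows the same lookup the shell uses, including .exe on Windows.
    return [name for name in tool_names if shutil.which(name) is None]


# Assumed executable names: ocamlc and ocamldep (OCaml), make (GNU Make).
REQUIRED = ["ocamlc", "ocamldep", "make"]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All build tools found.")
```

Run it from a freshly opened CMD shell, since shells that were already open will not see changes made to the system path.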

MEGAM Makefile

In the directory where you unarchived the MegaM files there is a Makefile. Two lines in it must be edited correctly to get a proper build.

First: (replace the -lstr flag with -lcamlstr; the first line below is the original, the second is the edited version)

  • WITHSTR =str.cma -cclib -lstr
  • WITHSTR =str.cma -cclib -lcamlstr

Second: (replace the path in the first line below with the equivalent path on your system, as in the second line)

NOTE: The path must point to the "\lib\caml" directory of your OCamlPro installation.

  • WITHCLIBS =-I /usr/lib/ocaml/3.09.2/caml
  • WITHCLIBS =-I E:\OCamlPro\OCPWin64\lib\caml
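The two edits above can also be applied with a small script, which helps if you need to repeat the build. This is a minimal sketch, assuming the Makefile contains uncommented `WITHSTR` and `WITHCLIBS` assignments like the ones shown; `patch_makefile` is a hypothetical helper, not part of MEGAM.

```python
def patch_makefile(text, caml_dir):
    """Return Makefile text with the two edits described above applied.

    Only uncommented WITHSTR / WITHCLIBS assignments are rewritten.
    """
    patched = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("WITHSTR"):
            # Link against the str library under the name -lcamlstr.
            line = line.replace("-lstr", "-lcamlstr")
        elif stripped.startswith("WITHCLIBS"):
            # Point the C include path at the local OCaml lib\caml directory.
            line = "WITHCLIBS =-I " + caml_dir
        patched.append(line)
    return "\n".join(patched) + "\n"
```

Read the Makefile, pass its text and your own `lib\caml` path (as a raw string, for example `r"E:\OCamlPro\OCPWin64\lib\caml"`), and write the result back. Running it a second time leaves the file unchanged.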

Run make in the megam dir

At this stage, you should be able to open a Windows CMD shell, cd into the directory where you modified the Makefile, and run make to compile and generate the executable file megam.exe.

You should see output similar to:

 make
 ocamldep *.mli *.ml > .depend
 ocamlc -g -custom -o megam str.cma -cclib -lcamlstr bigarray.cma -cclib -lbigarray unix.cma -cclib -lunix -I E:\OCamlPro\OCPWin64\lib\caml fastdot_c.c fastdot.cmo intHashtbl.cmo arry.cmo util.cmo data.cmo bitvec.cmo cg.cmo wsemlm.cmo bfgs.cmo pa.cmo perceptron.cmo radapt.cmo kernelmap.cmo abffs.cmo main.cmo

Programmatically Indicate the Location of the megam.exe File to Python's NLTK

The last gotcha I ran into was how to tell Python's NLTK precisely where my megam.exe file lives. In the calling code, I placed the configuration statement just before the line where I called the MaxentClassifier itself, and that worked just fine; see below.

Note: Training took a LONG time on my development workstation, so be patient.

 import nltk
 # Raw string so the backslashes in the Windows path are taken literally.
 nltk.config_megam(r'E:\megam\megam.exe')
 self.classifier = nltk.MaxentClassifier.train(train_set, algorithm='megam', trace=0)
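For readers unsure what `train_set` should look like: NLTK classifiers take a list of (feature dict, label) pairs. The sketch below builds one from made-up toy sentences and wraps the two calls above in a function. The data, the `word_features` extractor and the `train_with_megam` wrapper are all illustrative assumptions, not part of my original code.

```python
def word_features(sentence):
    """Map a sentence to a bag-of-words feature dict, the shape NLTK classifiers expect."""
    return {"contains({})".format(word.lower()): True for word in sentence.split()}


# Toy, made-up examples purely for illustration.
labeled_sentences = [
    ("great fun", "pos"),
    ("really great", "pos"),
    ("dull plot", "neg"),
    ("really dull", "neg"),
]

# train_set is a list of (feature_dict, label) pairs.
train_set = [(word_features(text), label) for text, label in labeled_sentences]


def train_with_megam(megam_path, training_pairs):
    """Configure NLTK to use the compiled binary, then train; needs nltk and megam.exe."""
    import nltk  # imported lazily so the data-preparation part runs without NLTK

    nltk.config_megam(megam_path)
    return nltk.MaxentClassifier.train(training_pairs, algorithm="megam", trace=0)
```

With the binary built, `train_with_megam(r"E:\megam\megam.exe", train_set)` should return a classifier whose `classify` method accepts a feature dict such as `word_features("great plot")`.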
ProfVersaggi answered Oct 10 '22 03:10
