
How to use the link grammar parser as a grammar checker

AbiWord uses the Link Grammar parser as a simple grammar checker. I'd like to duplicate this feature with Python.

Poorly documented Python bindings exist, but I don't know how to use them to mimic the grammar checker in Abiword.

(I'm not interested in the actual parsing results. I only need to know whether a sentence parses OK with the Link Grammar parser and, if not, which words can't be linked.)

What would be the best method to achieve this?

Nemo XXX asked Jul 23 '16 19:07



1 Answer

I can't help you mimic AbiWord's grammar checking with the Python bindings, but I can at least help you build them and explore what they can do.

Building with MS Visual Studio (32-bit architecture)

I'd normally say that "the best method to achieve this" is to build the Link Grammar library and its Python bindings on a Linux machine, following the extensive instructions in the project's readme file. However, judging by your comment above, Linux may not be an option, and it seems you would rather stick with Visual Studio than use, e.g., Cygwin.

Dependencies

Regex

As stated in the readme, the Link Grammar library depends on a POSIX-compliant regex library. On Linux, one is built in; on Windows, you get to (or rather have to) choose an implementation yourself. Luckily, version 2.7 of the port provided by GnuWin played nicely with the Visual Studio solution/project files shipped with Link Grammar 5.3.11 (found under %LINK_GRAMMAR%\msvc14).

You have to ensure that the Visual Studio build macro GNUREGEX_DIR points to the directory you unpacked the regex library to (e.g. D:\Program Files (x86)\GnuWin32). Note that these build macros are not the same as Windows environment variables: even though I had set an environment variable called GNUREGEX_DIR under Windows 10, Visual Studio did not use it until I changed the macro definition in the Link Grammar project files. Specifically, in %LINK_GRAMMAR%\msvc14\Local.props, change the line:

<GNUREGEX_DIR>$(HOMEDRIVE)$(HOMEPATH)\Libraries\gnuregex</GNUREGEX_DIR>

to

<GNUREGEX_DIR>$(GNUREGEX_DIR)</GNUREGEX_DIR>

SWIG

To create the Python bindings, you need SWIG on your system. For the build defined by the Visual Studio project Python2.vcxproj to find the SWIG executable, add its directory to the Windows path, e.g. D:\Program Files (x86)\swigwin-3.0.10.

Just as with the regex library, you need to configure the VS project so that it can locate your Python directory, e.g. change <PYTHON2>C:\Python27</PYTHON2> in Local.props to <PYTHON2>$(PYTHON2)</PYTHON2> if you have a corresponding environment variable set.
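Before opening the solution, it can help to confirm that the three prerequisites above are visible from a fresh shell. The following is only a minimal sketch of such a check, not part of Link Grammar: the function name preflight is made up for illustration, it assumes you chose to drive both GNUREGEX_DIR and PYTHON2 from environment variables as described, and it is written for Python 3 (shutil.which does not exist in Python 2.7) as a standalone helper.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: it only checks what the answer above describes,
    it does not inspect Local.props itself.
    """
    env = os.environ if env is None else env
    problems = []
    # Local.props was edited to read these two names from the environment.
    for name in ("GNUREGEX_DIR", "PYTHON2"):
        value = env.get(name)
        if not value:
            problems.append(name + " is not set")
        elif not os.path.isdir(value):
            problems.append(name + " is not a directory: " + value)
    # The Python2 project has to find the SWIG executable on the path.
    if which("swig", path=env.get("PATH", "")) is None:
        problems.append("swig was not found on PATH")
    return problems
```

Calling preflight() with no arguments inspects the current environment; printing the returned list before starting Visual Studio shows which of the settings still needs attention.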

Building

Once all the above libraries can be found by Visual Studio, the build process is pretty painless: Just build the project Python2, and if you have the VS solution file open (LinkGrammar.sln), it should automatically build the projects LinkGrammar and LinkGrammarExe, which it depends on.

Resolving shared libraries

After building the executable, you still need to ensure that the regex shared library (DLL) can be found: the directory containing the required library (in this case, regex2.dll) must be on your path. It is probably easiest to add the directory to your global path, e.g. %GNUREGEX_DIR%\bin when using the GnuWin library mentioned above with the environment variable GNUREGEX_DIR pointing to it.
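If the executable still complains about a missing DLL, a quick way to see whether regex2.dll is actually reachable is to walk the path yourself. This is a small sketch under the assumption that the DLL is resolved through the PATH variable as described above; find_on_path is a hypothetical helper name, not something shipped with Link Grammar or GnuWin.

```python
import os


def find_on_path(filename, path_value=None, sep=os.pathsep):
    """Return the first full path to `filename` found in a PATH-style string.

    Returns None when no listed directory contains the file.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(sep):
        candidate = os.path.join(directory, filename)
        # Skip empty entries, which can appear after a trailing separator.
        if directory and os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_on_path("regex2.dll") returns None as long as %GNUREGEX_DIR%\bin has not been added to the path of the shell you are running in.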

Running with Python

Once you have tested that the Windows executable runs and the Python bindings have been built, you can import the bindings into a Python script. To check that they are imported correctly and that the required DLLs can be located, the Link Grammar readme mentions running the script make-check.py to load and run your script using Link Grammar:

make-check [PYTHON_FLAG] PYTHON_OUTDIR [script.py] [ARGUMENTS]

where PYTHON_OUTDIR is the directory to which your Python bindings were written, e.g. Win32\Debug\Python2. Unfortunately, although this file is mentioned in the readme for version 5.3.11, it is not actually present in the "stable" 5.3.11 distributable, even though a version of it exists in the GitHub master repository. You can, however, simply get that one file from the Git repository and use it in the msvc14 directory of your 5.3.11 distributable. As stated above, this script requires regex2.dll to be on the Windows path: if the directory hasn't been added to the global path, you will have to add it to the path seen by the Python executable when running the script.
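If you would rather not touch the global path, one option is to launch make-check.py from a small wrapper that extends the path for the child process only. The sketch below is an illustration under several assumptions: both function names are invented here, the script is started through the same interpreter that runs the wrapper (which must be the Python the bindings were built for), and the optional PYTHON_FLAG argument is left out.

```python
import os
import subprocess
import sys


def make_check_command(make_check, outdir, script=None, args=()):
    """Build the argument list: make-check PYTHON_OUTDIR [script.py] [ARGUMENTS]."""
    cmd = [sys.executable, make_check, outdir]
    if script is not None:
        cmd.append(script)
        cmd.extend(args)
    return cmd


def run_make_check(make_check, outdir, regex_bin, script=None, args=()):
    """Run make-check.py with regex_bin prepended to the child's PATH only."""
    env = dict(os.environ)  # copy, so the parent environment stays unchanged
    env["PATH"] = regex_bin + os.pathsep + env.get("PATH", "")
    cmd = make_check_command(make_check, outdir, script, args)
    return subprocess.call(cmd, env=env)
```

A call such as run_make_check("make-check.py", r"Win32\Debug\Python2", r"D:\Program Files (x86)\GnuWin32\bin", "myscript.py") would then return the exit code of the child process; myscript.py is a placeholder for your own script.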

C API vs. Python API

I haven't used the Link Grammar parser much myself and so can't help you with the Python bindings directly, but you can still get an idea of how to use them by looking at the C code for the project LinkGrammarExe. You can start with the main function in link-parser\link-parser.c:

sent = sentence_create(input_string, dict);

...

num_linkages = sentence_parse(sent, opts);

The simple CLI program built by the VS project just checks num_linkages and, if the value is 0, displays "No complete linkages found", which a user can interpret as meaning that the sentence is ungrammatical. This behavior can of course be tweaked to accept lower-scoring parses, find the word(s) which don't fit, etc., so you can explore the functionality using the C API first. Later, if you really want to use the Python bindings, the Python functions are named similarly to their C counterparts; see the file clinkgrammar.py:

def sentence_parse(sent, opts):
    return _clinkgrammar.sentence_parse(sent, opts)
sentence_parse = _clinkgrammar.sentence_parse
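Putting the pieces together, a grammar check along the lines the question asks for might look roughly like the sketch below. Treat it as an untested outline rather than the library's documented usage: it assumes that clinkgrammar exposes the C API functions under the same names (dictionary_create_lang, parse_options_create, sentence_create, sentence_parse, sentence_length, the null-count setters, the linkage_* accessors and the *_delete functions), and that words which could not be linked come back from linkage_get_word wrapped in square brackets, as they are shown in link-parser's output. check_sentence itself is a name made up for this example, and the bindings module is passed in as lg so that the function does not depend on how you import it.

```python
def check_sentence(lg, text, lang="en"):
    """Return (parses_cleanly, unlinked_words) for one sentence.

    `lg` is the imported low-level bindings module (assumed: clinkgrammar).
    """
    dictionary = lg.dictionary_create_lang(lang)
    opts = lg.parse_options_create()
    sent = lg.sentence_create(text, dictionary)
    try:
        # First pass: every word has to be linked.
        if lg.sentence_parse(sent, opts) > 0:
            return True, []
        # Second pass: allow the parser to leave words unlinked ("null" words).
        lg.parse_options_set_min_null_count(opts, 1)
        lg.parse_options_set_max_null_count(opts, lg.sentence_length(sent))
        if lg.sentence_parse(sent, opts) <= 0:
            return False, []
        linkage = lg.linkage_create(0, sent, opts)
        count = lg.linkage_get_num_words(linkage)
        words = [lg.linkage_get_word(linkage, i) for i in range(count)]
        lg.linkage_delete(linkage)
        # Assumption: unlinked words are reported as "[word]".
        unlinked = [w[1:-1] for w in words
                    if w.startswith("[") and w.endswith("]")]
        return False, unlinked
    finally:
        lg.sentence_delete(sent)
        lg.parse_options_delete(opts)
        lg.dictionary_delete(dictionary)
```

With the bindings importable, usage would be along the lines of import clinkgrammar followed by check_sentence(clinkgrammar, "This are a test"). Creating the dictionary on every call keeps the sketch short; a real checker would create it once and reuse it for all sentences.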

errantlinguist answered Oct 20 '22 19:10
