 

NLTK was unable to find the gs file

Tags: python, nlp, nltk

I'm trying to use NLTK, the Natural Language Toolkit. After installing the required files, I started running the demo code from http://www.nltk.org/index.html:

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]
>>> entities = nltk.chunk.ne_chunk(tagged)
>>> entities

Then I get this message:

LookupError: 

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.

I searched Google, but nothing I found explains what the missing gs file is.

Jie Hu asked Apr 29 '16 15:04


5 Answers

I came across this error too.

gs stands for ghostscript. You get the error because your chunker is trying to use ghostscript to draw a parse tree of the sentence, something like this:

[image: parse tree drawn for the sentence]

I was using IPython. To debug the issue I set the traceback mode to verbose with the command %xmode verbose, which prints the local variables of each stack frame (see the traceback below). The file names NLTK searches for are:

file_names=['gs', 'gswin32c.exe', 'gswin64c.exe']

A quick Google search for gswin32c.exe told me it was Ghostscript.

/Users/jasonwirth/anaconda/lib/python3.4/site-packages/nltk/__init__.py in find_file_iter(filename='gs', env_vars=['PATH'], searchpath=(), file_names=['gs', 'gswin32c.exe', 'gswin64c.exe'], url=None, verbose=False)
    517                         (filename, url))
    518         div = '='*75
--> 519         raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
    520 
    521 def find_file(filename, env_vars=(), searchpath=(),

LookupError: 

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.
===========================================================================
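
To check whether any of those executables is actually reachable, a minimal sketch along these lines may help. It uses only the standard library's shutil.which and the three file names from the traceback; the helper name find_ghostscript is made up for illustration and is not part of NLTK.

```python
import shutil

# Executable names copied from the file_names list in the traceback.
GS_NAMES = ["gs", "gswin32c.exe", "gswin64c.exe"]


def find_ghostscript(names=GS_NAMES):
    """Return the full path of the first listed binary found on PATH, or None.

    Hypothetical helper for diagnosis only; NLTK does its own lookup.
    """
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_ghostscript() or "Ghostscript not found on PATH")
```

If this prints a path, the lookup should succeed once the same PATH is visible to the Python process running NLTK; if not, Ghostscript is either not installed or not on PATH.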
Jason Wirth answered Nov 17 '22 09:11


Just to add to the previous answers: if you replace 'entities' with 'print(entities)', you won't get the error.

Without print(), the console/notebook tries to "draw" the tree object as an image, and that drawing step is what needs Ghostscript.
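
A toy sketch of the difference, assuming the tree object exposes a rich-display hook such as _repr_png_ (the method name IPython and Jupyter look for when showing a bare expression). ToyTree is a hypothetical stand-in, not NLTK's class, and the raised error only imitates the one from the question.

```python
class ToyTree:
    """Hypothetical stand-in for a tree object; not NLTK's Tree class."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # print() only needs this plain-text form.
        return self.text

    def _repr_png_(self):
        # A notebook would call this hook to render an image. Here it fails
        # the way a missing external renderer would.
        raise LookupError("unable to find the gs file")


tree = ToyTree("(S (NP Arthur))")
print(tree)  # goes through __str__, so the image hook is never called
```

Typing just tree in a notebook cell would trigger the image hook instead, which is the path that fails in the question.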

Axle Max answered Nov 17 '22 09:11


A small addition to Jason Wirth's answer: on Windows, NLTK searches for "gswin64c.exe" in the PATH environment variable. However, the Ghostscript installer does not add the binary to PATH, so for this to work you need to find where Ghostscript is installed and add its bin subfolder to PATH.

For example, in my case I added C:\Program Files\gs\gs9.19\bin to PATH.

Shuyang Sheng answered Nov 17 '22 10:11


If Ghostscript is not available for your platform, or fails to install, you can also use the wonderful networkx package to visualize such trees:

import matplotlib.pyplot as plt
import networkx as nx
import nltk
from networkx.drawing.nx_agraph import graphviz_layout

def drawNodes(G, nodeLabels, parent, lvl=0):
    """Recursively add the nodes and edges of an nltk.Tree to the graph G."""
    def addNode(G, nodeLabels, label):
        # Nodes are numbered consecutively; labels are kept in a separate dict.
        n = G.number_of_nodes()
        G.add_node(n)
        nodeLabels[n] = label
        return n

    def findNode(nodeLabels, label):
        # Travel backwards from the end to find the right parent
        for i in reversed(range(len(nodeLabels))):
            if nodeLabels[i] == label:
                return i

    if lvl == 0:
        addNode(G, nodeLabels, parent.label())
    for node in parent:
        if isinstance(node, nltk.Tree):
            # Subtree: add its label, link it to the parent, then recurse.
            n = addNode(G, nodeLabels, node.label())
            G.add_edge(findNode(nodeLabels, parent.label()), n)
            drawNodes(G, nodeLabels, node, lvl + 1)
        else:
            # Leaf: a (word, tag) pair. The tag hangs off the parent,
            # and the word hangs off the tag.
            print(node)
            n1 = addNode(G, nodeLabels, node[1])
            n0 = addNode(G, nodeLabels, node[0])
            G.add_edge(findNode(nodeLabels, parent.label()), n1)
            G.add_edge(n0, n1)

# `entities` is the tree returned by nltk.chunk.ne_chunk(tagged) in the question.
G = nx.Graph()
nodeLabels = {}
drawNodes(G, nodeLabels, entities)
options = {
    'node_color': 'white',
    'node_size': 100,
}
plt.figure(1, figsize=(12, 6))
pos = graphviz_layout(G, prog='dot')
nx.draw(G, pos, font_weight='bold', arrows=False, **options)
nx.draw_networkx_labels(G, pos, nodeLabels)
plt.show()

[image: NLTK token tree plotted with NetworkX]

amagard answered Nov 17 '22 08:11


Instead of entities, write entities.draw(). It should work.

Kaiwalya Patil answered Nov 17 '22 08:11