I need to extract all the city names from a website. I've used BeautifulSoup with regular expressions in previous projects, but on this website the city names are part of regular text and do not have a specific format. I found the geograpy package (https://pypi.python.org/pypi/geograpy/0.3.7), which fulfills my requirements.
Geograpy uses the nltk package. I installed all the models and packages for nltk, but it keeps throwing this error:
>>> import geograpy
>>> places = geograpy.get_place_context(url="http://www.state.gov/misc/list/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\geograpy\__init__.py", line 6, in get_place_context
    e.find_entities()
  File "C:\Python27\lib\site-packages\geograpy\extraction.py", line 31, in find_entities
    if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
  File "C:\Python27\lib\site-packages\nltk\tree.py", line 198, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.
Any help would be appreciated.
You can solve this by replacing ".node" with ".label()".
In your case, try replacing
if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
with
if (ne.label() == 'GPE' or ne.label() == 'PERSON') and ne[0][1] == 'NNP':
Not everyone knows how to modify library files, so for anyone who needs help: you need to find where the package is installed and edit extraction.py. On Windows 10 or similar, the file can be found at C:\Python27\Lib\site-packages\geograpy\extraction.py; it is usually under the same install directory as Python. As someone else mentioned, change line 31:
if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
to
if (ne.label() == 'GPE' or ne.label() == 'PERSON') and ne[0][1] == 'NNP':
Done. Happy Coding.
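If you are not sure where the package was installed, Python can report the path for you. The following is only a sketch: the helper name locate_module_file is made up for illustration, and it assumes Python 3.4 or later, where importlib.util.find_spec is available (on Python 2.7, as in the question, imp.find_module serves a similar purpose).

```python
import importlib.util
import os


def locate_module_file(package_name, module_filename):
    """Return the path of module_filename inside an installed package, or None.

    Hypothetical helper for illustration; not part of geograpy or nltk.
    """
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or not a package directory
    for directory in spec.submodule_search_locations:
        candidate = os.path.join(directory, module_filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Usage (prints None if geograpy is not installed):
# print(locate_module_file("geograpy", "extraction.py"))
```

The printed path is the file to open and edit at line 31.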
It looks like geograpy is reading the node property of an nltk Tree object:
nes = nltk.ne_chunk(nltk.pos_tag(text))
for ne in nes:
    if len(ne) == 1:
        if (ne.node == 'GPE' or ne.node == 'PERSON') and ne[0][1] == 'NNP':
which the nltk package has marked as outdated, so that any access now raises NotImplementedError:
def _get_node(self):
    """Outdated method to access the node value; use the label() method instead."""
    raise NotImplementedError("Use label() to access a node label.")
def _set_node(self, value):
    """Outdated method to set the node value; use the set_label() method instead."""
    raise NotImplementedError("Use set_label() method to set a node label.")
node = property(_get_node, _set_node)
The package is broken with this version of nltk. You can fix it yourself or use a different package.
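One way to fix it yourself without editing files under site-packages is to put the old attribute back at runtime, so that .node forwards to label() and set_label(). This is a rough sketch, not something geograpy or nltk provides: restore_node_property is a made-up helper, and FakeTree is only a stand-in for nltk.tree.Tree so the example runs without NLTK installed.

```python
def restore_node_property(tree_cls):
    """Make the old `.node` attribute forward to label()/set_label().

    Hypothetical compatibility shim; not part of nltk or geograpy.
    """
    def _get_node(self):
        return self.label()

    def _set_node(self, value):
        self.set_label(value)

    # Overwrite the property that raises NotImplementedError.
    tree_cls.node = property(_get_node, _set_node)
    return tree_cls


# Stand-in for nltk.tree.Tree (assumption: only the parts used here).
class FakeTree(list):
    def __init__(self, label, children=()):
        list.__init__(self, children)
        self._label = label

    def label(self):
        return self._label

    def set_label(self, value):
        self._label = value

    def _get_node(self):
        raise NotImplementedError("Use label() to access a node label.")
    node = property(_get_node)


restore_node_property(FakeTree)
ne = FakeTree("GPE", [("Paris", "NNP")])
# The same test geograpy performs on line 31 now works again.
print((ne.node == "GPE" or ne.node == "PERSON") and ne[0][1] == "NNP")
```

With the real library you would presumably call restore_node_property(nltk.tree.Tree) once, before calling geograpy.get_place_context. Note that this changes the Tree class for the whole process, so editing extraction.py remains the cleaner fix if you control the installation.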