I use Python 3 (I also have Python 2 installed) and I want to extract countries or cities from a short text.
For example, text = "I live in Spain" or text = "United States (New York), United Kingdom (London)".
The answer for countries:
I tried to install geography, but I am unable to run pip install geography. I get this error:
Collecting geography
  Could not find a version that satisfies the requirement geography (from versions: )
No matching distribution found for geography
It looks like geography only works with Python 2.
I also have geopandas, but I don't know how to extract the required info from text using geopandas.
Flashgeotext extracts and counts countries and cities (plus their synonyms) from text, like GeoText on steroids, using FlashText, an Aho-Corasick implementation. It is a fast, batteries-included (and BYOD) native Python library that extracts one or more sets of given city and country names (plus synonyms) from an input text.
Flashgeotext can help you to extract city and country names in your text processing pipelines. It gives you the ability to add your data instead of or on top of the demo data provided.
GeoText, by comparison, first collects candidate entities from the text and then tries to match each one against a collection of city and country names, one by one. This approach is fast for the 22,000 cities that come with the library, but it does not scale well with longer texts and more cities or keywords in a lookup file.
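To illustrate why a keyword-tree lookup scales better than testing names one by one, here is a minimal sketch in plain Python. It is not the FlashText or flashgeotext implementation; the demo data (CITIES, COUNTRIES) and the helper names (build_trie, extract) are made up for this example. All names and synonyms are loaded into one character trie, and the text is scanned once, so the cost depends mainly on the length of the text rather than on the number of keywords.

```python
from collections import Counter

# Hypothetical demo data: canonical name -> list of synonyms.
CITIES = {"New York": ["NYC", "New York City"], "London": []}
COUNTRIES = {
    "United States": ["USA", "United States of America"],
    "United Kingdom": ["UK"],
    "Spain": [],
}

_END = "_end_"  # marker key; cannot clash with the single-character keys


def build_trie(lookup):
    """Map every name and synonym, character by character, to its canonical name."""
    root = {}
    for canonical, synonyms in lookup.items():
        for keyword in [canonical, *synonyms]:
            node = root
            for char in keyword:
                node = node.setdefault(char, {})
            node[_END] = canonical
    return root


def extract(text, trie):
    """Scan text left to right and count the longest whole-word match at each position."""
    counts = Counter()
    i, n = 0, len(text)
    while i < n:
        # A keyword may only start at a word boundary.
        if i > 0 and text[i - 1].isalnum():
            i += 1
            continue
        node, j = trie, i
        match, match_end = None, i
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept a keyword only if it also ends at a word boundary.
            if _END in node and (j == n or not text[j].isalnum()):
                match, match_end = node[_END], j
        if match is not None:
            counts[match] += 1
            i = match_end  # continue after the matched keyword
        else:
            i += 1
    return counts


text = "United States (New York), United Kingdom (London)"
print(dict(extract(text, build_trie(COUNTRIES))))
print(dict(extract(text, build_trie(CITIES))))
```

Synonyms are counted under their canonical name, so "NYC" and "New York City" both increase the count for "New York". In a real pipeline you would use the library itself; the sketch only shows the lookup idea.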
There is a newer project called TEXTGROUNDER doing something called “Document Geolocation”, but while the code is available, it is not set up to be run on your own input texts. I only recommend you look at it if you are itching to either start or contribute to a project trying to do something like this.
You could use pycountry for your task (it also works with Python 3):
pip install pycountry
import pycountry

text = "United States (New York), United Kingdom (London)"
# Print every country whose name occurs as a substring of the text
for country in pycountry.countries:
    if country.name in text:
        print(country.name)
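One caveat with the plain substring test: it also reports names that are contained in longer names, for example "Niger" inside "Nigeria". A possible refinement, sketched below under some assumptions, is to match whole words only and to prefer longer names. The hard-coded COUNTRY_NAMES list is a stand-in for [country.name for country in pycountry.countries] so that the sketch runs without third-party packages, and find_countries is a hypothetical helper name.

```python
import re

# Stand-in for [country.name for country in pycountry.countries].
COUNTRY_NAMES = [
    "Niger", "Nigeria", "Guinea", "Guinea-Bissau",
    "United States", "United Kingdom", "Spain",
]


def find_countries(text, names=COUNTRY_NAMES):
    """Return the names that occur in text as whole words, in order of appearance."""
    # Longer names first, so "Guinea-Bissau" is tried before "Guinea".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
    )
    return [match.group(0) for match in pattern.finditer(text)]


print(find_countries("United States (New York), United Kingdom (London)"))
print(find_countries("I moved from Nigeria to Guinea-Bissau"))
```

With the substring test, the second sentence would also yield "Niger" and "Guinea". Note that this still only finds the exact names in the list; variants such as "USA" would need extra entries.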
There is a newer version of this library that supports Python 3, named geograpy3:
pip install geograpy3
It allows you to extract place names from a URL or text and adds context to those names, for example distinguishing between a country, region, or city.
Example:
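import geograpy
import nltk

# One-time download of the NLTK data used for named-entity recognition
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)
print(places.countries)  # country names found in the article
print(places.cities)     # city names found in the article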
You can find more details under this link: