
How to extract countries from a text?

I use Python 3 (I also have Python 2 installed) and I want to extract countries or cities from a short text. For example, text = "I live in Spain" or text = "United States (New York), United Kingdom (London)".

The expected output for countries would be:

  1. Spain
  2. [United States, United Kingdom]

I tried to install geography but I am unable to run pip install geography. I get this error:

Collecting geography
  Could not find a version that satisfies the requirement geography (from versions: )
No matching distribution found for geography

It looks like geography only works with Python 2.

I also have geopandas, but I don't know how to extract the required info from text using geopandas.

Asked by Markus, Feb 04 '18 10:02


People also ask

How can I extract countries and cities from text?

Extract and count countries and cities (plus their synonyms) from text, like GeoText on steroids, using FlashText, an Aho-Corasick implementation. Flashgeotext is a fast, batteries-included (and BYOD), native Python library that extracts one or more sets of given city and country names (plus synonyms) from an input text.

How do I extract City and country names in my pipeline?

Flashgeotext can help you to extract city and country names in your text processing pipelines. It gives you the ability to add your data instead of or on top of the demo data provided.

How does Geotext match cities and countries?

Afterward, GeoText tries to match every single one of the entities found to a collection of city and country names, one by one. This approach is fast for the 22,000 cities that come with the library, but it does not scale well with longer texts and more cities/keywords in a lookup file.
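
To illustrate why a keyword-trie lookup scales better than matching names one by one, here is a minimal sketch using only the standard library. It is a simplified, word-level trie in the spirit of FlashText, not the actual implementation of FlashText, flashgeotext or GeoText, and the names `build_trie`, `find_places` and the sample place list are hypothetical.

```python
import re

END = None  # sentinel key marking a complete name; tokens are always strings

def build_trie(names):
    """Build a word-level trie from place names (illustrative helper)."""
    trie = {}
    for name in names:
        node = trie
        for word in name.lower().split():
            node = node.setdefault(word, {})
        node[END] = name
    return trie

def find_places(text, trie):
    """Walk the tokens once, keeping the longest name that starts at each token."""
    tokens = re.findall(r"\w+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        node, j, best, best_end = trie, i, None, i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best, best_end = node[END], j
        if best is not None:
            found.append(best)
            i = best_end  # skip past the matched name
        else:
            i += 1
    return found

# Hypothetical sample data; a real lookup file would hold thousands of names.
trie = build_trie(["New York", "York", "United States", "United Kingdom", "London"])
print(find_places("United States (New York), United Kingdom (London)", trie))
```

With this layout the scan cost depends mainly on the length of the text, not on how many names are in the lookup set, which is the property the paragraph above is pointing at.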

Is there a way to do geolocation of texts?

There is a newer project called TEXTGROUNDER doing something called “Document Geolocation”, but while the code is available, it is not set up to be run on your own input texts. I only recommend you look at it if you are itching to either start or contribute to a project trying to do something like this.


2 Answers

You could use pycountry for your task (it also works with Python 3):

pip install pycountry

import pycountry
text = "United States (New York), United Kingdom (London)"
for country in pycountry.countries:
    if country.name in text:
        print(country.name)
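
One caveat with a plain substring test is that a short name can match inside a longer one (for example "Niger" inside "Nigeria"). A possible refinement, sketched below with the standard library only, is to match whole names on word boundaries and to prefer longer names. The `COUNTRY_NAMES` list and the `extract_countries` helper are hypothetical; in practice the names could be taken from `pycountry.countries` as in the loop above.

```python
import re

# Hypothetical sample list; real names could come from pycountry.countries.
COUNTRY_NAMES = [
    "Spain", "United States", "United Kingdom",
    "Niger", "Nigeria", "Guinea", "Papua New Guinea",
]

def extract_countries(text, names=COUNTRY_NAMES):
    """Return the country names found in text, in order of first appearance."""
    # Longest names first, so "Papua New Guinea" is preferred over "Guinea".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
    )
    found = []
    for match in pattern.finditer(text):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found

print(extract_countries("I live in Spain"))
print(extract_countries("United States (New York), United Kingdom (London)"))
print(extract_countries("Lagos is in Nigeria"))
```

The matching here is case-sensitive, like the original loop; passing `re.IGNORECASE` to `re.compile` would be one way to relax that.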
Answered by matyas, Sep 29 '22 11:09


There is a newer version of this library, named geograpy3, that supports Python 3:

pip install geograpy3

It allows you to extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.

Example:

import geograpy
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)
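
For the short strings in the question, the idea of adding context (country versus city) can also be illustrated without any third-party package. The sketch below is a plain lookup-table stand-in, not how geograpy3 works internally; the `GAZETTEER` data and the `place_context` helper are hypothetical.

```python
# Hypothetical mini-gazetteer mapping a place name to its kind.
# Real data would be far larger and would come from a proper dataset.
GAZETTEER = {
    "United States": "country",
    "United Kingdom": "country",
    "Spain": "country",
    "New York": "city",
    "London": "city",
}

def place_context(text, gazetteer=GAZETTEER):
    """Group the gazetteer names that occur in text by their kind."""
    context = {"country": [], "region": [], "city": []}
    for name, kind in gazetteer.items():
        # Simple substring test; fine for a sketch, too loose for real data.
        if name in text:
            context[kind].append(name)
    return context

result = place_context("United States (New York), United Kingdom (London)")
print(result["country"])
print(result["city"])
```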

You can find more details under this link:

Answered by Jendoubi Zaid, Sep 29 '22 11:09