How to extract country from a string in python

Tags:

python

I have some text that may or may not contain a country name. For example:

' Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study'

This is how I extract the country name from it. In my first attempt:

import pycountry

def findCountry(stringText):
    for country in pycountry.countries:
        if country.name.lower() in stringText.lower():
            return country.name
    return None

findCountry("Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study")

Unfortunately, it gives me the wrong output, [Niger], whereas the correct one is Nigeria. Note that Niger and Nigeria are two different countries.

In my second attempt:

def findCountry(stringText):
    full_list = []
    for country in pycountry.countries:
        if country.name.lower() in stringText.lower():
            # collect the name so the result is a list of strings
            full_list.append(country.name)

    if len(full_list) > 0:
        return full_list

    return None

I get ['Niger', 'Nigeria'] as output, but I can't find a way to get Nigeria as my final output. How can I achieve this?

Note: here I know Nigeria is the correct answer, but later on I will use this code to choose the final country name, if one is present in the text, and the detection needs to be highly accurate.

Talib Daryabi asked May 31 '21 04:05


3 Answers

Always search for longest strings first; this will prevent the kind of error you encountered.

countries = sorted(pycountry.countries, key=lambda x: -len(x.name))
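
A minimal sketch of the longest-first idea follows. To keep it self-contained, a small hard-coded list (COUNTRY_NAMES, hypothetical) stands in for the names that would come from pycountry.countries, and find_country is an illustrative helper, not part of any library.

```python
# Hypothetical stand-in for [c.name for c in pycountry.countries].
COUNTRY_NAMES = ["Niger", "Nigeria", "Guinea", "Guinea-Bissau", "India"]

# Longest names first, so "Nigeria" is tested before its prefix "Niger".
NAMES_LONGEST_FIRST = sorted(COUNTRY_NAMES, key=len, reverse=True)


def find_country(text):
    """Return the longest country name found in text, or None."""
    lowered = text.lower()
    for name in NAMES_LONGEST_FIRST:
        if name.lower() in lowered:
            return name
    return None


print(find_country("Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study"))
```

This still relies on substring matching, so a name embedded in an unrelated word (for example "India" inside "Indiana") would match as well.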
Amadan answered Nov 10 '22 16:11


One regex approach would be to build an alternation containing all target countries to be found. Then, use re.findall on the input text to find any possible matches:

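import re
import pycountry

# join the country names (not the Country objects), escaped for use in a regex
regex = r'\b(?:' + '|'.join(re.escape(c.name) for c in pycountry.countries) + r')\b'

def findCountry(stringText):
    countries = re.findall(regex, stringText, flags=re.IGNORECASE)
    return countries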
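
The behaviour of the word-boundary alternation can be sketched as follows, again with a hypothetical hard-coded name list standing in for pycountry. The names COUNTRY_NAMES, PATTERN and find_countries are illustrative only.

```python
import re

# Hypothetical stand-in for the names in pycountry.countries.
COUNTRY_NAMES = ["Niger", "Nigeria", "Chad"]

# re.escape keeps any punctuation in a name from being read as regex syntax.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in COUNTRY_NAMES) + r")\b",
    flags=re.IGNORECASE,
)


def find_countries(text):
    """Return every whole-word country match, in order of appearance."""
    return PATTERN.findall(text)


print(find_countries("Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study"))
print(find_countries("Trade between Niger and Chad"))
```

Because of the trailing \b, "Niger" cannot match inside "Nigeria": the engine backtracks and tries the next alternative. One caveat is that a name ending in a non-word character such as ")" only satisfies the trailing \b when a word character follows it.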
Tim Biegeleisen answered Nov 10 '22 16:11


The problem is that 'in' tests for occurrence as a substring, so Niger is found inside Nigeria. You could swap the operands before and after 'in', but that fixes Nigeria and not other cases. You can use == instead, which avoids such partial matches, although it only succeeds when the whole string is exactly a country name.

def findCountry(stringText):
    for country in pycountry.countries:
        if country.name.lower() == stringText.lower():
            return country.name
    return None
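
The difference between the two operators can be seen directly. This is only an illustration of the comparison semantics, using plain strings rather than pycountry:

```python
text = "Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study"

# `in` is a substring test, so the shorter name is found inside the longer one.
print("niger" in "nigeria")       # True
print("niger" in text.lower())    # True

# `==` compares whole strings, so it succeeds only when the text is exactly a name.
print("nigeria" == "nigeria")     # True
print("nigeria" == text.lower())  # False
```

Equality therefore suits inputs that have already been reduced to a candidate name, and not free text like the example title.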
moshfiqrony answered Nov 10 '22 15:11