I have some text which may or may not contain a country name. For example:
' Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study'
This is how I extract the country name from it in my first attempt:
import pycountry

def findCountry(stringText):
    for country in pycountry.countries:
        if country.name.lower() in stringText.lower():
            return country.name
    return None

findCountry("Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study")
Unfortunately, it gives me the wrong output, 'Niger', whereas the correct one is 'Nigeria'.
Note that Niger and Nigeria are two different countries.
In my second attempt:
def findCountry(stringText):
    full_list = []
    for country in pycountry.countries:
        if country.name.lower() in stringText.lower():
            full_list.append(country.name)
    if full_list:
        return full_list
    return None
I get ['Niger', 'Nigeria'] as output, but I can't find a way to get Nigeria as my final result. How can I achieve this?
Note: here I know Nigeria is the correct answer, but later I will rely on the code to choose the final country name whenever one is present in the text, so the detection needs to be highly accurate.
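One way to resolve the ['Niger', 'Nigeria'] result of the second attempt is to discard any candidate that appears inside another, longer candidate, on the assumption that the shorter name only matched as part of the longer one. A minimal sketch, assuming the candidates have already been collected (for example by the second findCountry); pick_most_specific is a hypothetical helper name:

```python
def pick_most_specific(candidates):
    """Drop every candidate that is a substring of another, longer candidate."""
    lowered = [c.lower() for c in candidates]
    result = []
    for name, low in zip(candidates, lowered):
        # Keep the name unless a different (hence longer) candidate contains it.
        if not any(low != other and low in other for other in lowered):
            result.append(name)
    return result


print(pick_most_specific(['Niger', 'Nigeria']))  # ['Nigeria']
```

Be aware that this heuristic also drops a name the text genuinely contains: for "Niger and Nigeria share a border" the second findCountry finds both, and the filter keeps only Nigeria.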
Today, I'd like to show you how to extract country names from text in two ways: first, without using any libraries; second, with the pycountry library.
We can extract a specific word from a string in Python using the index() method and string slicing. Alternatively, we can use regular expressions: the search() function from the re module finds the first occurrence of the word, and we can then obtain the word using slicing.
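Both techniques can be sketched as follows; the sample sentence and the word being searched for are only illustrations. Note that str.index() raises ValueError when the word is absent, while re.search() returns None, and the match object's start() and end() give the slice bounds.

```python
import re

text = "Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study"
word = "Network"

# 1) str.index() gives the start position; slicing recovers the word.
start = text.index(word)  # raises ValueError if the word is missing
by_index = text[start:start + len(word)]

# 2) re.search() finds the first occurrence; \b restricts it to a whole word.
match = re.search(r"\b" + re.escape(word) + r"\b", text)
by_regex = text[match.start():match.end()] if match else None

print(by_index, by_regex)  # Network Network
```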
The locationtagger library provides functions to get cities, countries, regions, etc. from text. locationtagger.find_locations(text) returns an entity holding the location information, where the "text" parameter is the input text, and entity.countries lists all the countries found in that text.
In this tutorial, we are going to learn how to extract only the letters from a given string in Python, which can be done in two different ways. The first one works as follows: get the input from the user with the input() function, declare an empty string to store the letters, and append every character whose ASCII value is between 65 and 90 or between 97 and 122.
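The ASCII-range approach can be sketched like this. To keep it testable, the function takes a string argument instead of reading it with input() (you would call letters_only(input()) interactively). The str.isalpha() variant is shown only as one common alternative; unlike the ASCII check, it also keeps non-ASCII letters such as "é".

```python
def letters_only(s):
    """Keep only ASCII letters: codes 65-90 (A-Z) and 97-122 (a-z)."""
    result = ""  # empty string that collects the letters
    for ch in s:
        code = ord(ch)
        if 65 <= code <= 90 or 97 <= code <= 122:
            result += ch
    return result


def letters_only_isalpha(s):
    """Alternative: str.isalpha() accepts any Unicode letter."""
    return "".join(ch for ch in s if ch.isalpha())


print(letters_only("Py-th0n 3.12!"))  # Pythn
```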
Always search for longest strings first; this will prevent the kind of error you encountered.
# Sort by name length, longest first, so "Nigeria" is tried before "Niger"
countries = sorted(pycountry.countries, key=lambda c: len(c.name), reverse=True)
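Applied to the first findCountry, sorting by name length means "Nigeria" is tested before "Niger", so the first hit is the longer name. A minimal sketch; to keep it runnable without pycountry it takes the country names as a parameter, and COUNTRY_NAMES is a hypothetical sample list (with pycountry you would pass [c.name for c in pycountry.countries]).

```python
# Hypothetical sample list; in practice pass [c.name for c in pycountry.countries].
COUNTRY_NAMES = ["Niger", "Nigeria", "Guinea", "Guinea-Bissau", "Peru"]


def find_country_longest_first(text, names):
    """Return the first country name found in text, trying longer names first."""
    lowered = text.lower()
    for name in sorted(names, key=len, reverse=True):
        if name.lower() in lowered:
            return name
    return None


print(find_country_longest_first(
    "Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study",
    COUNTRY_NAMES))  # Nigeria
```

Because this is still a substring test, it can fire inside ordinary words (the name "Oman" occurs inside "woman"), which word boundaries (\b) in a regex avoid.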
One regex approach is to build an alternation containing all the target country names, then use re.findall on the input text to find every match:
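import re
import pycountry
# Escape each name and try longer names first, e.g. "Guinea-Bissau" before "Guinea"
names = sorted((c.name for c in pycountry.countries), key=len, reverse=True)
regex = r'\b(?:' + '|'.join(map(re.escape, names)) + r')\b'
def findCountry(stringText):
    # \b stops "Niger" from matching inside "Nigeria"
    return re.findall(regex, stringText, flags=re.IGNORECASE)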
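The alternation approach can be exercised offline with a small sample list. A sketch, assuming a hypothetical COUNTRY_NAMES list in place of pycountry: the \b anchors stop "Niger" from matching inside "Nigeria", and sorting longest first matters because the hyphen after "Guinea" in "Guinea-Bissau" is a word boundary, so an unsorted alternation could stop at "Guinea". Matches are mapped back to their canonical spelling and de-duplicated in order of appearance.

```python
import re

# Hypothetical sample list; with pycountry use [c.name for c in pycountry.countries].
COUNTRY_NAMES = ["Niger", "Nigeria", "Guinea", "Guinea-Bissau", "Peru"]


def build_country_regex(names):
    """Compile one alternation of all names, longest first, anchored on word boundaries."""
    ordered = sorted(names, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b"
    return re.compile(pattern, flags=re.IGNORECASE)


def find_countries(text, names):
    """Return the canonical names found in text, in order of first appearance."""
    canonical = {name.lower(): name for name in names}
    found = []
    for match in build_country_regex(names).findall(text):
        name = canonical[match.lower()]
        if name not in found:
            found.append(name)
    return found
```

For example, find_countries("Niger and Nigeria share a border", COUNTRY_NAMES) returns ['Niger', 'Nigeria'], while "Peruvian coffee" yields [] because "Peru" is followed by a letter.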
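The problem here is that "in" tests for occurrence (a substring match), so "Niger" is found inside "Nigeria". You could also swap the operands before and after "in", but that would solve the Nigeria case and not the others. You can use == instead, which rules out partial matches; note, though, that it only succeeds when the entire string is exactly a country name, so for the example sentence above it returns None.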
import pycountry

def findCountry(stringText):
    # Exact comparison: only true when the whole text is the country name
    for country in pycountry.countries:
        if country.name.lower() == stringText.lower():
            return country.name
    return None
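The == idea can be made to work on whole sentences by comparing the country names against runs of consecutive words rather than against the entire string. This is a swapped-in technique (word n-gram lookup), sketched with a hypothetical COUNTRY_NAMES list; the tokenizing pattern [\w'-]+ is also an assumption and drops punctuation such as commas, so names like "Bolivia, Plurinational State of" would need normalizing before they could match.

```python
import re

# Hypothetical sample list; with pycountry use [c.name for c in pycountry.countries].
COUNTRY_NAMES = ["Niger", "Nigeria", "South Africa", "Peru"]


def find_country_exact(text, names):
    """Compare runs of 1..N consecutive words in text against the names exactly."""
    lookup = {name.lower(): name for name in names}
    # The longest name, measured in words, bounds the run length.
    max_words = max((len(name.split()) for name in names), default=0)
    words = re.findall(r"[\w'-]+", text.lower())
    # Try longer word runs first so "South Africa" wins over shorter matches.
    for size in range(max_words, 0, -1):
        for i in range(len(words) - size + 1):
            candidate = " ".join(words[i:i + size])
            if candidate in lookup:  # dict lookup: exact string equality
                return lookup[candidate]
    return None
```

Because "nigeria" is compared as a whole word, find_country_exact("Nigeria: Hotspot Network LTD Rural Telephony Feasibility Study", COUNTRY_NAMES) returns 'Nigeria' and never 'Niger'.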