I have one pandas DataFrame containing the names of the world's cities and the countries they belong to:
city.head(3)
            city      country
0  Qal eh-ye Now  Afghanistan
1    Chaghcharan  Afghanistan
2    Lashkar Gah  Afghanistan
and another data frame consisting of addresses of the world's universities, which is shown below:
df.head(3)
university
0 Inst Huizhou, Huihzhou 516001, Guangdong, Peop...
1 Guangxi Acad Sci, Nanning 530004, Guangxi, Peo...
2 Shenzhen VisuCA Key Lab SIAT, Shenzhen, People...
The city name appears at a different position in each address. I would like to match the city names against the university addresses, that is, to find out which city each university is located in. Ideally, the matched city name would appear in the same row as the university's address.
I've tried the following, but it doesn't work because the city is not always in the same comma-separated field:
df['university'].str.split(',').str[0]
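I would suggest using apply:
# Take the 'city' column (calling tolist() on the whole DataFrame would fail)
city_list = city['city'].tolist()

def match_city(row):
    # Return the first known city name that occurs in the address
    for name in city_list:
        if name in row['university']:
            return name
    return None  # no known city found in this address

df['city'] = df.apply(match_city, axis=1)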
This assumes the university addresses are clean enough for plain substring matching. If you need stricter matching, you can adjust the match_city function.
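One possible way to make the matching stricter, as a minimal sketch rather than a definitive solution: build a single regular expression from the city names and require word boundaries, so that a short city name is not matched inside a longer word. The sample rows below are made up for illustration, and the variable names `names` and `pattern` are my own; only the `city` and `university` column names come from the question.

```python
import re
import pandas as pd

# Toy stand-ins for the two frames in the question (values are illustrative assumptions)
city = pd.DataFrame({
    "city": ["Nanning", "Shenzhen", "Huizhou"],
    "country": ["China", "China", "China"],
})
df = pd.DataFrame({
    "university": [
        "Guangxi Acad Sci, Nanning 530004, Guangxi",
        "Shenzhen VisuCA Key Lab SIAT, Shenzhen",
        "Univ Nowhere, Unknown Town 00000",
    ]
})

# Sort longest first: when two names could match at the same position,
# the regex alternation then prefers the longer one
names = sorted(city["city"].dropna().unique(), key=len, reverse=True)

# Escape each name and wrap the alternation in word boundaries
pattern = r"\b(" + "|".join(re.escape(n) for n in names) + r")\b"

# extract() returns the first match per row, or NaN when nothing matches
df["city"] = df["university"].str.extract(pattern, expand=False)
print(df)
```

This still picks the first city name that appears in the address, so an institute named after a different city than the one it sits in would be mislabelled. With a very long city list the pattern becomes large, and whether that is fast enough is something to check on the real data.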