
How to smartly match two data frames using Python (using pandas or other means)?

I have one pandas DataFrame containing the names of the world's cities and the countries they belong to:

city.head(3)

   city           country
0  Qal eh-ye Now  Afghanistan
1  Chaghcharan    Afghanistan
2  Lashkar Gah    Afghanistan

and another DataFrame containing the addresses of the world's universities, shown below:

df.head(3)
    university
0   Inst Huizhou, Huihzhou 516001, Guangdong, Peop...
1   Guangxi Acad Sci, Nanning 530004, Guangxi, Peo...
2   Shenzhen VisuCA Key Lab SIAT, Shenzhen, People...

The position of the city name within the address varies from row to row. I would like to match the city names against the university addresses, that is, to find out which city each university is located in. Ideally, the matched city name would appear in the same row as the university's address.

I've tried the following, but it doesn't work because the city name is not always in the same position within the address:

df['university'].str.split(',').str[0]
fffchao asked Mar 11 '23 05:03



1 Answer

I would suggest using apply:

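# `city` is a DataFrame, so select the name column before converting to a list
city_list = city['city'].tolist()

def match_city(row):
    # Return the first city name that occurs as a substring of the address
    for name in city_list:
        if name in row['university']:
            return name
    return None  # no city matched; shows up as a missing value in the column

df['city'] = df.apply(match_city, axis=1)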

I assume the university address data is clean enough for plain substring matching. If you need stricter matching, you can adjust the match_city function.
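As one possible direction for stricter matching, here is a minimal sketch that swaps the substring test for a single regular expression, so that a short city name cannot match inside a longer word. The sample rows and the helper name build_city_pattern are made up for illustration; only the column names city and university come from the question.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the question's `city` and `df` frames.
city = pd.DataFrame({
    "city": ["Nanning", "Shenzhen", "Huizhou"],
    "country": ["China", "China", "China"],
})
df = pd.DataFrame({
    "university": [
        "Example Inst, Huizhou 516001, Guangdong",
        "Example Acad Sci, Nanning 530004, Guangxi",
        "Example Key Lab, Shenzhen",
        "Example Univ, Unknown Town",
    ]
})


def build_city_pattern(names):
    """Build one regex with a single capture group that matches any city name."""
    # Longer names are listed first, so when two names could start at the
    # same position the longer one is tried before the shorter one.
    ordered = sorted(set(names), key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # The lookarounds require that the name is not glued to other word characters.
    return r"(?<!\w)(" + alternatives + r")(?!\w)"


pattern = build_city_pattern(city["city"].dropna())

# str.extract returns the first match per row, or NaN when nothing matches.
df["city"] = df["university"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)
```

Because the match is case-insensitive, the extracted value is the text as it appears in the address, not necessarily the spelling used in the city table. Rows with no known city get NaN, which can be checked with df['city'].isna().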

Redho Ayassa answered Mar 13 '23 19:03
