How to smartly match two data frames using Python (using pandas or other means)?

Question

I have one pandas DataFrame containing the names of the world's cities and the countries they belong to,

city.head(3)

            city      country
0  Qal eh-ye Now  Afghanistan
1    Chaghcharan  Afghanistan
2    Lashkar Gah  Afghanistan

and another DataFrame containing the addresses of the world's universities:

df.head(3)
    university
0   Inst Huizhou, Huihzhou 516001, Guangdong, Peop...
1   Guangxi Acad Sci, Nanning 530004, Guangxi, Peo...
2   Shenzhen VisuCA Key Lab SIAT, Shenzhen, People...

The city name appears at a different position in each address. I would like to find which city each university is located in, and to store the matched city name in the same row as the university's address.

I've tried the following, but it doesn't work because the position of the city name varies from row to row:

df['university'].str.split(',').str[0]

Redho Ayassa · Accepted Answer

I would suggest using apply:

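# Pull the city names out of the `city` DataFrame as a plain list.
city_list = city['city'].tolist()

def match_city(row):
    # Return the first city name found in the address, or None if none matches.
    for name in city_list:
        if name in row['university']:
            return name
    return None

df['city'] = df.apply(match_city, axis=1)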

I assume the university addresses are clean enough for plain substring matching. If you need stricter matching, you can adjust the match_city function.
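One possible way to tighten the matching is sketched below. It is not part of the original answer. It swaps the Python-level loop for a single regular expression passed to `Series.str.extract`, and it only accepts a city name that appears as a whole word, so a short name is not matched inside a longer word. The sample rows and the helper name `build_city_pattern` are hypothetical and exist only for illustration; the column names `city` and `university` follow the question.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the question's `city` and `df` frames.
city = pd.DataFrame({
    "city": ["Nanning", "Shenzhen", "Huizhou"],
    "country": ["China", "China", "China"],
})
df = pd.DataFrame({
    "university": [
        "Example Acad Sci, Nanning 530004, Guangxi",
        "Example Key Lab, Shenzhen",
        "Example Univ, Nowhere 00000",
    ]
})


def build_city_pattern(names):
    """Build one regex string that matches any of the names as a whole word."""
    # Drop missing/empty values and duplicates, then put longer names first
    # so that a longer name is preferred over a shorter one that is its prefix.
    unique = {n for n in names if isinstance(n, str) and n}
    ordered = sorted(unique, key=lambda n: (-len(n), n))
    alternation = "|".join(re.escape(n) for n in ordered)
    # The lookarounds require that no word character touches the match.
    return r"(?<!\w)(" + alternation + r")(?!\w)"


pattern = build_city_pattern(city["city"])

# str.extract returns the first capture group; rows without a match get NaN.
df["city"] = df["university"].str.extract(pattern, expand=False)
print(df)
```

With a very large city list the combined pattern can become slow to compile and run, so this is a sketch for moderately sized lists rather than a tuned solution. Matching is case-sensitive here; passing `flags=re.IGNORECASE` to `str.extract` is one way to relax that.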

Tags: python, pandas, dataframe, match

fffchao
