This is a simple thing but I don't think it's been covered on SO or in the Pandas documentation.
The data
Two DataFrames containing data about cities. The DataFrame UK
contains a variable for UK city names. It has been grouped so there are no duplicate city names. This is a truncated example of this city
variable, (not the full DataFrame):
city
Hamilton
Edinburgh
Bury
...
The DataFrame US
contains a corresponding variable for US city names. It does have duplicate city names, but these are not actually duplicate values as there are numerous cities with the same name. Again, a truncated example:
city
Hamilton
Hamilton
Edinburgh
Edinburgh
Edinburgh
Bury
Bury
...
(Note: The variables aren't the same length in my data.)
The aim
Create a new variable in the UK
DataFrame which is an integer count for each time a city name in the US
city variable matches a city name in the UK
city variable.
Below shows an example of the output I'm aiming for. Say for one UK city called Hamilton there are 2 cities called Hamilton in the US. For one UK city called Edinburgh there are 3 matches in the US. And 2 matches for Bury.
city count
Hamilton 2
Edinburgh 3
Bury 2
Where I'm stuck
If I do this...
uk['count'] = uk['city'].isin(us['city']).astype(int)
the new variable is a binary 1 or 0 indicating that there is a match, which is halfway there. However I'm struggling with the Pandas syntax to return the count of matches. I've tried appending value_counts
and variations of unique but these didn't work. Maybe I'm approaching this the wrong way.
Use Series.value_counts
to count the number of occurrences for each city in US['city']
, and then use Series.map
to apply those counts to corresponding values in UK['city']
:
In [40]: US['city'].value_counts()
Out[40]:
Edinburgh 3
Bury 2
Hamilton 2
Name: city, dtype: int64
In [41]: UK['count'] = UK['city'].map(US['city'].value_counts())
In [42]: UK
Out[42]:
city count
0 Hamilton 2
1 Edinburgh 3
2 Bury 2
If there is a city in UK['city']
which is not in US['city']
then the above will assign a NaN
value to the count column for that city.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With