I have a pandas dataframe:
df
0 PL
1 PL
2 PL
3 IT
4 IT
..
4670 DE
4671 NO
4672 MT
4673 FI
4674 XX
Name: country_code, Length: 4675, dtype: object
I am filtering this by the Germany country tag 'DE' via:
df = df[df.apply(lambda x: 'DE' in x)]
If I want to filter by more countries, then I have to add them manually via .apply(lambda x: 'DE' in x or 'GB' in x). However, I would like to create a countries list and generate this statement automatically.
Something like this:
countries = ['DE', 'GB', 'IT']
df = df[df.apply(lambda x: any_item_in_countries_list in x)]
I think I could filter df three times and then merge the pieces back together via concat(), but is there a more generic way to achieve this?
You can use .isin():
df[df['country_code'].isin(['DE', 'GB', 'IT'])]
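The object printed in the question looks like a Series (note the "Name: country_code" line), so .isin() can presumably be called on it directly, without selecting a column first. A minimal sketch, assuming made-up sample data in place of the real 4675 rows:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's Series.
s = pd.Series(['PL', 'PL', 'IT', 'DE', 'NO', 'GB', 'XX'], name='country_code')
countries = ['DE', 'GB', 'IT']

# Boolean mask: True where the value equals one of the listed codes.
mask = s.isin(countries)
filtered = s[mask]

# ~ inverts the mask, keeping everything that is NOT in the list.
excluded = s[~mask]

print(filtered.tolist())
print(excluded.tolist())
```

Note that .isin() tests for exact equality, while the original `'DE' in x` is a substring test on each string. For two-letter country codes the two should behave the same, but they would differ if the values held longer strings.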
Performance comparison:
import timeit
import pandas as pd
df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})
%timeit df[df['country_code'].isin(['DE', 'GB', 'IT'])]
409 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['country_code'].apply(lambda x: x in ['DE', 'AT', 'GB'])
1.35 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
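The %timeit lines above only work in IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard timeit module. The helper names with_isin and with_apply are made up for this illustration, and the numbers will differ from machine to machine:

```python
import timeit
import pandas as pd

df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})
countries = ['DE', 'GB', 'IT']

def with_isin():
    # Vectorised membership test on the column.
    return df[df['country_code'].isin(countries)]

def with_apply():
    # Python-level membership test, called once per row.
    return df[df['country_code'].apply(lambda x: x in countries)]

# Sanity check: both approaches should select the same rows.
same = with_isin().equals(with_apply())

loops = 100
t_isin = timeit.timeit(with_isin, number=loops)
t_apply = timeit.timeit(with_apply, number=loops)
print(f"isin:  {t_isin / loops * 1e6:.0f} µs per loop")
print(f"apply: {t_apply / loops * 1e6:.0f} µs per loop")
```

Here both variants use the same list and both perform the full row selection, so the comparison is like for like.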
If your DataFrame has named columns, you can try this:
countries = ['DE', 'GB', 'IT']
df[df['country_code'].isin(countries)]