Filtering a pandas df with any of the list values [duplicate]

I have a pandas dataframe:

df
0       PL
1       PL
2       PL
3       IT
4       IT
        ..
4670    DE
4671    NO
4672    MT
4673    FI
4674    XX
Name: country_code, Length: 4675, dtype: object

I am filtering this by the German country code 'DE' via:

df = df[df.apply(lambda x: 'DE' in x)]

If I want to filter by more countries, then I have to add them manually via: .apply(lambda x: 'DE' in x or 'GB' in x). However, I would like to create a countries list and generate this condition automatically.

Something like this:

countries = ['DE', 'GB', 'IT']
df = df[df.apply(lambda x: any_item_in_countries_list in x)]

I think I could filter df three times and then merge the pieces back together via concat(). Is there a more generic way to achieve this?

oakca asked Oct 15 '25 09:10



2 Answers

You can use .isin():

df[df['country_code'].isin(['DE', 'GB', 'IT'])]
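The object printed in the question looks like a Series (it has a Name and no column header), while the line above assumes a DataFrame with a country_code column. A minimal sketch of both cases, using made-up sample data rather than the asker's real data:

```python
import pandas as pd

countries = ['DE', 'GB', 'IT']

# Sample data made up for illustration; not the asker's real data.
s = pd.Series(['PL', 'IT', 'DE', 'NO', 'GB', 'XX'], name='country_code')

# Case 1: the data is a Series, so build the boolean mask on the Series itself.
filtered_series = s[s.isin(countries)]

# Case 2: the data is a DataFrame, so build the mask from the relevant column.
frame = s.to_frame()
filtered_frame = frame[frame['country_code'].isin(countries)]

print(filtered_series.tolist())  # ['IT', 'DE', 'GB']
```

In both cases the original index labels of the matching rows are preserved.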

Performance comparison:

import timeit
import pandas as pd
df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})

%timeit df[df['country_code'].isin(['DE', 'GB', 'IT'])]
409 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['country_code'].apply(lambda x: x in ['DE', 'AT', 'GB'])
1.35 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
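%timeit is an IPython magic, so the lines above only run in IPython or Jupyter. A rough sketch of a comparable measurement in a plain Python script, using the standard timeit module; here both variants use the same list and both apply the mask to df, so they do the same work (the function names are illustrative only):

```python
import timeit

import pandas as pd

df = pd.DataFrame({'country_code': ['DE', 'GB', 'IT', 'MT', 'FI', 'XX'] * 1000})
countries = ['DE', 'GB', 'IT']


def with_isin():
    # Vectorised membership test on the whole column.
    return df[df['country_code'].isin(countries)]


def with_apply():
    # Python-level lambda called once per row.
    return df[df['country_code'].apply(lambda x: x in countries)]


# Both approaches should select exactly the same rows.
assert with_isin().equals(with_apply())

# Average seconds per call over 100 calls; results vary by machine and pandas version.
for func in (with_isin, with_apply):
    seconds = timeit.timeit(func, number=100) / 100
    print(f'{func.__name__}: {seconds * 1e6:.0f} us per call')
```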
Andreas answered Oct 16 '25 23:10



If your data has column names, then you can try this:

countries = ['DE', 'GB', 'IT']
df[df['country_code'].isin(countries)]
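One difference worth noting: .isin() tests for exact equality, whereas the lambda in the question ('DE' in x) is a substring test on each string. If substring matching is what is actually needed (an assumption; the sample data in the question shows only two-letter codes), the original lambda can be generalised with any(). A small sketch with hypothetical values:

```python
import pandas as pd

countries = ['DE', 'GB', 'IT']

# Hypothetical values with extra text around the code, to show the difference.
s = pd.Series(['PL', 'DE-BY', 'IT', 'xGBx', 'NO'], name='country_code')

# Substring test: keep a row if any listed code occurs anywhere in the string.
substring_match = s[s.apply(lambda x: any(code in x for code in countries))]

# Exact test: keep a row only if the whole value equals a listed code.
exact_match = s[s.isin(countries)]

print(substring_match.tolist())  # ['DE-BY', 'IT', 'xGBx']
print(exact_match.tolist())      # ['IT']
```

For clean two-letter codes the two approaches give the same result, and .isin() is the simpler choice.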
Sabil answered Oct 16 '25 22:10
