I have a dataframe as shown below.
df:
ID tag
1 pandas
2 numpy
3 matplotlib
4 pandas
5 pandas
6 sns
7 sklearn
8 sklearn
9 pandas
10 pandas
I would like to add a column named tag_binary to the above df, indicating whether each tag is pandas or not.
Expected output:
ID tag tag_binary
1 pandas pandas
2 numpy non_pandas
3 matplotlib non_pandas
4 pandas pandas
5 pandas pandas
6 sns non_pandas
7 sklearn non_pandas
8 sklearn non_pandas
9 pandas pandas
10 pandas pandas
I tried the code below, using a dictionary and the map function, and it worked fine. But I am wondering whether there is an easier way that does not require writing out the complete dictionary.
d = {'pandas':'pandas', 'numpy':'non_pandas', 'matplotlib':'non_pandas',
'sns':'non_pandas', 'sklearn':'non_pandas'}
df["tag_binary"] = df['tag'].map(d)
You can use where with an equality check to keep 'pandas' and fill everything else with 'non_pandas'.
df['tag_binary'] = df['tag'].where(df['tag'].eq('pandas'), 'non_pandas')
ID tag tag_binary
0 1 pandas pandas
1 2 numpy non_pandas
2 3 matplotlib non_pandas
3 4 pandas pandas
4 5 pandas pandas
5 6 sns non_pandas
6 7 sklearn non_pandas
7 8 sklearn non_pandas
8 9 pandas pandas
9 10 pandas pandas
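For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same where call. The DataFrame construction is my own reconstruction of the sample data, not something taken from the question.

```python
import pandas as pd

# Reconstruction of the sample frame from the question (assumed layout).
df = pd.DataFrame({
    "ID": range(1, 11),
    "tag": ["pandas", "numpy", "matplotlib", "pandas", "pandas",
            "sns", "sklearn", "sklearn", "pandas", "pandas"],
})

# where keeps the original value wherever the condition is True
# and substitutes 'non_pandas' everywhere else.
is_pandas = df["tag"].eq("pandas")
df["tag_binary"] = df["tag"].where(is_pandas, "non_pandas")

print(df)
```

Note that where keeps values where the condition holds, so the condition describes what to keep, not what to replace.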
If you want something a little more flexible, so that you can also map specific values to specific labels, you can leverage the fact that map returns NaN for keys not in your dict. Specify only the mappings you care about, then use fillna to handle every other case.
# Could be more general like {'pandas': 'pandas', 'geopandas': 'pandas'}
d = {'pandas': 'pandas'}
df['tag_binary'] = df['tag'].map(d).fillna('non_pandas')
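To see the intermediate NaN step, here is a small sketch of the map-then-fillna idea with a more general dict. The four-row frame and the 'geopandas' tag are hypothetical and only there for illustration; they are not part of the question's data.

```python
import pandas as pd

# Hypothetical tags, chosen to show two keys mapping to the same label.
df = pd.DataFrame({"tag": ["pandas", "geopandas", "numpy", "sklearn"]})

# Only the tags that need a specific label are listed.
d = {"pandas": "pandas", "geopandas": "pandas"}

# Tags that are not keys of d come back as NaN ...
mapped = df["tag"].map(d)

# ... and fillna assigns the catch-all label to those rows.
df["tag_binary"] = mapped.fillna("non_pandas")

print(df)
```

One thing to keep in mind: a tag that is itself missing (NaN) also ends up as 'non_pandas' with this approach.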