Quite a basic question, apologies if it's been asked before, but I couldn't find the answer.
I'm trying to filter a dataset by gender so that I can see the women/men sales split, but the data is recorded by title, i.e. Mr, Mrs, Miss and Ms.
For men I have:
men = cd.loc[cd.title_desc == "MR", "SALES"]
For women I want MRS, MISS and MS included, i.e.
women = cd.loc[cd.title_desc == "MRS" and "MISS" and "MS", "SALES"]
but obviously the "and" isn't correct.
Help appreciated!
This has definitely been asked before, but here you go.
To create two different Series objects by filtering on multiple values:
men = cd.loc[cd.title_desc == 'MR','SALES']
women = cd.loc[cd.title_desc.isin(['MRS','MISS','MS']), 'SALES']
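A minimal, self-contained sketch of the isin approach. The column names title_desc and SALES come from the question; the rows in this small cd frame are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `cd` frame.
cd = pd.DataFrame({
    "title_desc": ["MR", "MRS", "MISS", "MS", "MR", "DR"],
    "SALES": [100, 200, 50, 75, 25, 40],
})

# One equality test for men, one membership test for women.
men = cd.loc[cd.title_desc == "MR", "SALES"]
women = cd.loc[cd.title_desc.isin(["MRS", "MISS", "MS"]), "SALES"]

print(men.sum())    # 125
print(women.sum())  # 325
```

Note that a title in neither list (DR in this sample) ends up in neither Series.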
Alternatively, if you want to go straight to total sales by gender:
cd['gender'] = ''
cd.loc[cd.title_desc == 'MR', 'gender'] = 'men'
cd.loc[cd.title_desc.isin(['MRS','MISS','MS']), 'gender'] = 'women'
cd.groupby('gender').agg({'SALES': 'sum'})
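A variant of the same idea, sketched with Series.map and a lookup dictionary in place of the two .loc assignments (a swapped-in technique, not the one above). The sample data and the title_to_gender dictionary are hypothetical. Dividing by the grand total then gives the percentage split the question is after.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `cd` frame.
cd = pd.DataFrame({
    "title_desc": ["MR", "MRS", "MISS", "MS", "MR", "DR"],
    "SALES": [100, 200, 50, 75, 25, 40],
})

# Hypothetical title-to-gender lookup; add any other titles your data contains.
title_to_gender = {"MR": "men", "MRS": "women", "MISS": "women", "MS": "women"}

# Titles missing from the dict (DR here) map to NaN, and groupby drops NaN keys by default.
cd["gender"] = cd["title_desc"].map(title_to_gender)
totals = cd.groupby("gender")["SALES"].sum()

# Each gender's share of the combined total.
share = totals / totals.sum()

print(totals)
print(share.round(3))
```

One design difference: with map, unmatched titles are dropped from the groupby, whereas the '' default above collects them in their own blank group.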
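Python's 'and' doesn't work here because it tries to reduce a whole boolean Series to a single True/False value, which pandas refuses to do (it raises a ValueError). Instead, break the condition up into separate comparisons and combine them with the element-wise or operator '|', wrapping each comparison in parentheses because '|' binds more tightly than '=='. The resulting boolean vector can then be passed to .loc: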
bvec = (cd.title_desc == "MRS") | (cd.title_desc == "MISS") | (cd.title_desc == "MS")
women = cd.loc[bvec,"SALES"]
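A small runnable sketch of the '|' approach on made-up data (only the column names come from the question). It also checks that the mask matches the isin version and shows the ValueError that 'and' triggers.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `cd` frame.
cd = pd.DataFrame({
    "title_desc": ["MR", "MRS", "MISS", "MS", "MR", "DR"],
    "SALES": [100, 200, 50, 75, 25, 40],
})

# Each comparison is parenthesised because '|' binds more tightly than '=='.
bvec = (cd.title_desc == "MRS") | (cd.title_desc == "MISS") | (cd.title_desc == "MS")
women = cd.loc[bvec, "SALES"]
print(women.sum())  # 325

# The same mask built with isin() matches element for element.
isin_mask = cd.title_desc.isin(["MRS", "MISS", "MS"])
print((bvec == isin_mask).all())  # True

# Python's `and` calls bool() on the first Series, which pandas rejects with ValueError.
and_failed = False
try:
    cd.loc[(cd.title_desc == "MRS") and (cd.title_desc == "MISS"), "SALES"]
except ValueError:
    and_failed = True
print(and_failed)  # True
```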