I have a dataframe that I need to group, then subgroup. From the subgroups I need to return what the subgroup is as well as the unique values for a column.
import pandas

df = pandas.DataFrame({'country': pandas.Series(['US', 'Canada', 'US', 'US']),
                       'gender': pandas.Series(['male', 'female', 'male', 'female']),
                       'industry': pandas.Series(['real estate', 'shipping', 'telecom', 'real estate']),
                       'income': pandas.Series([1, 2, 3, 4])})

def subgroup(g):
    # Split each country's rows further by gender
    return g.groupby(['gender'])

s = df.groupby(['country']).apply(subgroup)
From s, how can I compute the uniques of "industry" as well as which "gender" it's grouped for?
--------------------------------------------
| US     | male   | [real estate, telecom] |
|        |---------------------------------|
|        | female | [real estate]          |
--------------------------------------------
| Canada | female | [shipping]             |
--------------------------------------------
A pandas Series (that is, a DataFrame column) has a unique() method that returns only the distinct values in that column. To find the unique values across several columns, you can first combine the desired columns into a single column with pandas.concat() and then call unique() on the result.
You can use the pandas unique() function to get the distinct values present in a column. It returns a NumPy array of those values, in order of first appearance.
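As a rough illustration of the two points above, here is a minimal sketch that reuses the question's sample data. The variable names (`industries`, `combined`, `combined_uniques`) are only for illustration and are not from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "Canada", "US", "US"],
    "gender": ["male", "female", "male", "female"],
    "industry": ["real estate", "shipping", "telecom", "real estate"],
})

# unique() on a single column returns a NumPy array of the distinct values.
industries = df["industry"].unique()
print(type(industries), industries)

# To get distinct values across several columns, stack them into one Series
# with concat() first, then call unique() on the combined Series.
combined = pd.concat([df["country"], df["industry"]], ignore_index=True)
combined_uniques = combined.unique()
print(combined_uniques)
```

Note that this treats the columns as one pool of values; it does not tell you which group a value came from, which is what the groupby approach addresses.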
You don't need to define that function; groupby() and unique() alone solve the problem.
Try:
df.groupby(['country','gender'])['industry'].unique()
Output:
country  gender
Canada   female                [shipping]
US       female             [real estate]
         male      [real estate, telecom]
Name: industry, dtype: object
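If you also need to read off which country and gender each list belongs to, one possible follow-up (a sketch, not part of the original answer) is to iterate over the result's index, or to move the group keys back into ordinary columns. The names `uniques` and `table` are assumptions chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "Canada", "US", "US"],
    "gender": ["male", "female", "male", "female"],
    "industry": ["real estate", "shipping", "telecom", "real estate"],
    "income": [1, 2, 3, 4],
})

uniques = df.groupby(["country", "gender"])["industry"].unique()

# Each index entry is a (country, gender) tuple identifying the subgroup.
for (country, gender), industries in uniques.items():
    print(country, gender, list(industries))

# Convert the arrays to plain lists and turn the group keys into columns,
# giving one row per (country, gender) pair.
table = uniques.apply(list).reset_index()
print(table)
```

The flat `table` form is usually easier to export or merge, while the MultiIndex Series is convenient for lookups such as `uniques.loc[("US", "male")]`.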
hope it helps!