I have a dataframe that has two columns, user_id
and item_bought
. Here user_id
is the index of the dataframe. I want to group by both user_id
and item_bought
and get the item wise count for the user.
How do I do that?
How to perform groupby index in pandas? Pass index name of the DataFrame as a parameter to groupby() function to group rows on an index. DataFrame. groupby() function takes string or list as a param to specify the group columns or index.
groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.
Overview: Create a dataframe using an dictionary. Group by item_id and find the max value. enumerate over the grouped dataframe and use the key which is an numeric value to return the alpha index value. Create an result_df dataframe if you desire.
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
From version 0.20.1 it is simplier:
Strings passed to DataFrame.groupby() as the by parameter may now reference either column names or index level names
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second']) df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3], 'B': np.arange(8)}, index=index) print (df) A B first second bar one 1 0 two 1 1 baz one 1 2 two 1 3 foo one 2 4 two 2 5 qux one 3 6 two 3 7 print (df.groupby(['second', 'A']).sum()) B second A one 1 2 2 4 3 6 two 1 4 2 5 3 7
this should work:
>>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2']) >>> df['ind1'] = list('AAABCC') >>> df['ind2'] = range(6) >>> df.set_index(['ind1','ind2'], inplace=True) >>> df col1 col2 ind1 ind2 A 0 3 2 1 2 0 2 2 3 B 3 2 4 C 4 3 1 5 0 0 >>> df.groupby([df.index.get_level_values(0),'col1']).count() col2 ind1 col1 A 2 2 3 1 B 2 1 C 0 1 3 1
I had the same problem using one of the columns from multiindex. with multiindex, you cannot use df.index.levels[0] since it has only distinct values from that particular index level and will be most likely of different size than whole dataframe...
check http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_level_values.html - get_level_values "Return vector of label values for requested level, equal to the length of the index"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With