In the groupby documentation, I only see examples of grouping by a function applied to the index (axis 0) or to the column labels. I see no examples showing how to group by a label derived from applying a function to a column's values. I would think this would be done using apply. Is the example below the best way to do this?
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': np.random.choice(['a', 'b', 'c', 'd', 'e'], 20),
                   'num1': np.random.randint(low=30, high=100, size=20),
                   'num2': np.random.randint(low=-3, high=9, size=20)})
df.head()
  name  num1  num2
0    d    34     7
1    b    49     6
2    a    51    -1
3    d    79     8
4    e    72     5
def num1_greater_than_60(number_num1):
    if number_num1 >= 60:
        return 'greater'
    else:
        return 'less'
df.groupby(df['num1'].apply(num1_greater_than_60))
From the DataFrame.groupby() docs:

    by : mapping, function, str, or iterable
        Used to determine the groups for the groupby.
        If ``by`` is a function, it's called on each value of the object's
        index. If a dict or Series is passed, the Series or dict VALUES
        will be used to determine the groups (the Series' values are first
        aligned; see ``.align()`` method). If an ndarray is passed, the
        values are used as-is to determine the groups. A str or list of strs
        may be passed to group by the columns in ``self``
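Since a function passed as ``by`` is called on each index value, we can make num1 the index and pass the function itself as the key: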
In [35]: df.set_index('num1').groupby(num1_greater_than_60)[['name']].count()
Out[35]:
         name
greater    15
less        5
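For comparison, here is a minimal sketch that puts the approach from the question (grouping by a Series of labels) next to the index-based approach above, plus a vectorized variant using np.where. The seeded generator and the variable names by_series, by_index, by_where and labels are assumptions made for illustration, not part of the original posts. All three should produce the same counts, so the choice is mostly about readability and speed on large frames.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (an assumption; the
# question used the unseeded np.random functions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'name': rng.choice(['a', 'b', 'c', 'd', 'e'], 20),
    'num1': rng.integers(low=30, high=100, size=20),
    'num2': rng.integers(low=-3, high=9, size=20),
})

def num1_greater_than_60(number_num1):
    # Same labelling rule as in the question: 60 and above is 'greater'.
    return 'greater' if number_num1 >= 60 else 'less'

# Approach from the question: pass a Series of derived labels as the key.
by_series = df.groupby(df['num1'].apply(num1_greater_than_60))['name'].count()

# Approach from the answer: the function is called on each index value.
by_index = df.set_index('num1').groupby(num1_greater_than_60)['name'].count()

# Vectorized variant: build the labels without a Python-level function call.
labels = pd.Series(np.where(df['num1'] >= 60, 'greater', 'less'),
                   index=df.index, name='num1')
by_where = df.groupby(labels)['name'].count()

print(by_series)
```

Passing the Series directly avoids changing the index of df, which can matter if the original index is needed later.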