
pandas group by function applied to a column

In the pandas groupby documentation, I only see examples of grouping by a function applied to the index (axis 0) or to the column labels. I see no examples of grouping by a label derived from applying a function to a column's values. I would expect this to be done with apply. Is the example below the best way to do this?

import numpy as np
import pandas as pd

df = pd.DataFrame({'name': np.random.choice(['a', 'b', 'c', 'd', 'e'], 20),
                   'num1': np.random.randint(low=30, high=100, size=20),
                   'num2': np.random.randint(low=-3, high=9, size=20)})

df.head()

  name  num1  num2
0    d    34     7
1    b    49     6
2    a    51    -1
3    d    79     8
4    e    72     5

def num1_greater_than_60(number_num1):
    if number_num1 >= 60:
        return 'greater'
    else:
        return 'less'

df.groupby(df['num1'].apply(num1_greater_than_60))
dleal asked Mar 20 '18 16:03



1 Answer

From the DataFrame.groupby() docs:

by : mapping, function, str, or iterable
    Used to determine the groups for the groupby.
    If ``by`` is a function, it's called on each value of the object's
    index. If a dict or Series is passed, the Series or dict VALUES
    will be used to determine the groups (the Series' values are first
    aligned; see ``.align()`` method). If an ndarray is passed, the
    values are used as-is determine the groups. A str or list of strs
    may be passed to group by the columns in ``self``
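
To make the distinction in that docstring concrete, here is a minimal sketch on a small, made-up frame (the data and the `even`/`odd` labels are assumptions for illustration, not part of the question). It shows that a function passed as `by` sees index labels, while a Series passed as `by` contributes its values as group labels:

```python
import pandas as pd

# Hypothetical, deterministic data so the grouping is easy to follow.
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd'],
                   'num1': [35, 60, 72, 48]})

# A function passed as `by` is called on each index label (0, 1, 2, 3),
# not on the values of any column.
by_index = df.groupby(lambda idx: 'even' if idx % 2 == 0 else 'odd')['num1'].sum()

# A Series passed as `by` supplies the group labels through its values,
# aligned to df on the index.
labels = df['num1'].map(lambda n: 'greater' if n >= 60 else 'less')
by_series = df.groupby(labels)['num1'].sum()

print(by_index)
print(by_series)
```

The second form is what the question already does with `df['num1'].apply(...)`; the first form only helps once the column of interest has been moved into the index.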

So we can move num1 into the index and pass the function directly as ``by``:

In [35]: df.set_index('num1').groupby(num1_greater_than_60)[['name']].count()
Out[35]:
         name
greater    15
less        5
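
As a rough check, the sketch below runs the index-based approach next to the Series-based approach from the question, using only the five rows shown in the question's `df.head()` output (so the counts differ from the random 20-row result above). The `np.where` variant is an extra, vectorized alternative added here as an assumption about what may be convenient for a two-way split; it is not taken from the answer.

```python
import numpy as np
import pandas as pd

def num1_greater_than_60(number_num1):
    return 'greater' if number_num1 >= 60 else 'less'

# The five rows printed by df.head() in the question.
df = pd.DataFrame({'name': ['d', 'b', 'a', 'd', 'e'],
                   'num1': [34, 49, 51, 79, 72],
                   'num2': [7, 6, -1, 8, 5]})

# Function applied to the index after moving num1 there.
via_index = df.set_index('num1').groupby(num1_greater_than_60)['name'].count()

# Series of labels derived from the column, as in the question.
via_series = df.groupby(df['num1'].apply(num1_greater_than_60))['name'].count()

# Vectorized labels, avoiding a Python-level call per row.
labels = pd.Series(np.where(df['num1'] >= 60, 'greater', 'less'), index=df.index)
via_where = df.groupby(labels)['name'].count()

print(via_index.to_dict(), via_series.to_dict(), via_where.to_dict())
```

All three produce the same groups here, so the choice is mostly about readability; passing a Series keeps `num1` available as a regular column.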
MaxU - stop WAR against UA answered Oct 13 '22 00:10