Pandas group by custom function

This should be really simple. What I want is the ability to group by the result of a function, just as in SQL you can group by an expression:

SELECT substr(name, 1, 1) as letter, COUNT(*) as count
FROM table
GROUP BY substr(name, 1, 1)

This would count the number of rows where the name column starts with each letter of the alphabet.

I want to do the same in Python, so I assumed I could pass a function to groupby. However, the function only receives the index label of each row (for example 0, 1 or 2), not the row itself. What I want is the value in the name column:

import pandas

# Return the first letter
def first_letter(row):

    # row is 0, then 1, then 2 etc.
    return row.name[0]

#Generate a data set of words
test = pandas.DataFrame({'name': ["benevolent", "hidden", "absurdity", "anonymous", "furious", "antidemocratic", "honeydew"]})

#              name
# 0      benevolent
# 1          hidden
# 2       absurdity
# 3       anonymous
# 4         furious
# 5  antidemocratic
# 6        honeydew

test.groupby(first_letter)

What am I doing wrong here? How can I group by something other than the row index?

Migwell asked Dec 19 '22 21:12


1 Answer

Create a new column for the first letter:

# apply() passes each value of the 'name' column (a string), not a row
def first_letter(name):
    return name[0]

test['first'] = test['name'].apply(first_letter)

Then group by it:

group = test.groupby('first')

And use the result:

>>> group.count()
       name
first
a         3
b         1
f         1
h         2
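
If you would rather not add a helper column, here is a minimal sketch of two alternatives, not necessarily what the answer above intends. It assumes the same test DataFrame as in the question; the variable names by_series and by_callable are just illustrative. The first passes a derived Series to groupby as the key. The second relies on the behaviour described in the question (a function passed to groupby is called on index labels), so it moves name into the index first.

```python
import pandas as pd

# Same data as in the question
test = pd.DataFrame({"name": ["benevolent", "hidden", "absurdity", "anonymous",
                              "furious", "antidemocratic", "honeydew"]})

# Option 1: group by a derived Series (first character of each name).
# groupby accepts any Series aligned with the frame's index as a key.
by_series = test.groupby(test["name"].str[0]).size()

# Option 2: a callable passed to groupby is applied to each index label,
# so make 'name' the index and the function receives the name strings.
def first_letter(label):
    return label[0]

by_callable = test.set_index("name", drop=False).groupby(first_letter).size()

print(by_series)
print(by_callable)
```

Both produce a count per first letter. Option 1 is usually the closer match to SQL's GROUP BY on an expression, since the key is computed on the fly and the original frame is left unchanged.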
Mike Müller answered Dec 21 '22 09:12
