This should be really simple. What I want is the ability to group by the result of a function, just as in SQL you can group by an expression:
SELECT substr(name, 1, 1) as letter, COUNT(*) as count
FROM table
GROUP BY substr(name, 1, 1)
This would count the number of rows where the name column starts with each letter of the alphabet.
I want to do the same in Python, so I assumed I could pass a function to groupby. However, groupby only passes each row's index label (for example 0, 1 or 2) to the function, not the row itself. What I want is the name column:
import pandas
# Return the first letter
def first_letter(row):
    # row is 0, then 1, then 2 etc.
    return row.name[0]
#Generate a data set of words
test = pandas.DataFrame({'name': ["benevolent", "hidden", "absurdity", "anonymous", "furious", "antidemocratic", "honeydew"]})
#              name
# 0      benevolent
# 1          hidden
# 2       absurdity
# 3       anonymous
# 4         furious
# 5  antidemocratic
# 6        honeydew
test.groupby(first_letter)
What am I doing wrong here? How can I group by something other than the row index?
Create a new column for the first letter:
def first_letter(name):
    # apply() passes each value of the 'name' column, i.e. a string
    return name[0]
test['first'] = test['name'].apply(first_letter)
and group it:
group = test.groupby('first')
use it:
>>> group.count()
     name
first      
a         3
b         1
f         1
h         2
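If you would rather not store a helper column, here is a minimal sketch of two alternatives. It assumes the same test frame as in the question; the names letters and first_letter_of_row are made up for illustration. The first groups by a derived Series, which is the closest match to GROUP BY on an expression. The second keeps the function-based groupby from the question, but uses the index label it receives to look up the name.

```python
import pandas

test = pandas.DataFrame({'name': ["benevolent", "hidden", "absurdity", "anonymous",
                                  "furious", "antidemocratic", "honeydew"]})

# Option 1: group by a derived Series instead of a stored column.
# .str[0] takes the first character of every name.
letters = test['name'].str[0].rename('first')
counts_by_series = test.groupby(letters)['name'].count()

# Option 2: groupby calls the function once per index label (0, 1, 2, ...),
# so use that label to look the name up in the frame.
def first_letter_of_row(label):
    return test.loc[label, 'name'][0]

counts_by_function = test.groupby(first_letter_of_row).size()

print(counts_by_series)
print(counts_by_function)
```

Both should give the same counts as the helper-column version. Option 1 is usually the more idiomatic of the two, since it avoids a Python-level lookup per row.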