This should be really simple. What I want is the ability to group by the result of a function, just like in SQL you can group by an expression:
SELECT substr(name, 1, 1) AS letter, COUNT(*) AS count
FROM table
GROUP BY substr(name, 1, 1)
This would count the number of rows where the name column starts with each letter of the alphabet.
I want to do the same in Python, so I assumed I could pass a function to groupby. However, groupby only passes the row's index label to the function (for example 0, 1 or 2), not the row itself. What I want is the name column:
import pandas
# Return the first letter
def first_letter(row):
    # row is 0, then 1, then 2 etc.
    return row.name[0]

# Generate a data set of words
test = pandas.DataFrame({'name': ["benevolent", "hidden", "absurdity", "anonymous", "furious", "antidemocratic", "honeydew"]})
#              name
# 0      benevolent
# 1          hidden
# 2       absurdity
# 3       anonymous
# 4         furious
# 5  antidemocratic
# 6        honeydew
test.groupby(first_letter)
What am I doing wrong here? How can I group by something other than the row index?
Create a new column for the first letter:
def first_letter(name):
    # name is a single string from the 'name' column
    return name[0]
test['first'] = test['name'].apply(first_letter)
and group it:
group = test.groupby('first')
use it:
>>> group.count()
       name
first
a         3
b         1
f         1
h         2
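If you would rather not add a helper column, here is a minimal sketch of two alternatives, assuming the same test DataFrame as in the question. The first passes a derived Series (the first character of each name) straight to groupby. The second moves name into the index, so that a function passed to groupby receives each name instead of 0, 1, 2. The variable names counts and counts_via_index are just for illustration.

```python
import pandas

test = pandas.DataFrame({'name': ["benevolent", "hidden", "absurdity", "anonymous",
                                  "furious", "antidemocratic", "honeydew"]})

# Option 1: group by a Series computed from the column.
# .str[0] takes the first character of every name.
counts = test.groupby(test['name'].str[0]).size()

# Option 2: groupby calls a function on each index label,
# so make 'name' the index first (drop=False keeps the column too).
counts_via_index = (
    test.set_index('name', drop=False)
        .groupby(lambda name: name[0])
        .size()
)

print(counts)
print(counts_via_index)
```

Both give the same counts per letter as the helper-column approach. size() counts the rows in each group, whereas count() counts the non-null values in each column.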