I have a dataframe where I want to group by the first part of an ID field. For example, say I have the following:
>>> import pandas as pd
>>> df=pd.DataFrame(data=[['AA',1],['AB',4],['AC',5],['BA',11],['BB',2],['CA',9]], columns=['ID','Value'])
>>> df
   ID  Value
0  AA      1
1  AB      4
2  AC      5
3  BA     11
4  BB      2
5  CA      9
>>>
How can I group by the first letter of the ID field?
I can currently do this by creating a new column and then grouping on that, but I imagine there is a more efficient way:
>>> df['GID']=df['ID'].str[:1]
>>> df.groupby('GID')['Value'].sum()
GID
A    10
B    13
C     9
Name: Value, dtype: int64
>>>
Using str.contains to find a substring in a pandas DataFrame: the Series.str.contains method searches a column for a specific substring (by default the pattern is treated as a regular expression). It returns a boolean Series that is True where the original value contains the substring and False where it does not.
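As a rough illustration of str.contains on the DataFrame from the question, here is a minimal sketch. The names mask and matches are just illustrative, and the search letter 'B' is an arbitrary choice. Note that this filters rows rather than grouping them, so it is not a direct answer to the grouping question.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['AA', 1], ['AB', 4], ['AC', 5], ['BA', 11], ['BB', 2], ['CA', 9]],
    columns=['ID', 'Value'],
)

# Boolean mask: True where ID contains the letter 'B' anywhere in the string.
# regex=False treats 'B' as a literal substring rather than a regex pattern.
mask = df['ID'].str.contains('B', regex=False)

# Use the mask to keep only the matching rows (illustrative name).
matches = df[mask]
print(matches)
```

Here the rows with IDs AB, BA and BB are kept, because the match is not anchored to the first character.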
The pandas str.slice() method slices a substring out of each string in a Series. It follows Python's basic slicing principle, [start:stop:step], and takes three optional parameters: where to start, where to stop, and the step between elements. For example, df['ID'].str.slice(0, 1) is equivalent to df['ID'].str[:1].
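A small sketch of str.slice next to the equivalent bracket syntax, using the ID values from the question. The variable names ids, first and rest are illustrative only.

```python
import pandas as pd

ids = pd.Series(['AA', 'AB', 'AC', 'BA', 'BB', 'CA'], name='ID')

# First character of every ID; same result as ids.str[:1].
first = ids.str.slice(0, 1)

# Everything after the first character; same result as ids.str[1:].
rest = ids.str.slice(start=1)

print(first.tolist())
print(rest.tolist())
```

Either spelling works as a grouping key; the bracket form is just shorter.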
You will need to create a grouping key somehow, but it does not have to be a column of the DataFrame itself. For example, you can pass the sliced Series directly to groupby:
df.groupby(df.ID.str[:1])['Value'].sum()
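To make that self-contained, here is a sketch that rebuilds the DataFrame from the question and groups on a key Series without adding a column. Renaming the key to 'GID' is my own addition, so that the result's index label matches the output in the question; without the rename the index would be labelled ID.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['AA', 1], ['AB', 4], ['AC', 5], ['BA', 11], ['BB', 2], ['CA', 9]],
    columns=['ID', 'Value'],
)

# Build the grouping key as a standalone Series; df itself is not modified.
# The rename only changes the label shown on the result's index.
key = df['ID'].str[:1].rename('GID')

# groupby accepts any Series aligned with df's index as the key.
totals = df.groupby(key)['Value'].sum()
print(totals)

# The original DataFrame still has only its two columns.
print(list(df.columns))
```

This gives the same sums as the two-step version (A 10, B 13, C 9) while leaving df untouched.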