
Pandas groupby slice of a string

I have a dataframe where I want to group by the first part of an ID field. For example, say I have the following:

>>> import pandas as pd
>>> df=pd.DataFrame(data=[['AA',1],['AB',4],['AC',5],['BA',11],['BB',2],['CA',9]], columns=['ID','Value'])
>>> df
   ID  Value
0  AA      1
1  AB      4
2  AC      5
3  BA     11
4  BB      2
5  CA      9
>>> 

How can I group by the first letter of the ID field?

I can currently do this by creating a new column and then grouping on that, but I imagine there is a more efficient way:

>>> df['GID']=df['ID'].str[:1]
>>> df.groupby('GID')['Value'].sum()
GID
A    10
B    13
C     9
Name: Value, dtype: int64
>>> 
AJG519 asked Dec 30 '15 18:12




1 Answer

You will need to create a grouping key somehow, but it does not have to be stored as a column on the DataFrame itself. For example:

df.groupby(df.ID.str[:1])['Value'].sum()
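
As a minimal sketch of the same idea, the key can be built as a standalone Series and passed to groupby, so the original DataFrame is left unchanged. The data is the sample from the question; the rename to 'GID' is optional and only reproduces the index label from the question's output (without it the result index is labelled 'ID'). The variable names key and totals are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['AA', 1], ['AB', 4], ['AC', 5], ['BA', 11], ['BB', 2], ['CA', 9]],
    columns=['ID', 'Value'],
)

# Build the grouping key as a separate Series; no column is added to df.
# rename('GID') only sets the label of the resulting index.
key = df['ID'].str[:1].rename('GID')

# groupby accepts a Series aligned on df's index as the grouping key.
totals = df.groupby(key)['Value'].sum()
print(totals)
```

After this runs, df still has only the ID and Value columns, and totals holds the per-letter sums (A 10, B 13, C 9).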
Jon Clements answered Sep 30 '22 18:09
