Pandas groupby slice of a string

Tags: python, pandas, dataframe

I have a dataframe where I want to group by the first part of an ID field. For example, say I have the following:

>>> import pandas as pd
>>> df=pd.DataFrame(data=[['AA',1],['AB',4],['AC',5],['BA',11],['BB',2],['CA',9]], columns=['ID','Value'])
>>> df
   ID  Value
0  AA      1
1  AB      4
2  AC      5
3  BA     11
4  BB      2
5  CA      9
>>>

How can I group by the first letter of the ID field?

I can currently do this by creating a new column and then grouping on that, but I imagine there is a more efficient way:

>>> df['GID']=df['ID'].str[:1]
>>> df.groupby('GID')['Value'].sum()
GID
A    10
B    13
C     9
Name: Value, dtype: int64
>>>

asked Dec 30 '15 18:12

AJG519

1 Answer

You will need to create a grouping key somehow, but it does not have to be a column on the DataFrame itself. For example, you can pass the sliced Series directly to groupby:

df.groupby(df.ID.str[:1])['Value'].sum()
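As a rough, self-contained sketch of the same idea using the data from the question: the key is built as a standalone Series and handed to groupby, so no extra column is added to the frame. The variable names `key` and `result`, and the `rename('GID')` step (used here only to label the resulting index), are illustrative additions rather than part of the original answer.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    data=[['AA', 1], ['AB', 4], ['AC', 5], ['BA', 11], ['BB', 2], ['CA', 9]],
    columns=['ID', 'Value'],
)

# Build the grouping key as its own Series; df itself is not modified.
# The rename is optional and only gives the result's index a label.
key = df['ID'].str[:1].rename('GID')

# groupby accepts a Series aligned on df's index as the key.
result = df.groupby(key)['Value'].sum()
print(result)              # sums per first letter: A 10, B 13, C 9

# The DataFrame still has only its original columns.
print(list(df.columns))    # ['ID', 'Value']
```

Without the rename, the index of the result would most likely carry the name of the source column (`ID`), which may or may not be what you want.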

answered Sep 30 '22 18:09

Jon Clements
