Groupby a part of the string in pandas

I'm trying to build a new dataframe by grouping on part of the string in a column.

import pandas

df = pandas.DataFrame([{'A': 'string_300_bla1', 'B': "Hi", 'C': 3},
                       {'A': 'string_300_blaa2', 'B': "Hello", 'C': 4},
                       {'A': 'string_487_blaaa1', 'B': "nice", 'C': 9},
                       {'A': 'string_487_blaaa2', 'B': "day", 'C': 6}])

I want to group by the leading part of each string in column A, for example the string_300 in:

string_300_bla1

I tried:

import re

dfs = df['A'].str.contains('.*_\d+_.*', re.IGNORECASE).groupby(df['B'])

My output:

<pandas.core.groupby.generic.SeriesGroupBy object at 0x00000279EFD009E8>

Expected output:

dfs = pandas.DataFrame([{'A': 'string_300', 'B': "Hi\n\nHello"},
                       {'A': 'string_487', 'B': "nice\n\nday"}])


asked May 19 '20 13:05

ladybug

2 Answers

We can pull the prefix out of A with str.extract and group by it:

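# The raw string avoids invalid-escape warnings for \w and \d.
# [0] selects the single capture group, so the key column is named 0.
(df.groupby(df.A.str.extract(r'(\w+_\d+)')[0])
   .agg({'B': '\n\n'.join, 'C': 'sum'})
   .reset_index()
)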

Output:

            0            B   C
0  string_300  Hi\n\nHello   7
1  string_487  nice\n\nday  15

As pointed out by @CharlesGleason, here's a variant that groups by the digit part only:

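# Group by the digits alone; 'first' keeps the first original A value per group.
(df.groupby(df.A.str.extract(r'\w+_(\d+)')[0])
   .agg({'A': 'first', 'B': '\n\n'.join, 'C': 'sum'})
   .reset_index(drop=True)   # drop the digit key from the result
)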
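
To get a result shaped like the expected output in the question, with the key column named A rather than 0, something along these lines should work. This is only a sketch built on the sample data from the question; the names key and result are mine, and it assumes every value in A contains a word_digits prefix.

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 'string_300_bla1', 'B': 'Hi', 'C': 3},
    {'A': 'string_300_blaa2', 'B': 'Hello', 'C': 4},
    {'A': 'string_487_blaaa1', 'B': 'nice', 'C': 9},
    {'A': 'string_487_blaaa2', 'B': 'day', 'C': 6},
])

# expand=False returns a Series for a single capture group;
# renaming it to 'A' makes reset_index() produce a column called A.
key = df['A'].str.extract(r'(\w+_\d+)', expand=False).rename('A')

result = (df.groupby(key)
            .agg({'B': '\n\n'.join, 'C': 'sum'})
            .reset_index())
print(result)
```

Drop the 'C' entry from the agg dict if only A and B are wanted.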


answered Oct 22 '22 10:10

Quang Hoang

You can use str.rsplit to cut off the last underscore-separated piece and group by what remains:

df.B.groupby(df.A.str.rsplit('_',n=1).str[0]).agg('\n\n'.join).reset_index()
Out[236]: 
            A            B
0  string_300  Hi\n\nHello
1  string_487  nice\n\nday
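
One thing to keep in mind: rsplit with n=1 removes only the last piece, so it relies on the suffix itself containing no underscore. If that is not guaranteed, keeping the first two pieces may be safer. A small sketch of the difference follows; the second value in s is a made-up example and is not from the question.

```python
import pandas as pd

# The second value is hypothetical: its suffix contains an extra underscore.
s = pd.Series(['string_300_bla1', 'string_300_bla_2'])

# Cut off everything after the last underscore.
last_cut = s.str.rsplit('_', n=1).str[0]

# Split at most twice from the left, keep the first two pieces, re-join them.
first_two = s.str.split('_', n=2).str[:2].str.join('_')

print(last_cut.tolist())   # ['string_300', 'string_300_bla']
print(first_two.tolist())  # ['string_300', 'string_300']
```

Either Series can be passed to groupby in the same way as in the answer above.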

answered Oct 22 '22 12:10

BENY
