Groupby a part of the string in pandas

I'm trying to build a new DataFrame by grouping rows on part of the string in one column.

import pandas

df = pandas.DataFrame([{'A': 'string_300_bla1', 'B': "Hi", 'C': 3},
                       {'A': 'string_300_blaa2', 'B': "Hello", 'C': 4},
                       {'A': 'string_487_blaaa1', 'B': "nice", 'C': 9},
                       {'A': 'string_487_blaaa2', 'B': "day", 'C': 6}])

I want to group by the leading part of the string in column A, i.e. the string_300 prefix of:

string_300_bla1

I tried:

import re

dfs = df['A'].str.contains('.*_\d+_.*', re.IGNORECASE).groupby(df['B'])

My output:

<pandas.core.groupby.generic.SeriesGroupBy object at 0x00000279EFD009E8>

Desired output:

dfs = pandas.DataFrame([{'A': 'string_300', 'B': "Hi\n\nHello"},
                       {'A': 'string_487', 'B': "nice\n\nday"}])
ladybug asked May 19 '20 13:05



2 Answers

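A groupby on its own only builds a lazy GroupBy object, which is why you see the SeriesGroupBy object printed; you still have to aggregate it. Also, str.contains returns a boolean mask, not the matched text. Use str.extract to pull out the prefix, group by it, and aggregate:

(df.groupby(df.A.str.extract(r'(\w+_\d+)')[0])
   .agg({'B':'\n\n'.join, 'C':'sum'})
   .reset_index()
)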

Output:

            0            B   C
0  string_300  Hi\n\nHello   7
1  string_487  nice\n\nday  15
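
To get a key column named A rather than 0, as in the desired output from the question, one option is to name the extracted key before grouping. This is a minimal sketch, assuming the prefix always has the form text_digits; the pattern `^([^_]+_\d+)` and the names `key` and `out` are illustrative choices, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 'string_300_bla1', 'B': 'Hi', 'C': 3},
    {'A': 'string_300_blaa2', 'B': 'Hello', 'C': 4},
    {'A': 'string_487_blaaa1', 'B': 'nice', 'C': 9},
    {'A': 'string_487_blaaa2', 'B': 'day', 'C': 6},
])

# expand=False returns a Series for a single capture group;
# rename it explicitly so the grouped index is called 'A'.
key = df['A'].str.extract(r'^([^_]+_\d+)', expand=False).rename('A')

# Select only B before aggregating, so reset_index() does not
# clash with the original 'A' column of df.
out = df.groupby(key)['B'].agg('\n\n'.join).reset_index()
print(out)
```

Selecting only B before aggregating is a deliberate choice here: aggregating the whole frame and then calling reset_index() would try to insert a second column called A.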

As pointed out by @CharlesGleason, here's a variant that groups on the digits only:

(df.groupby(df.A.str.extract(r'\w+_(\d+)')[0])
   .agg({'A':'first', 'B':'\n\n'.join, 'C':'sum'})
   .reset_index(drop=True)
)
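
For reference, a self-contained sketch of what the digits-only variant produces on the sample data from the question. Because of reset_index(drop=True), the digit key itself is discarded and A holds the first full string of each group (`digits` and `res` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 'string_300_bla1', 'B': 'Hi', 'C': 3},
    {'A': 'string_300_blaa2', 'B': 'Hello', 'C': 4},
    {'A': 'string_487_blaaa1', 'B': 'nice', 'C': 9},
    {'A': 'string_487_blaaa2', 'B': 'day', 'C': 6},
])

# Group key: only the digits between the underscores ('300', '487').
digits = df['A'].str.extract(r'\w+_(\d+)')[0]

res = (df.groupby(digits)
         .agg({'A': 'first', 'B': '\n\n'.join, 'C': 'sum'})
         .reset_index(drop=True))
print(res)
```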
Quang Hoang answered Oct 22 '22 10:10


You can use str.rsplit with n=1 to drop everything after the last underscore, then group column B on what remains:

df.B.groupby(df.A.str.rsplit('_',n=1).str[0]).agg('\n\n'.join).reset_index()
Out[236]: 
            A            B
0  string_300  Hi\n\nHello
1  string_487  nice\n\nday
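
The rsplit approach relies on the part being discarded containing no underscore of its own. A small sketch of the intermediate values, using two of the sample strings from the question (the names `a`, `parts`, and `key` are illustrative):

```python
import pandas as pd

a = pd.Series(['string_300_bla1', 'string_487_blaaa2'], name='A')

# Split once, starting from the right, on the last underscore.
parts = a.str.rsplit('_', n=1)
print(parts.tolist())

# Element 0 of each list is the prefix used as the group key.
key = parts.str[0]
print(key.tolist())
```

If the trailing part could itself contain an underscore, a regex-based extraction is likely the safer choice.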
BENY answered Oct 22 '22 12:10