Collapsing rows in a Pandas dataframe

I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, but it seems to be slanted towards performing some numeric function on the group - I just want to keep the text. Here's what I've got...

I have a dataframe of the form:

index    ID     apples    pears    oranges
0        101                       oranges
1        134    apples
2        576              pears
3        837    apples
4        576                       oranges
5        134              pears

The columns are clean: the apples column will only ever have the text "apples" in it, or it will be blank.

Where there are multiple entries under the same ID (in this example, on IDs 134 & 576), I want to collapse the rows together to get this:

index    ID     apples    pears    oranges
0        101                       oranges
1        134    apples    pears
2        576              pears    oranges
3        837    apples

I could do this by iterating over the rows, but it seems like a non-pandas solution. Is there a better way?
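For reference, here's a minimal way to build the sample frame above (I'm not sure whether the blanks end up as empty strings or NaN in practice; this sketch assumes empty strings):

import pandas as pd

# sample data from the table above; blanks as empty strings
# (they could just as well be NaN in the real data)
df = pd.DataFrame({
    'ID':      [101, 134, 576, 837, 576, 134],
    'apples':  ['', 'apples', '', 'apples', '', ''],
    'pears':   ['', '', 'pears', '', '', 'pears'],
    'oranges': ['oranges', '', '', '', 'oranges', ''],
})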

asked Apr 20 '17 by user4896331



1 Answer

You can use groupby with ''.join as the aggregation function, or with sum or max:

# if the blank values are NaN, first replace them with ''
df = df.fillna('')

# join the text in each column per ID group
df = df.groupby('ID').agg(''.join)
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples   
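If you need ID back as a regular column instead of the index (not shown in the output above, just a common follow-up), reset the index afterwards:

df = df.fillna('')
df = df.groupby('ID').agg(''.join).reset_index()
# or group with as_index=False to keep ID as a column from the start
print(df)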

These also work:

df = df.fillna('')
df = df.groupby('ID').sum()
# alternatively, use max:
# df = df.groupby('ID').max()
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples     
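Both work because the blanks are plain empty strings after fillna: sum concatenates strings with +, and max returns the single non-empty value, since '' sorts before any non-empty string. A quick plain-Python illustration of the principle:

print('apples' + '' + '')       # 'apples'
print(max(['', '', 'pears']))   # 'pears'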

Also, if you need to remove duplicates per group and per column, add unique:

df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
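For example, with hypothetical data where the same fruit appears twice under one ID (not from the question's data), the unique variant keeps only one copy instead of concatenating it twice:

import pandas as pd

# hypothetical duplicate rows for ID 134
df = pd.DataFrame({'ID': [134, 134, 576],
                   'apples': ['apples', 'apples', ''],
                   'pears': ['', '', 'pears']})

print(df.groupby('ID').agg(''.join))                        # 134 -> 'applesapples'
print(df.groupby('ID').agg(lambda x: ''.join(x.unique())))  # 134 -> 'apples'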
answered Sep 20 '22 by jezrael