How to merge rows in dataframe with different columns?

Tags: python, pandas, dataframe, pandas-groupby

I want to merge the rows of a DataFrame that share a common value in one column. The remaining columns should be combined: string values joined with commas, and integer values collected into an array/list.

A   B     C    D
1  one   100  value
4  four  400  value
5  five  500  value
2  two   200  value

Expecting result like:

   A                B                 C            D
[1,4,5,2]  one,four,five,two  [100,400,500,200]  value

I can use groupby on column D, but how can I apply np.array to columns A and C and ','.join to column B, all in one step?


asked Jun 25 '19 05:06

k92

2 Answers

Dynamic solution: with GroupBy.agg, string columns are joined and numeric columns are converted to lists:

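import numpy as np
import pandas as pd

# numeric columns become lists; all other (string) columns are comma-joined
f = lambda x: x.tolist() if np.issubdtype(x.dtype, np.number) else ','.join(x)
# similar test for string dtypes - https://stackoverflow.com/a/37727662
# (np.flexible does not match pandas object-dtype string columns)
#f = lambda x: ','.join(x) if np.issubdtype(x.dtype, np.flexible) else x.tolist()
df1 = df.groupby('D').agg(f).reset_index().reindex(columns=df.columns)
print(df1)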
              A                  B                     C      D
0  [1, 4, 5, 2]  one,four,five,two  [100, 400, 500, 200]  value
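
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and swaps the NumPy dtype check for pd.api.types.is_numeric_dtype, which is my substitution rather than part of the answer above; the helper name merge_column is hypothetical. It assumes every non-numeric column contains only strings.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [1, 4, 5, 2],
    'B': ['one', 'four', 'five', 'two'],
    'C': [100, 400, 500, 200],
    'D': ['value'] * 4,
})

def merge_column(col):
    # Hypothetical helper: numeric columns become Python lists,
    # everything else is assumed to hold strings and is comma-joined.
    if pd.api.types.is_numeric_dtype(col.dtype):
        return col.tolist()
    return ','.join(col)

df1 = (df.groupby('D')
         .agg(merge_column)
         .reset_index()                  # bring D back as a column
         .reindex(columns=df.columns))   # restore the original column order
print(df1)
```

A named function instead of a lambda also makes the aggregation easier to reuse and to debug.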

Another solution is to specify the function for each column separately:

df2 = (df.groupby('D')
        .agg({'A': lambda x: x.tolist(), 'B': ','.join, 'C': lambda x: x.tolist()})
        .reset_index()
        .reindex(columns=df.columns))

print(df2)

              A                  B                     C      D
0  [1, 4, 5, 2]  one,four,five,two  [100, 400, 500, 200]  value
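
The per-column mapping can also be written with named aggregation (available since pandas 0.25). This is a sketch of an alternative spelling, not part of the answer above; it passes the built-in list instead of the lambdas and uses as_index=False in place of reset_index().

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [1, 4, 5, 2],
    'B': ['one', 'four', 'five', 'two'],
    'C': [100, 400, 500, 200],
    'D': ['value'] * 4,
})

# Each keyword is an output column: name=(source column, aggregation function)
df2 = (df.groupby('D', as_index=False)
         .agg(A=('A', list), B=('B', ','.join), C=('C', list))
         .reindex(columns=df.columns))   # put D back in last position
print(df2)
```

The keyword form lets the output columns be renamed in the same call if needed.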


answered Nov 01 '22 12:11

jezrael

df = (df.groupby('D')
        .apply(lambda x: pd.Series([list(x.A), ','.join(x.B), list(x.C)]))
        .reset_index()
        .rename({0: 'A', 1: 'B', 2: 'C'}, axis=1))

df = df[['A', 'B', 'C', 'D']]

Output

              A                  B                     C      D
0  [1, 4, 5, 2]  one,four,five,two  [100, 400, 500, 200]  value
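
A possible variant of the apply approach, sketched under a few assumptions: the function returns a Series with labels, so no rename step is needed, and the columns are selected explicitly so the function only sees A, B and C. The helper name merge_group and the extra 'other' value in D are hypothetical, added only to show that each distinct D value yields its own row.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 4, 5, 2],
    'B': ['one', 'four', 'five', 'two'],
    'C': [100, 400, 500, 200],
    'D': ['value', 'value', 'other', 'value'],  # hypothetical second group
})

def merge_group(g):
    # Hypothetical helper: build one labeled row per group
    return pd.Series({
        'A': list(g['A']),
        'B': ','.join(g['B']),
        'C': list(g['C']),
    })

out = (df.groupby('D')[['A', 'B', 'C']]
         .apply(merge_group)
         .reset_index()
         .reindex(columns=df.columns))
print(out)
```

Calling apply once per group is usually slower than agg on large frames, so this form is mostly useful when the per-group logic needs several columns at once.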

answered Nov 01 '22 12:11

iamklaus
