Python Pandas: Expand a Column of Lists of Lists into Two New Columns

Tags:

python

list

pandas

I have a DF which looks like this.

name    id  apps
john    1   [[app1, v1], [app2, v2], [app3,v3]]
smith   2   [[app1, v1], [app4, v4]]

I want to expand the apps column such that it looks like this.

name    id  app_name    app_version
john    1   app1        v1
john    1   app2        v2
john    1   app3        v3
smith   2   app1        v1
smith   2   app4        v4

Any help is appreciated.

asked May 11 '19 23:05

Imsa
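Not part of the question or either answer: on pandas 1.1 or newer, DataFrame.explode (with its ignore_index parameter) can do the row expansion directly. A minimal sketch, assuming the apps column holds Python lists of two-element [name, version] lists exactly as in the example:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# one row per [app, version] pair; ignore_index gives a fresh 0..n-1 index,
# so the join below is on unique labels (no accidental many-to-many matches)
exploded = df.explode('apps', ignore_index=True)

# split each two-element pair into two named columns, aligned on that index
pairs = pd.DataFrame(exploded['apps'].tolist(),
                     index=exploded.index,
                     columns=['app_name', 'app_version'])

# drop the helper list column and attach the split columns
result = exploded.drop(columns='apps').join(pairs)
print(result)
```

Resetting the index during the explode matters: joining two frames that both carry the repeated labels 0, 0, 0, 1, 1 would pair every john row with every john app.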

2 Answers

You can .apply(pd.Series) twice to get what you need as an intermediate step, then merge back to the original dataframe.

import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3','v3']], 
             [['app1', 'v1'], ['app4', 'v4']]]
})

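# one column per list position (shorter lists are padded with NaN), then
# transpose and melt into one row per (original row, pair); drop the padding
dftmp = df.apps.apply(pd.Series).T.melt().dropna()
# split each [app, version] pair into two columns, indexed by the original row label
dfapp = (dftmp.value
              .apply(pd.Series)
              .set_index(dftmp.variable)
              .rename(columns={0:'app_name', 1:'app_version'})
        )

# join back on the row label: each person row is repeated once per app
df[['name', 'id']].merge(dfapp, left_index=True, right_index=True)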
# returns:
    name  id app_name app_version
0   john   1     app1          v1
0   john   1     app2          v2
0   john   1     app3          v3
1  smith   2     app1          v1
1  smith   2     app4          v4
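The final merge in this answer works because dfapp carries the original row label (the variable column produced by melt) as its index, repeated once per app. A minimal illustration of that index-on-index join, using hand-built frames; the names people and apps are illustrative only:

```python
import pandas as pd

# left side: one row per person, default index 0, 1
people = pd.DataFrame({'name': ['john', 'smith'], 'id': [1, 2]})

# right side: one row per app; the index repeats the owning person's row label
apps = pd.DataFrame({'app_name': ['app1', 'app2', 'app3', 'app1', 'app4'],
                     'app_version': ['v1', 'v2', 'v3', 'v1', 'v4']},
                    index=[0, 0, 0, 1, 1])

# index-on-index merge: each left row is matched to every right row with the same label
out = people.merge(apps, left_index=True, right_index=True)
print(out)
```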

answered Nov 13 '22 06:11

James

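Chaining pd.Series calls is easy to understand. If you would like to know more methods, check out unnesting.

(df.set_index(['name', 'id'])
   .apps.apply(pd.Series)       # one column per list position, NaN where a list is shorter
   .stack()                     # one row per (name, id, position)
   .dropna()                    # drop the NaN padding explicitly (harmless if stack() already did)
   .apply(pd.Series)            # split each [app, version] pair into columns 0 and 1
   .reset_index(level=[0, 1])   # move name and id back to columns
   .rename(columns={0: 'app_name', 1: 'app_version'}))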
Out[541]: 
    name  id app_name app_version
0   john   1     app1          v1
1   john   1     app2          v2
2   john   1     app3          v3
0  smith   2     app1          v1
1  smith   2     app4          v4
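On a large frame the chained apply(pd.Series) calls can get slow, since each one builds a Series per row. A common alternative, swapped in here and not part of this answer, is a plain list comprehension that builds every output row first and creates the DataFrame once. A sketch, assuming every inner list has exactly two elements [app_name, app_version]:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# one tuple per (person, app) pair; unpacking assumes two-element inner lists
rows = [
    (name, id_, app_name, app_version)
    for name, id_, apps in zip(df['name'], df['id'], df['apps'])
    for app_name, app_version in apps
]

# build the result frame in a single constructor call
result = pd.DataFrame(rows, columns=['name', 'id', 'app_name', 'app_version'])
print(result)
```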

Method two: a slightly modified version of the unnesting function I wrote.

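import pandas as pd

def unnesting(df, explode):
    # repeat each original index label once per element of the first list column
    idx = df.index.repeat(df[explode[0]].str.len())
    # flatten each list column by one level (sum(..., []) concatenates the row lists)
    df1 = pd.concat([
        pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
    df1.index = idx
    # join the untouched columns back; pandas 2.x requires drop(columns=...) instead of drop(explode, 1)
    return df1.join(df.drop(columns=explode), how='left')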
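The sum(df[x].tolist(), []) call is what flattens one level of nesting: it starts from an empty list and concatenates each row's list onto it. Because every + copies the growing accumulator, the cost grows quadratically with the number of rows; itertools.chain.from_iterable produces the same flat list in linear time. A standalone comparison, not part of the original function:

```python
from itertools import chain

nested = [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
          [['app1', 'v1'], ['app4', 'v4']]]

# what unnesting() does: concatenate each row's list onto an accumulator
flat_sum = sum(nested, [])

# same one-level flattening without re-copying the accumulator
flat_chain = list(chain.from_iterable(nested))

print(flat_sum == flat_chain)  # True
print(len(flat_chain))         # 5
```

Either result can be passed to pd.DataFrame({x: ...}) in the function unchanged.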

Then

yourdf=unnesting(df,['apps'])

yourdf['app_name'],yourdf['app_version']=yourdf.apps.str[0],yourdf.apps.str[1]
yourdf
Out[548]: 
         apps  id   name app_name app_version
0  [app1, v1]   1   john     app1          v1
0  [app2, v2]   1   john     app2          v2
0  [app3, v3]   1   john     app3          v3
1  [app1, v1]   2  smith     app1          v1
1  [app4, v4]   2  smith     app4          v4

Or

yourdf=unnesting(df,['apps']).reindex(columns=df.columns.tolist()+['app_name','app_version'])
yourdf[['app_name','app_version']]=yourdf.apps.tolist()
yourdf
Out[567]: 
         apps  id   name app_name app_version
0  [app1, v1]   1   john     app1          v1
0  [app2, v2]   1   john     app2          v2
0  [app3, v3]   1   john     app3          v3
1  [app1, v1]   2  smith     app1          v1
1  [app4, v4]   2  smith     app4          v4
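Both outputs above still carry the intermediate apps column and the repeated index labels 0, 0, 0, 1, 1. To match the layout asked for in the question, one option (an addition, not from the answer) is to drop the helper column, reorder the columns and renumber the rows. A minimal sketch on a hand-built frame shaped like that output:

```python
import pandas as pd

# a frame shaped like the output above: exploded pairs plus the two split columns
yourdf = pd.DataFrame({
    'apps': [['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3'], ['app1', 'v1'], ['app4', 'v4']],
    'id': [1, 1, 1, 2, 2],
    'name': ['john', 'john', 'john', 'smith', 'smith'],
    'app_name': ['app1', 'app2', 'app3', 'app1', 'app4'],
    'app_version': ['v1', 'v2', 'v3', 'v1', 'v4'],
}, index=[0, 0, 0, 1, 1])

# drop the helper list column, use the question's column order, renumber rows 0..n-1
final = (yourdf.drop(columns='apps')
               [['name', 'id', 'app_name', 'app_version']]
               .reset_index(drop=True))
print(final)
```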

answered Nov 13 '22 06:11

BENY