I have a DataFrame which looks like this.
name id apps
john 1 [[app1, v1], [app2, v2], [app3,v3]]
smith 2 [[app1, v1], [app4, v4]]
I want to expand the apps column such that it looks like this.
name id app_name app_version
john 1 app1 v1
john 1 app2 v2
john 1 app3 v3
smith 2 app1 v1
smith 2 app4 v4
Any help is appreciated
To split a pandas column of lists into multiple columns, create a new dataframe by applying the tolist() function to the column and passing the result to pd.DataFrame. You can also pass the names of the new columns resulting from the split as a list.
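As a minimal sketch of the tolist() approach described above (the frame `pairs` and its column `app` are made-up names for illustration, and every list is assumed to have exactly two items):

```python
import pandas as pd

# Hypothetical example data: each cell holds a [name, version] pair.
pairs = pd.DataFrame({"app": [["app1", "v1"], ["app2", "v2"]]})

# Build a new frame from the list column and name the resulting columns.
# Reusing the original index keeps the rows aligned with `pairs`.
split_cols = pd.DataFrame(
    pairs["app"].tolist(),
    index=pairs.index,
    columns=["app_name", "app_version"],
)
print(split_cols)
```

This only handles one level of nesting, so for the question above it would have to be applied after the outer list has been expanded into rows.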
The Series.str.split() function is used to break up single column values into multiple columns based on a specified separator or delimiter. It is similar to the Python string split() method, but it applies to every value in a Series (a DataFrame column) rather than to a single string.
We can use str.split() to split one column into multiple columns by specifying the expand=True option. We can use str.extract() to extract multiple columns using a regular expression in which multiple capturing groups are defined.
We can use the pandas Series.str.split() function to break up strings into multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire DataFrame column.
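A short sketch of both string-based options, assuming the values are plain strings with a "-" delimiter (the sample Series and the delimiter are invented for illustration; the question's data holds lists, not strings, so this does not apply to it directly):

```python
import pandas as pd

# Hypothetical delimited strings, one "name-version" pair per row.
s = pd.Series(["app1-v1", "app2-v2"])

# expand=True returns a DataFrame with one column per piece.
by_split = s.str.split("-", expand=True)
by_split.columns = ["app_name", "app_version"]

# Named capturing groups become the column names of the result.
by_regex = s.str.extract(r"(?P<app_name>\w+)-(?P<app_version>\w+)")

print(by_split)
print(by_regex)
```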
You can .apply(pd.Series) twice to get what you need as an intermediate step, then merge back to the original dataframe.
import pandas as pd
df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]]
})
dftmp = df.apps.apply(pd.Series).T.melt().dropna()
dfapp = (dftmp.value
         .apply(pd.Series)
         .set_index(dftmp.variable)
         .rename(columns={0: 'app_name', 1: 'app_version'})
         )
df[['name', 'id']].merge(dfapp, left_index=True, right_index=True)
# returns:
name id app_name app_version
0 john 1 app1 v1
0 john 1 app2 v2
0 john 1 app3 v3
1 smith 2 app1 v1
1 smith 2 app4 v4
A chain of pd.Series calls is easy to understand. If you would like to know more methods, check unnesting.
df.set_index(['name','id']).apps.apply(pd.Series).\
stack().apply(pd.Series).\
reset_index(level=[0,1]).\
rename(columns={0:'app_name',1:'app_version'})
Out[541]:
name id app_name app_version
0 john 1 app1 v1
1 john 1 app2 v2
2 john 1 app3 v3
0 smith 2 app1 v1
1 smith 2 app4 v4
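Method two: a slightly modified version of the function I wrote
def unnesting(df, explode):
    # Repeat each original index label once per item in its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every listed column into one long column
    df1 = pd.concat([
        pd.DataFrame({x: sum(df[x].tolist(), [])}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the untouched columns; the axis must be given by keyword in pandas 2.0+
    return df1.join(df.drop(columns=explode), how='left')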
Then
yourdf=unnesting(df,['apps'])
yourdf['app_name'],yourdf['app_version']=yourdf.apps.str[0],yourdf.apps.str[1]
yourdf
Out[548]:
apps id name app_name app_version
0 [app1, v1] 1 john app1 v1
0 [app2, v2] 1 john app2 v2
0 [app3, v3] 1 john app3 v3
1 [app1, v1] 2 smith app1 v1
1 [app4, v4] 2 smith app4 v4
Or
yourdf=unnesting(df,['apps']).reindex(columns=df.columns.tolist()+['app_name','app_version'])
yourdf[['app_name','app_version']]=yourdf.apps.tolist()
yourdf
Out[567]:
apps id name app_name app_version
0 [app1, v1] 1 john app1 v1
0 [app2, v2] 1 john app2 v2
0 [app3, v3] 1 john app3 v3
1 [app1, v1] 2 smith app1 v1
1 [app4, v4] 2 smith app4 v4
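On pandas 0.25 or newer, the built-in DataFrame.explode can stand in for the hand-written unnesting helper. The following is a sketch under that version assumption, reusing the question's sample data; the intermediate names `exploded`, `parts`, and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'smith'],
    'id': [1, 2],
    'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
             [['app1', 'v1'], ['app4', 'v4']]],
})

# explode() unpacks only the outer list: one row per [app, version] pair.
exploded = df.explode('apps').reset_index(drop=True)

# Split each remaining pair into two named columns, aligned on the new index.
parts = pd.DataFrame(
    exploded['apps'].tolist(),
    columns=['app_name', 'app_version'],
    index=exploded.index,
)

result = exploded.drop(columns='apps').join(parts)
print(result)
```

The reset_index(drop=True) call gives every row a unique label, so the final join matches rows one to one instead of duplicating them.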