I accidentally closed this question with a link to the wrong duplicate. Here is the correct one: Pandas split column of lists into multiple columns.
Suppose I have a DataFrame in which one column holds lists (of a known, identical length) or tuples, for example:
import pandas as pd
df1 = pd.DataFrame(
    {'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]}
)
ie:
           vals
0  [a, b, c, d]
1  [e, f, g, h]
I want to extract the values in "vals" into separate named columns. I can do this clumsily by iterating through the rows:
for i in range(df1.shape[0]):
    for j in range(0, 4):
        df1.loc[i, 'vals_' + str(j)] = df1.loc[i, 'vals'][j]
Result as desired:
           vals vals_0 vals_1 vals_2 vals_3
0  [a, b, c, d]      a      b      c      d
1  [e, f, g, h]      e      f      g      h
Is there a neater (vectorised) way? I tried using [] but I get an error.
for j in range(0, 4):
    df1['vals_' + str(j)] = df1['vals'][j]
gives:
ValueError: Length of values does not match length of index
It looks like Pandas is trying to apply the [] operator to the series/dataframe rather than the column content.
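A minimal sketch of why the attempt fails, assuming the same df1 as above: df1['vals'][j] is a lookup on the Series index, so it returns the whole list stored in row j rather than the j-th element of every row. The variable names row0 and firsts are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]})

# Indexing the Series with 0 looks up the row labelled 0,
# not element 0 of each list.
row0 = df1['vals'][0]
print(row0)  # ['a', 'b', 'c', 'd']

# Assigning that 4-item list as a column of a 2-row frame
# is what raises the length-mismatch ValueError.
print(len(row0), len(df1))  # 4 2

# Element-wise access has to visit each row, e.g. with a comprehension.
firsts = [row[0] for row in df1['vals']]
print(firsts)  # ['a', 'e']
```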
You can use assign and apply with pd.Series:
df1.assign(**df1.vals.apply(pd.Series).add_prefix('val_'))
For larger data, a faster approach is to pass .values.tolist() to the DataFrame constructor:
df1.assign(**pd.DataFrame(df1.vals.values.tolist()).add_prefix('val_'))
Output:
           vals val_0 val_1 val_2 val_3
0  [a, b, c, d]     a     b     c     d
1  [e, f, g, h]     e     f     g     h
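One caveat worth sketching: the DataFrame constructor builds a fresh 0..n-1 index, while assign aligns new columns on the index. If the original frame does not have the default index, the new columns would come back as NaN unless the original index is passed along. The example below assumes a hypothetical frame df with index labels 10 and 20 to show this.

```python
import pandas as pd

df = pd.DataFrame(
    {'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]},
    index=[10, 20],  # hypothetical non-default index
)

# Reuse the original index so that assign() aligns row for row.
expanded = pd.DataFrame(df['vals'].tolist(), index=df.index).add_prefix('val_')
result = df.assign(**expanded)
print(result)
```

With the default RangeIndex, as in df1, the index argument makes no difference, so it is harmless to include it every time.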
You can apply the Series initializer to vals, and then add_prefix to get the column names you're looking for. Then concat to the original for the desired output:
pd.concat([df1.vals, df1.vals.apply(pd.Series).add_prefix("vals_")], axis=1)
           vals vals_0 vals_1 vals_2 vals_3
0  [a, b, c, d]      a      b      c      d
1  [e, f, g, h]      e      f      g      h
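To compare the apply(pd.Series) approach with the constructor approach on your own data, a rough sketch along these lines may help. The frame big and the helper names via_apply and via_constructor are hypothetical, and the timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

# Hypothetical larger frame: 2,000 rows, each holding a 4-item list.
big = pd.DataFrame({'vals': [['a', 'b', 'c', 'd']] * 2_000})

def via_apply(df):
    # Builds one Series per row, as in the apply(pd.Series) answers.
    return df['vals'].apply(pd.Series).add_prefix('vals_')

def via_constructor(df):
    # Builds the whole frame in one call from a list of lists.
    return pd.DataFrame(df['vals'].tolist(), index=df.index).add_prefix('vals_')

a = via_apply(big)
b = via_constructor(big)

# Both approaches should yield the same column names and the same cell values.
same_columns = list(a.columns) == list(b.columns)
same_values = a.values.tolist() == b.values.tolist()
print(same_columns, same_values)

# Time a single run of each; the numbers are machine-dependent.
print('apply      :', timeit.timeit(lambda: via_apply(big), number=1))
print('constructor:', timeit.timeit(lambda: via_constructor(big), number=1))
```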