
How best to extract a Pandas column containing lists or tuples into multiple columns [duplicate]

Tags: python, pandas

I accidentally closed this question with a link to the wrong duplicate. Here is the correct one: Pandas split column of lists into multiple columns.

Suppose I have a dataframe in which one column holds lists (of a known and identical length) or tuples, for example:

import pandas as pd

df1 = pd.DataFrame(
    {'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]}
)

i.e.:

    vals
0   [a, b, c, d]
1   [e, f, g, h]

I want to extract the values in "vals" into separate named columns. I can do this clumsily by iterating through the rows:

for i in range(df1.shape[0]):
    for j in range(4):
        # str(j) is required: 'vals_' + j would raise a TypeError
        df1.loc[i, 'vals_' + str(j)] = df1.loc[i, 'vals'][j]

Result as desired:

    vals            vals_0  vals_1  vals_2  vals_3
0   [a, b, c, d]    a       b       c       d 
1   [e, f, g, h]    e       f       g       h

Is there a neater (vectorised) way? I tried using [] but I get an error.

for j in range(0, 4):
    df1['vals_' + str(j)] = df1['vals'][j]

gives:

ValueError: Length of values does not match length of index

It looks like Pandas is applying the [] operator to the Series itself rather than to the contents of each cell.
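A minimal sketch of that diagnosis, assuming the df1 defined above. It shows that df1['vals'][0] returns the whole list stored in row 0, and that the .str accessor is one way to index into each cell instead. The names first_row and first_items are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]}
)

# Indexing the Series with [0] looks up the row labelled 0,
# so the result is the whole list stored in that row.
first_row = df1['vals'][0]
print(first_row)  # ['a', 'b', 'c', 'd']

# Assigning that 4-item list as a column of a 2-row frame is what
# triggers the length-mismatch ValueError.

# The .str accessor indexes into each cell; it also works on lists and tuples.
first_items = df1['vals'].str[0]
print(first_items.tolist())  # ['a', 'e']
```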

Tom Walker asked Dec 11 '22 08:12



2 Answers

You can combine assign with apply(pd.Series):

df1.assign(**df1.vals.apply(pd.Series).add_prefix('val_'))

For larger data, a faster method is to pass .values.tolist() to the DataFrame constructor:

df1.assign(**pd.DataFrame(df1.vals.values.tolist()).add_prefix('val_'))

Output:

           vals val_0 val_1 val_2 val_3
0  [a, b, c, d]     a     b     c     d
1  [e, f, g, h]     e     f     g     h
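One caveat with the constructor approach: assign aligns on index labels, and a frame built from a plain list gets a fresh 0..n-1 index, so the new columns only line up when df1 has the default index. A minimal sketch of a variant that carries the index across, assuming a hypothetical non-default index of [10, 20] and the illustrative names expanded and result:

```python
import pandas as pd

df1 = pd.DataFrame(
    {'vals': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]},
    index=[10, 20],  # hypothetical non-default index, for illustration only
)

# Build the wide frame from the lists and reuse df1's index so that
# the rows align by label rather than by position.
expanded = pd.DataFrame(df1['vals'].tolist(), index=df1.index).add_prefix('val_')

# join aligns on the shared index and keeps the original column.
result = df1.join(expanded)
print(result)
```

With the default RangeIndex this gives the same frame as the assign version above.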
Scott Boston answered Dec 12 '22 21:12



You can apply the Series constructor to vals, then use add_prefix to get the column names you're looking for. Then concat the result to the original column for the desired output:

pd.concat([df1.vals, df1.vals.apply(pd.Series).add_prefix("vals_")], axis=1)

           vals vals_0 vals_1 vals_2 vals_3
0  [a, b, c, d]      a      b      c      d
1  [e, f, g, h]      e      f      g      h
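If the original column is no longer needed, the same concat idea can drop it and name the new columns explicitly. This is a sketch under assumptions rather than part of the answer: df2, names, and wide are illustrative names, and the cells are tuples here to show that they are handled like lists.

```python
import pandas as pd

# Same data as in the question, but stored as tuples.
df2 = pd.DataFrame({'vals': [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h')]})

# Explicit column names, one per tuple position (length assumed known: 4).
names = [f'vals_{j}' for j in range(4)]

# Each tuple becomes one row of the new frame; reuse the original index.
wide = pd.DataFrame(df2['vals'].tolist(), columns=names, index=df2.index)

# Replace the tuple column with the expanded columns.
result = pd.concat([df2.drop(columns='vals'), wide], axis=1)
print(result)
```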
andrew_reece answered Dec 12 '22 22:12
