
Explode column of list to multiple rows

I want to expand the list in a certain column (in the example column_x) to multiple rows.

So this DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame({'column_a': ['a_1', 'a_2'], 
                   'column_b': ['b_1', 'b_2'], 
                   'column_x': [['c_1', 'c_2'], ['d_1', 'd_2']]
                  })

should be transformed from

    column_a    column_b    column_x
0   a_1         b_1         [c_1, c_2]
1   a_2         b_2         [d_1, d_2]

to

    column_a    column_b    column_x
0   a_1         b_1         c_1
1   a_1         b_1         c_2
2   a_2         b_2         d_1
3   a_2         b_2         d_2

The code I have so far does exactly this, and it does it fast.

lens = [len(item) for item in df['column_x']]
pd.DataFrame( {"column_a" : np.repeat(df['column_a'].values, lens), 
               "column_b" : np.repeat(df['column_b'].values, lens), 
               "column_x" : np.concatenate(df['column_x'].values)})

However, I have lots of columns. Is there a neat and elegant solution for repeating the whole data frame without specifying each column again?

Michael Dorner asked Mar 07 '18 09:03


1 Answer

Pandas >= 0.25

Pandas can do this in a single function call via df.explode.

df.explode('column_x')

  column_a column_b column_x
0      a_1      b_1      c_1
0      a_1      b_1      c_2
1      a_2      b_2      d_1
1      a_2      b_2      d_2

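Note that before pandas 1.3, df.explode accepts only a single column. From 1.3 onward you can pass a list of columns, provided the lists in each row have matching lengths across those columns.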
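The output above keeps the original index (0, 0, 1, 1), while the question asks for rows labelled 0 to 3. A minimal sketch of two ways to get that, assuming pandas >= 1.1 for the ignore_index keyword and falling back to reset_index on 0.25 to 1.0 (the names result and result_compat are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'column_a': ['a_1', 'a_2'],
    'column_b': ['b_1', 'b_2'],
    'column_x': [['c_1', 'c_2'], ['d_1', 'd_2']],
})

# ignore_index=True (pandas >= 1.1) relabels the exploded rows 0..n-1
result = df.explode('column_x', ignore_index=True)

# Same effect on pandas 0.25 - 1.0, where ignore_index is not available
result_compat = df.explode('column_x').reset_index(drop=True)

print(result)
```

Either way the other columns are repeated automatically, so no column besides column_x has to be named.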


Pandas < 0.25

Call np.repeat along the 0th axis for every column besides column_x.

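df1 = pd.DataFrame(
    # repeat each row of the remaining columns once per list element
    df.drop('column_x', axis=1).values.repeat(df['column_x'].str.len(), axis=0),
    # Index.drop keeps the original column order; Index.difference would sort it
    columns=df.columns.drop('column_x')
)
df1['column_x'] = np.concatenate(df['column_x'].values)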

df1

  column_a column_b column_x
0      a_1      b_1      c_1
1      a_1      b_1      c_2
2      a_2      b_2      d_1
3      a_2      b_2      d_2
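Since the question mentions having many columns, the same idea can be wrapped in a small reusable function. This is only a sketch: explode_column is a hypothetical helper name, not a pandas API, and it assumes every row of the target column holds a non-empty list. It repeats row positions instead of the raw values array, so the other columns keep their dtypes.

```python
import numpy as np
import pandas as pd

def explode_column(df, column):
    """Hypothetical helper: one output row per element of the lists in `column`."""
    # Number of elements in each row's list
    lens = df[column].str.len()
    # Repeat each positional row number once per list element
    positions = np.arange(len(df)).repeat(lens)
    # Take all other columns at those positions and renumber the rows
    out = df.drop(column, axis=1).iloc[positions].reset_index(drop=True)
    # Flatten the lists into a single column (assumes non-empty lists)
    out[column] = np.concatenate(df[column].values)
    # Restore the original column order
    return out[df.columns]

df = pd.DataFrame({
    'column_a': ['a_1', 'a_2'],
    'column_b': ['b_1', 'b_2'],
    'column_x': [['c_1', 'c_2'], ['d_1', 'd_2']],
})

df1 = explode_column(df, 'column_x')
print(df1)
```

On pandas >= 0.25, df.explode remains the simpler choice; this sketch is only relevant where that method is unavailable.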
cs95 answered Oct 18 '22 23:10