I want to expand the list in a certain column (in the example column_x) to multiple rows.
So
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_a': ['a_1', 'a_2'],
                   'column_b': ['b_1', 'b_2'],
                   'column_x': [['c_1', 'c_2'], ['d_1', 'd_2']]
                   })
should be transformed from
  column_a column_b    column_x
0      a_1      b_1  [c_1, c_2]
1      a_2      b_2  [d_1, d_2]
to
  column_a column_b column_x
0      a_1      b_1      c_1
1      a_1      b_1      c_2
2      a_2      b_2      d_1
3      a_2      b_2      d_2
The code I have so far does exactly this, and it does it fast.
lens = [len(item) for item in df['column_x']]
pd.DataFrame({"column_a": np.repeat(df['column_a'].values, lens),
              "column_b": np.repeat(df['column_b'].values, lens),
              "column_x": np.concatenate(df['column_x'].values)})
However, I have lots of columns. Is there a neat and elegant solution for repeating the whole data frame without specifying each column again?
To split array column data into rows, PySpark provides a function called explode(). Using explode, we get a new row for each element in the array.
DataFrame.explode() function: The explode() function transforms each element of a list-like into a row, replicating the index values. It returns a DataFrame in which the lists in the chosen columns are exploded to rows; the index is duplicated for these rows. Raises: ValueError if the columns of the frame are not unique.
How do you split a DataFrame by rows? Use the iloc indexer to slice a DataFrame into smaller DataFrames. iloc is accessed with square brackets, not called like a function, and selects rows and columns by integer position.
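As a minimal sketch of the positional slicing described above (the four-row frame and the names top and bottom are made up for illustration):

```python
import pandas as pd

# Illustrative data, not taken from the question.
frame = pd.DataFrame({'column_a': ['a_1', 'a_2', 'a_3', 'a_4'],
                      'column_b': ['b_1', 'b_2', 'b_3', 'b_4']})

# iloc slices by integer position: the first two rows, then the rest.
top = frame.iloc[:2]
bottom = frame.iloc[2:]
```

Note that this splits a frame into groups of rows; it does not unpack lists stored inside a cell, which is what the question asks for.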
Pandas (0.25 and later) can do this in a single function call via df.explode.
df.explode('column_x')
  column_a column_b column_x
0      a_1      b_1      c_1
0      a_1      b_1      c_2
1      a_2      b_2      d_1
1      a_2      b_2      d_2
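Note that before pandas 1.3 you can only explode a DataFrame on one column at a time. From 1.3 onward, explode also accepts a list of columns, provided the lists in each row have matching lengths.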
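The output of explode repeats the original index labels (0, 0, 1, 1), whereas the question asks for a fresh 0 to 3 index. A minimal sketch of one way to get that, assuming pandas 1.1 or later, where explode accepts ignore_index:

```python
import pandas as pd

df = pd.DataFrame({'column_a': ['a_1', 'a_2'],
                   'column_b': ['b_1', 'b_2'],
                   'column_x': [['c_1', 'c_2'], ['d_1', 'd_2']]})

# ignore_index=True relabels the exploded rows 0..n-1 instead of
# repeating the original index labels.
result = df.explode('column_x', ignore_index=True)
print(result)
```

On older versions, df.explode('column_x').reset_index(drop=True) should give the same result.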
Alternatively, call np.repeat along the 0th axis for every column besides column_x.
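df1 = pd.DataFrame(
    # Repeat each row of the non-list columns once per list element.
    df.drop(columns='column_x').values.repeat(df['column_x'].str.len(), axis=0),
    # Index.drop keeps the original column order (Index.difference would sort it).
    columns=df.columns.drop('column_x')
)
# Flatten the lists into one column aligned with the repeated rows.
df1['column_x'] = np.concatenate(df['column_x'].values)
df1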
  column_a column_b column_x
0      a_1      b_1      c_1
1      a_1      b_1      c_2
2      a_2      b_2      d_1
3      a_2      b_2      d_2
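The np.repeat idea can be wrapped so that no column other than the list column has to be named. The following is a sketch only: explode_list_column is a hypothetical helper name, and it assumes every cell in the chosen column holds a list (no NaN or scalar entries). It repeats row positions rather than raw values, so the other columns keep their dtypes and order.

```python
import numpy as np
import pandas as pd


def explode_list_column(df, col):
    """Hypothetical helper: one output row per element of the lists in `col`."""
    lens = df[col].str.len().to_numpy()
    # Repeat each row position once per list element, then select those rows.
    positions = np.repeat(np.arange(len(df)), lens)
    out = df.iloc[positions].reset_index(drop=True)
    # Overwrite the list column in place with the flattened values.
    out[col] = np.concatenate(df[col].to_numpy())
    return out


df = pd.DataFrame({'column_a': ['a_1', 'a_2'],
                   'column_b': ['b_1', 'b_2'],
                   'column_x': [['c_1', 'c_2'], ['d_1', 'd_2']]})
result = explode_list_column(df, 'column_x')
print(result)
```

Unlike df.explode, which keeps a row with NaN for an empty list, this sketch drops rows whose list is empty, because they are repeated zero times.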