Say I have the following pandas DataFrame:
    df = pd.DataFrame({"a" : [1,2,3], "b" : [[1,2],[2,3,4],[5]]})

       a          b
    0  1     [1, 2]
    1  2  [2, 3, 4]
    2  3        [5]
How would I "unstack" the lists in the "b" column in order to transform it into the dataframe:
       a  b
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
DataFrame.unstack() pivots a level of the (typically hierarchical) row index into the columns. It reshapes an index and does not by itself split lists stored in cells, so the transformation asked for here is better described as unnesting.

Unnesting means exploding the lists into rows. This can be done with the pandas Series.explode() method, which turns each element of a list-like value into its own row and duplicates the index for those rows.

pandas DataFrame.explode()
The explode() function transforms each element of a list-like into a row, replicating the index values. It returns a DataFrame in which the lists of the chosen columns are exploded to rows; the index is duplicated for these rows. Raises ValueError if the columns of the frame are not unique.
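As a minimal sketch of the explode() description above, applied to the DataFrame from the question: this assumes pandas 0.25 or later (where DataFrame.explode() is available). The reset_index() and astype() calls are optional additions, used here only to match the desired output with a fresh 0..n-1 index and integer values.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# One row per list element; column "a" and the index are repeated.
exploded = df.explode("b")

# explode() keeps the duplicated index and leaves "b" as object dtype,
# so renumber the rows and cast the values back to integers.
result = exploded.reset_index(drop=True).astype({"b": int})
print(result)
```

Without reset_index(drop=True), the rows keep the original labels 0, 0, 1, 1, 1, 2, which can be useful for joining back to the source frame.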
UPDATE: a generic vectorized approach that also works for DataFrames with multiple columns.
Assuming we have the following DF:
    In [159]: df
    Out[159]:
       a          b  c
    0  1     [1, 2]  5
    1  2  [2, 3, 4]  6
    2  3        [5]  7
Solution:
    In [160]: lst_col = 'b'

    In [161]: pd.DataFrame({
         ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
         ...:     for col in df.columns.difference([lst_col])
         ...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns.tolist()]
         ...:
    Out[161]:
       a  b  c
    0  1  1  5
    1  1  2  5
    2  2  2  6
    3  2  3  6
    4  2  4  6
    5  3  5  7
Setup:
    df = pd.DataFrame({
        "a" : [1,2,3],
        "b" : [[1,2],[2,3,4],[5]],
        "c" : [5,6,7]
    })
Vectorized NumPy approach:
    In [124]: pd.DataFrame({'a':np.repeat(df.a.values, df.b.str.len()),
         ...:               'b':np.concatenate(df.b.values)})
    Out[124]:
       a  b
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
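The generic np.repeat / np.concatenate approach can be wrapped in a small reusable function. The sketch below is one way to do that; the name unnest is hypothetical (it is not a pandas function), and it assumes every value in the list column is a list.

```python
import numpy as np
import pandas as pd


def unnest(df, lst_col):
    """Hypothetical helper: return one row per element of the lists in lst_col."""
    # Length of each list, used as the repeat count for the other columns.
    lens = df[lst_col].str.len().to_numpy()
    other_cols = df.columns.difference([lst_col])
    out = pd.DataFrame({col: np.repeat(df[col].to_numpy(), lens) for col in other_cols})
    # Flatten all lists into one array, in row order.
    out[lst_col] = np.concatenate(df[lst_col].to_numpy())
    # Restore the original column order.
    return out[df.columns.tolist()]


df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]], "c": [5, 6, 7]})
result = unnest(df, "b")
print(result)
```

Because each row is repeated once per list element, a row whose list is empty is repeated zero times and so disappears from the result.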
OLD answer:
Try this:
    In [89]: df.set_index('a', append=True).b.apply(pd.Series).stack().reset_index(level=[0, 2], drop=True).reset_index()
    Out[89]:
       a    0
    0  1  1.0
    1  1  2.0
    2  2  2.0
    3  2  3.0
    4  2  4.0
    5  3  5.0
Or a slightly nicer solution provided by @Boud:
    In [110]: df.set_index('a').b.apply(pd.Series).stack().reset_index(level=-1, drop=True).astype(int).reset_index()
    Out[110]:
       a  0
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
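The float values in the first old-answer output, and the astype(int) in the second, follow from the intermediate frame that apply(pd.Series) builds. The sketch below makes that step visible. The explicit dropna() is an addition not present in the original answers, included on the assumption that stack() may not drop missing values in every pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# Each list becomes one row of a wide frame; shorter lists are padded with NaN.
wide = df.set_index("a").b.apply(pd.Series)
print(wide)

# Stacking turns the wide frame into one value per row. The NaN padding forces
# a float dtype, so drop the padding and cast back to int afterwards.
long = (
    wide.stack()
    .dropna()
    .reset_index(level=-1, drop=True)
    .astype(int)
    .reset_index()
)
print(long)
```

This route builds a frame as wide as the longest list, which is one reason the vectorized and explode() approaches tend to be preferred.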