
"unstack" a pandas column containing lists into multiple rows [duplicate]

Say I have the following Pandas Dataframe:

    df = pd.DataFrame({"a" : [1,2,3], "b" : [[1,2],[2,3,4],[5]]})

       a          b
    0  1     [1, 2]
    1  2  [2, 3, 4]
    2  3        [5]

How would I "unstack" the lists in the "b" column in order to transform it into the dataframe:

       a  b
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
Alex asked Feb 02 '17 20:02



People also ask

What does unstack() do in pandas?

unstack() pivots a level of a hierarchical row index into columns. It is the inverse of stack().
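
To make the description above concrete, here is a minimal sketch of unstack() on a small Series with a two-level index. The data and the level names "row" and "pos" are made up for illustration.

```python
import pandas as pd

# Hypothetical data: values indexed by (row, position within the row).
s = pd.Series(
    [1, 2, 2, 3, 4, 5],
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0)],
        names=["row", "pos"],
    ),
)

# Pivot the inner index level "pos" into columns.
# Combinations that do not exist in the index become NaN.
wide = s.unstack("pos")
print(wide)
```

Note that this goes from long to wide, which is the opposite direction of what the question asks for. That is presumably why "unstack" appears in quotes in the title.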

How do you unnest in pandas?

Unnesting means exploding list-like values into rows. It can be done with the Series.explode() method, which transforms each list-like element of a Series into its own row and duplicates the index for those rows.

How do you explode a column in pandas?

DataFrame.explode() transforms each element of a list-like into a row, replicating the index values. The lists in the given subset of columns are exploded into rows, and the index is duplicated for these rows. It raises ValueError if the columns of the frame are not unique.
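
Applied to the DataFrame from the question, explode() could look like the sketch below. It assumes pandas 0.25 or later, where DataFrame.explode() is available. The reset_index() and astype() calls are optional cleanup added here, since explode() keeps the duplicated index and leaves the exploded column with object dtype.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# One row per list element; the original index labels are repeated.
exploded = df.explode("b")

# Optional cleanup: renumber the rows and restore an integer dtype for "b".
result = exploded.reset_index(drop=True).astype({"b": int})
print(result)
```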


1 Answer

UPDATE: a generic vectorized approach that also works for DataFrames with multiple columns:

Assuming we have the following DataFrame:

    In [159]: df
    Out[159]:
       a          b  c
    0  1     [1, 2]  5
    1  2  [2, 3, 4]  6
    2  3        [5]  7

Solution:

    In [160]: lst_col = 'b'

    In [161]: pd.DataFrame({
         ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
         ...:     for col in df.columns.difference([lst_col])
         ...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns.tolist()]
         ...:
    Out[161]:
       a  b  c
    0  1  1  5
    1  1  2  5
    2  2  2  6
    3  2  3  6
    4  2  4  6
    5  3  5  7

Setup:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "a" : [1,2,3],
        "b" : [[1,2],[2,3,4],[5]],
        "c" : [5,6,7]
    })
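
For reuse, the generic one-liner can be wrapped in a small helper. The sketch below does that. The function name explode_list_column is made up for illustration, and the logic is the same np.repeat / np.concatenate combination as in the solution, only split into named steps.

```python
import numpy as np
import pandas as pd


def explode_list_column(df, lst_col):
    """Return a copy of df with each list in lst_col spread over several rows.

    Hypothetical helper name; the body follows the vectorized solution.
    """
    # Number of elements in each list, i.e. how often each row is repeated.
    lengths = df[lst_col].str.len()

    # Repeat every non-list column according to the list lengths.
    repeated = {
        col: np.repeat(df[col].values, lengths)
        for col in df.columns.difference([lst_col])
    }

    # Flatten all lists into a single 1-D array.
    flat = np.concatenate(df[lst_col].values)

    # Re-attach the flattened column and restore the original column order.
    return pd.DataFrame(repeated).assign(**{lst_col: flat})[df.columns.tolist()]


df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [[1, 2], [2, 3, 4], [5]],
    "c": [5, 6, 7],
})
result = explode_list_column(df, "b")
print(result)
```

One thing to keep in mind: a row whose list is empty is repeated zero times and therefore disappears from the result, whereas DataFrame.explode() keeps such a row with NaN.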

Vectorized NumPy approach:

    In [124]: pd.DataFrame({'a':np.repeat(df.a.values, df.b.str.len()),
         ...:               'b':np.concatenate(df.b.values)})
    Out[124]:
       a  b
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
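
To see why this works, it may help to look at the intermediate arrays. The sketch below splits the expression into its three parts. The variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# Length of each list in "b": 2, 3, 1
lengths = df["b"].str.len()

# Repeat each value of "a" once per list element: [1 1 2 2 2 3]
a_repeated = np.repeat(df["a"].values, lengths)

# Join all lists end to end: [1 2 2 3 4 5]
b_flat = np.concatenate(df["b"].values)

# Both arrays now have the same length and line up element by element.
result = pd.DataFrame({"a": a_repeated, "b": b_flat})
```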

OLD answer:

Try this:

    In [89]: df.set_index('a', append=True).b.apply(pd.Series).stack().reset_index(level=[0, 2], drop=True).reset_index()
    Out[89]:
       a    0
    0  1  1.0
    1  1  2.0
    2  2  2.0
    3  2  3.0
    4  2  4.0
    5  3  5.0

Or a slightly nicer solution provided by @Boud:

    In [110]: df.set_index('a').b.apply(pd.Series).stack().reset_index(level=-1, drop=True).astype(int).reset_index()
    Out[110]:
       a  0
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
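
The chained call is easier to follow when split into steps, as in the sketch below. Two things are additions and not part of the original answer: the explicit dropna(), added on the assumption that some pandas versions may keep the padding NaN values after stack(), and the rename("b"), which gives the result column its original name instead of 0.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# Step 1: turn each list into a row of a wide frame, one column per list
# position. Shorter lists are padded with NaN.
wide = df.set_index("a")["b"].apply(pd.Series)

# Step 2: stack the columns back into rows. The explicit dropna() is an
# addition (assumption: stack() may not drop the padding NaN by itself).
stacked = wide.stack().dropna()

# Step 3: drop the list-position level, convert the floats back to int,
# name the column "b" (also an addition), and move "a" back into a column.
result = (
    stacked.reset_index(level=-1, drop=True)
    .astype(int)
    .rename("b")
    .reset_index()
)
print(result)
```

The NaN padding in step 1 is why the first variant above prints 1.0, 2.0 and so on, and why the second variant needs astype(int).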
MaxU - stop WAR against UA answered Sep 20 '22 02:09
