Python - Unnest cells in Pandas DataFrame

Suppose I have a DataFrame df:

a  b  c
v  f  3|4|5
v  2  6
v  f  4|5

I'd like to produce this df:

a  b  c
v  f  3
v  f  4
v  f  5
v  2  6
v  f  4
v  f  5

I know how to make this transformation in R, using the tidyr package.

Is there an easy way of doing this in pandas?
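For anyone who wants to reproduce this, here is a minimal sketch of the two frames shown above. It assumes every value, including the 2 in column b and the 6 in column c, is stored as a string; the name `expected` is only illustrative.

```python
import pandas as pd

# Input frame from the question; all values are assumed to be strings.
df = pd.DataFrame({
    'a': ['v', 'v', 'v'],
    'b': ['f', '2', 'f'],
    'c': ['3|4|5', '6', '4|5'],
})

# Desired result: one row per '|'-separated value in column c.
expected = pd.DataFrame({
    'a': ['v'] * 6,
    'b': ['f', 'f', 'f', '2', 'f', 'f'],
    'c': ['3', '4', '5', '6', '4', '5'],
})
```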

Guilherme Jardim Duarte asked Feb 03 '16 00:02



2 Answers

You could:

import numpy as np

df = df.set_index(['a', 'b'])
# Append '| ' so every cell ends in a ' ' piece, which replace() matches below
df = df.astype(str) + '| '
df = (df.c.str.split('|', expand=True)
        .stack()
        .reset_index(-1, drop=True)
        .replace(' ', np.nan)  # the ' ' here matches the space appended above
        .dropna()
        .reset_index())

to get:

   a  b  0
0  v  f  3
1  v  f  4
2  v  f  5
3  v  2  6
4  v  f  4
5  v  f  5
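On newer pandas there may be a shorter route. The following is a minimal sketch, assuming pandas 0.25 or later (where DataFrame.explode is available) and that column c holds strings; it is an alternative to the split-and-stack approach above, not the same technique.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['v', 'v', 'v'],
    'b': ['f', '2', 'f'],
    'c': ['3|4|5', '6', '4|5'],
})

# Split each cell into a list, then give every list element its own row.
result = (df.assign(c=df['c'].str.split('|'))
            .explode('c')
            .reset_index(drop=True))  # explode repeats the original index labels
print(result)
```

Here the column keeps its name c, and reset_index(drop=True) is only needed if you want a fresh 0..n-1 index.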
Stefan answered Nov 06 '22 12:11


Option 1

In [3404]: (df.set_index(['a', 'b'])['c']
              .str.split('|', expand=True).stack()
              .reset_index(name='c').drop(columns='level_2'))
Out[3404]:
   a  b  c
0  v  f  3
1  v  f  4
2  v  f  5
3  v  2  6
4  v  f  4
5  v  f  5

Option 2: Using repeat and loc (with numpy imported as np)

In [3503]: s = df.c.str.split('|')

In [3504]: df.loc[df.index.repeat(s.str.len())].assign(c=np.concatenate(s))
Out[3504]:
   a  b  c
0  v  f  3
0  v  f  4
0  v  f  5
1  v  2  6
2  v  f  4
2  v  f  5

Details

In [3505]: s
Out[3505]:
0    [3, 4, 5]
1          [6]
2       [4, 5]
Name: c, dtype: object
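Put together as a standalone script, the repeat-and-loc idea might look like the sketch below. The sample frame is rebuilt from the question with all values assumed to be strings, `counts` and `out` are illustrative names, and the final reset_index is an optional extra for a clean index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['v', 'v', 'v'],
    'b': ['f', '2', 'f'],
    'c': ['3|4|5', '6', '4|5'],
})

s = df['c'].str.split('|')   # one list of pieces per row
counts = s.str.len()         # number of pieces in each row

# Repeat each row label once per piece, then overwrite c with the flattened pieces.
out = (df.loc[df.index.repeat(counts)]
         .assign(c=np.concatenate(s.tolist()))
         .reset_index(drop=True))
print(out)
```

The lengths in `counts` drive how many times each row is duplicated, so the flattened pieces line up with the repeated rows one to one.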
Zero answered Nov 06 '22 11:11