 

How to do 'lateral view explode()' in pandas [duplicate]

Tags:

python

pandas

I want to do this:

# input:
        A   B
0  [1, 2]  10
1  [5, 6] -20
# output:
   A   B
0  1  10
1  2  10
2  5 -20
3  6 -20

Every value in column A is a list.

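import pandas as pd

df = pd.DataFrame({'A':[[1,2],[5,6]],'B':[10,-20]})
# One output row per list element: [element] + the remaining columns of that row
df = pd.DataFrame([[item]+list(df.loc[line,'B':]) for line in df.index for item in df.loc[line,'A']],
                  columns=df.columns)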

The code above works, but it's very slow.

Is there a cleverer way?

Thank you

asked Jul 18 '16 by Zhang Tong



People also ask

How do you explode in pandas?

The explode() function transforms each element of a list-like into its own row, replicating the index values. The lists in the exploded column(s) become rows, and the index is duplicated for those rows. It raises ValueError if the frame's columns are not unique.
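Applied to the question's frame, a minimal sketch, assuming pandas 0.25 or later (the release that added `DataFrame.explode`). `reset_index(drop=True)` renumbers the duplicated index so the result matches the desired output:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [5, 6]], 'B': [10, -20]})

# Each list element in A becomes its own row; B is repeated, and the
# original index labels (0, 0, 1, 1) are duplicated for those rows.
out = df.explode('A').reset_index(drop=True)

# explode leaves A with object dtype, so cast it back to integers.
out['A'] = out['A'].astype(int)
print(out)
```

The cast is only needed if downstream code expects a numeric dtype for `A`.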

How do you explode a list inside a DataFrame cell into separate rows?

For most cases, the correct answer is now to use pandas.DataFrame.explode() or pandas.Series.explode().

How do I explode two columns in a DataFrame?

Pass the column(s) to explode. For multiple columns, specify a non-empty list whose elements are each a str or tuple; on every row, the list-like values in all specified columns must have matching lengths. With ignore_index=True, the resulting index is labeled 0, 1, …, n - 1 (new in version 1.1).
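A sketch of the multi-column form, assuming pandas 1.3 or later (the first release whose `explode` accepts a list of columns). The second list column `C` is hypothetical and exists only for illustration:

```python
import pandas as pd

# 'C' is a hypothetical second list column; on every row its list must
# have the same length as the list in 'A'.
df = pd.DataFrame({'A': [[1, 2], [5, 6]],
                   'C': [['a', 'b'], ['c', 'd']],
                   'B': [10, -20]})

# Explode A and C together; ignore_index=True relabels the result 0..n-1.
out = df.explode(['A', 'C'], ignore_index=True)
print(out)
```

If the lists in `A` and `C` differ in length on any row, pandas raises a ValueError instead of padding.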

How do you Unnest data frames?

In R, the unnest() function from the tidyr package converts a data frame into an unnested object, given the input data and the columns to unnest. The output is a tibble.


1 Answer

Method 1 (OP)

pd.DataFrame([[item]+list(df.loc[line,'B':]) for line in df.index for item in df.loc[line,'A']],
             columns=df.columns)

Method 2 (pir)

df1 = df.A.apply(pd.Series).stack().rename('A')
df2 = df1.to_frame().reset_index(1, drop=True)
df2.join(df.B).reset_index(drop=True)

Method 3 (pir)

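import numpy as np

# Requires every list in A to have the same length
A = np.asarray(df.A.values.tolist())                # shape (n_rows, k)
B = np.stack([df.B for _ in range(A.shape[1])]).T   # B repeated k times per row
# Flatten row-major so each row's elements stay together
pd.DataFrame({'A': A.ravel(), 'B': B.ravel()})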

Thanks @user113531 for the reference to Alexander's answer. I had to modify it to work.

Method 4 (@Alexander, linked answer)


rows = []
for i, row in df.iterrows():
    for a in row.A:
        rows.append([a, row.B])

pd.DataFrame(rows, columns=df.columns)
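If the lists in `A` can have different lengths (Method 3 cannot handle that case), a loop-free variant repeats each `B` value once per list element with `np.repeat`. This is a sketch of a different technique, not one of the timed methods; the uneven example frame is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with lists of different lengths in A.
df = pd.DataFrame({'A': [[1, 2], [5, 6, 7]], 'B': [10, -20]})

# Number of elements in each row's list.
lengths = df['A'].map(len).to_numpy()

out = pd.DataFrame({
    # Flatten all lists into one sequence, preserving row order.
    'A': [item for lst in df['A'] for item in lst],
    # Repeat each B value once per element of that row's list.
    'B': np.repeat(df['B'].to_numpy(), lengths),
})
print(out)
```

Rows whose list is empty simply contribute no output rows, since their `B` value is repeated zero times.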

Timings

Method 4 (Alexander's) is the fastest, followed by Method 3.
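To rerun a comparison like this on your own data, a rough `timeit` harness could look like the following. It is a sketch under stated assumptions: the 5,000-row frame is made up, it pits Method 4 against the built-in `DataFrame.explode` (pandas 0.25 or later) rather than reproducing the original chart, and timings will vary with machine and pandas version:

```python
import timeit
import pandas as pd

# Hypothetical benchmark frame: 5,000 rows, each with a 2-element list in A.
n = 5_000
df = pd.DataFrame({'A': [[i, i + 1] for i in range(n)], 'B': list(range(n))})

def method4_iterrows(df):
    # Alexander's row loop (Method 4)
    rows = []
    for _, row in df.iterrows():
        for a in row.A:
            rows.append([a, row.B])
    return pd.DataFrame(rows, columns=df.columns)

def method_explode(df):
    # Built-in DataFrame.explode
    return df.explode('A').reset_index(drop=True)

for fn in (method4_iterrows, method_explode):
    # Best of 3 single runs, in seconds
    t = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f'{fn.__name__}: {t:.4f} s')
```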


answered Sep 29 '22 by piRSquared
