How to do 'lateral view explode()' in pandas [duplicate]

Tags: python, pandas

I want to turn the input below into the output below:

# input:
        A   B
0  [1, 2]  10
1  [5, 6] -20
# output:
   A   B
0  1  10
1  2  10
2  5 -20
3  6 -20

Every value in column A is a list.

import pandas as pd

df = pd.DataFrame({'A':[[1,2],[5,6]],'B':[10,-20]})
df = pd.DataFrame([[item]+list(df.loc[line,'B':]) for line in df.index for item in df.loc[line,'A']],
                  columns=df.columns)

The code above works, but it is very slow.

Is there a faster or more idiomatic way to do this?

Thank you

784

asked Jul 18 '16 04:07

Zhang Tong

1 Answer

Method 1 (OP)

pd.DataFrame([[item]+list(df.loc[line,'B':]) for line in df.index for item in df.loc[line,'A']],
             columns=df.columns)

Method 2 (pir)

df1 = df.A.apply(pd.Series).stack().rename('A')
df2 = df1.to_frame().reset_index(1, drop=True)
df2.join(df.B).reset_index(drop=True)

Method 3 (pir)

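# Needs pandas < 0.25 (pd.Panel was removed in 0.25) and lists of equal length in A.
import numpy as np

A = np.asarray(df.A.values.tolist())
B = np.stack([df.B for _ in range(A.shape[1])]).T  # range replaces Python 2's xrange
P = np.stack([A, B])
pd.Panel(P, items=['A', 'B']).to_frame().reset_index(drop=True)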
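Method 3 depends on `pd.Panel`, which no longer exists in pandas 0.25 and later. As a rough sketch of a NumPy-only alternative (not pir's method, and not included in the timings), the lists can be flattened with `np.concatenate` while `B` is repeated once per list item with `np.repeat`. The names `lens` and `flat` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [5, 6]], 'B': [10, -20]})

# Number of items in each list of column A.
lens = df['A'].str.len().values

flat = pd.DataFrame({
    # Join all the lists end to end into one flat array.
    'A': np.concatenate(df['A'].values),
    # Repeat each B value once for every item of the matching list.
    'B': np.repeat(df['B'].values, lens),
})
print(flat)
```

Unlike Method 3, this sketch should also cope with lists of different lengths, since each `B` value is repeated by its own list's length.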

Thanks @user113531 for the reference to Alexander's answer. I had to modify it to work.

Method 4 (@Alexander) LINKED ANSWER

(Follow link and Up Vote if this was helpful)

rows = []
for i, row in df.iterrows():
    for a in row.A:
        rows.append([a, row.B])

pd.DataFrame(rows, columns=df.columns)

Timings

Method 4 (Alexander's) is the fastest, followed by Method 3.
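To check the ranking on your own machine, something like the following `timeit` sketch may help. It only covers Methods 1 and 4, since both run on current pandas; the benchmark size (200 copies of the sample frame) and the names `big`, `method_1` and `method_4` are arbitrary choices for illustration, and the numbers will vary with data size and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [5, 6]], 'B': [10, -20]})

# Hypothetical benchmark input: the sample frame repeated 200 times.
big = pd.concat([df] * 200, ignore_index=True)


def method_1(d):
    # OP's list comprehension using .loc lookups.
    return pd.DataFrame(
        [[item] + list(d.loc[line, 'B':])
         for line in d.index for item in d.loc[line, 'A']],
        columns=d.columns)


def method_4(d):
    # Alexander's nested loop over iterrows().
    rows = []
    for _, row in d.iterrows():
        for a in row.A:
            rows.append([a, row.B])
    return pd.DataFrame(rows, columns=d.columns)


for name, fn in [('Method 1', method_1), ('Method 4', method_4)]:
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(name, round(seconds, 3), 'seconds for 3 runs')
```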

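On pandas 0.25 or later, `DataFrame.explode` covers this case directly. It was not available when the methods above were written and is not part of the timings, so treat the following as a minimal sketch rather than a benchmarked recommendation. The name `out` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [5, 6]], 'B': [10, -20]})

# explode() gives each list item its own row and repeats the original index,
# so the index is reset to get 0..3.
out = df.explode('A').reset_index(drop=True)

# The exploded column keeps object dtype; cast it if integers are wanted.
out['A'] = out['A'].astype(int)
print(out)
```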

174

answered Sep 29 '22 04:09

piRSquared
