Pandas Explode on Multiple columns

Tags: python, pandas, dataframe, explode

I'm using Pandas 0.25.3 and trying to explode a couple of columns.

The data looks like:

d1 = {'user':['user1','user2','user3','user4'],
      'paid':['Y','Y','N','N'],
      'last_active':['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018'],
      'col4':'data'}

I loaded this into a dataframe with df=pd.DataFrame([d1],columns=d1.keys()), which looks like this:

user                              paid              last_active                                                col4               
['user1','user2','user3','user4'] ['Y','Y','N','N'] ['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018']  'data'

There are other columns as well that hold a single value each ({'A':'B'} type stuff), but I'm not worried about those.

When I do df.explode('user') it works for that one, and the same goes for each of the other columns, but when I try df.explode(column=('user','paid','last_active')) it gives me the following error:

KeyError: ('user','paid','last_active')

So what I want to know is: how can I use the explode function on multiple columns to get the following df:

user     paid  last_active    col4
'user1'  'Y'   '11 Jul 2019'  'data'
'user2'  'Y'   '23 Sep 2018'  NaN
'user3'  'N'   '08 Dec 2019'  NaN
'user4'  'N'   '03 Mar 2018'  NaN

asked Dec 17 '19 15:12

nos codemos

2 Answers

I think you need the following (note that the padded entries in col4 come out as None rather than the NaN shown in the question):

pd.DataFrame([[i] if not isinstance(i,list) else i 
             for i in d1.values()],index=d1.keys()).T

    user paid  last_active  col4
0  user1    Y  11 Jul 2019  data
1  user2    Y  23 Sep 2018  None
2  user3    N  08 Dec 2019  None
3  user4    N  03 Mar 2018  None
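
If NaN is preferred over None for the padded entries, one possible variation (a sketch, not part of the answer above) is to wrap each value in a pd.Series and let the DataFrame constructor align them on the index. It assumes that any non-list value, such as col4, should appear only in the first row.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}

# Wrap scalars in a one-element list so every value can become a Series.
# Series of different lengths are aligned on their index when the
# DataFrame is built, and the missing positions are filled with NaN.
result = pd.DataFrame(
    {key: pd.Series(value if isinstance(value, list) else [value])
     for key, value in d1.items()}
)
print(result)
```

The column order follows the insertion order of d1, so no transpose is needed.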

answered Sep 29 '22 08:09

anky

Pandas 0.25.3 does not have a multi-column explode (support for passing a list of columns arrived in pandas 1.3.0). There are workarounds. One simple way could be:

import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a','b'], ['c','d']],
        'C': [['z','y'], ['x','w']]
    }
)
print(df)

--------------
A    B     C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]

# list_cols holds the columns to be exploded.
# A list (rather than a set) keeps the column order deterministic.
list_cols = ['B', 'C']

# other_cols holds all the remaining column names in df
other_cols = [col for col in df.columns if col not in list_cols]

# Explode each column in list_cols separately. This yields one long
# Series per column, all sharing the same repeated index.
exploded = [df[col].explode() for col in list_cols]

# zip pairs each column name with its exploded Series, and dict turns
# the pairs into a mapping from which a dataframe can be created.
# Print the individual outputs of each step to follow the flow.
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))

# Now merge the other columns back in on the index
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)

# Lastly, restore the original column order
df2 = df2.loc[:, df.columns]

print(df2)

------
A B C
------
1 a z
1 b y
2 c x
2 d w
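
For readers on pandas 1.3.0 or later, the loop can be replaced by a single explode call that takes a list of column names. The following is a minimal sketch, assuming a newer pandas than the 0.25.3 used in the question and that the lists within each row all have the same length. The names df_q, exploded and exploded_q are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a', 'b'], ['c', 'd']],
        'C': [['z', 'y'], ['x', 'w']],
    }
)

# Requires pandas >= 1.3.0. Pass a list of column names, not a tuple:
# a tuple is treated as a single column label, which is what leads to
# the KeyError shown in the question.
exploded = df.explode(['B', 'C'])
print(exploded)

# The same call applied to the question's data (one row of lists).
d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}
df_q = pd.DataFrame([d1])
exploded_q = df_q.explode(['user', 'paid', 'last_active']).reset_index(drop=True)
print(exploded_q)
```

Note that explode repeats the values of the untouched columns, so col4 holds 'data' on every row here rather than NaN after the first row.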

answered Sep 29 '22 10:09

Allohvk
