I'm using pandas 0.25.3 and trying to explode several columns at once.
My data looks like this:
d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}
I loaded this into a DataFrame with df = pd.DataFrame([d1], columns=d1.keys()),
which looks like this:
user paid last_active col4
['user1','user2','user3','user4'] ['Y','Y','N','N'] ['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018'] 'data'
There are other columns as well that hold a single value each ({'A': 'B'}
type stuff), but I'm not worried about those.
When I do df.explode('user')
it works for that one column, and the same goes for each of the other columns, but when I try df.explode(column=('user','paid','last_active'))
it gives me the following error:
KeyError: ('user','paid','last_active')
So what I want to know is: how can I use the explode
function on multiple columns at once to get the following df?
user paid last_active col4
'user1' 'Y' '11 Jul 2019' 'data'
'user2' 'Y' '23 Sep 2018' NaN
'user3' 'N' '08 Dec 2019' NaN
'user4' 'N' '03 Mar 2018' NaN
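From the pandas documentation for DataFrame.explode: the column parameter is the column(s) to explode. For multiple columns, specify a non-empty list in which each element is a str or tuple; the list-like data in all specified columns must have matching lengths on the same row of the frame. Multi-column explode was added in version 1.3.0, so it is not available in 0.25.3. The ignore_index parameter: if True, the resulting index will be labeled 0, 1, …, n - 1. New in version 1.1.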
The explode() function transforms each element of a list-like into a row, replicating the index values. It returns the exploded lists as rows of the subset columns; the index is duplicated for these rows. It raises ValueError if the columns of the frame are not unique.
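For readers on a newer pandas, the multi-column form described above can be tried directly on the data from the question. This is only a minimal sketch, assuming pandas 1.3.0 or later (it will not run on the 0.25.3 used in the question). The columns are passed as a list rather than a tuple; a tuple appears to be treated as a single column label, which would explain the KeyError. Note that col4 is repeated on every row here rather than filled with NaN.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}

# One row whose first three cells each hold a list of four items.
df = pd.DataFrame([d1])

# Assumes pandas >= 1.3.0: pass a list of column names, not a tuple.
# ignore_index=True relabels the rows 0..n-1 instead of repeating index 0.
result = df.explode(['user', 'paid', 'last_active'], ignore_index=True)
print(result)
```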
I guess you need something like this (note that col4 is padded with None
here, rather than the NaN shown in the question's expected output):
pd.DataFrame([[i] if not isinstance(i,list) else i
for i in d1.values()],index=d1.keys()).T
user paid last_active col4
0 user1 Y 11 Jul 2019 data
1 user2 Y 23 Sep 2018 None
2 user3 N 08 Dec 2019 None
3 user4 N 03 Mar 2018 None
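If NaN is wanted in col4 instead of None, as in the question's expected output, one possible variation is to build each column as a pd.Series and let pandas pad the shorter ones. This is a sketch of an alternative, not the answerer's method, and it assumes every non-list value should occupy only the first row.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}

# Wrap scalars in a one-element list so every value becomes a Series.
# Series of different lengths are aligned on their index, and the
# missing positions are filled with NaN.
df = pd.DataFrame({key: pd.Series(val if isinstance(val, list) else [val])
                   for key, val in d1.items()})
print(df)
```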
pandas versions before 1.3 (including 0.25.3) do not have a multi-column explode, but there are workarounds. One simple way:
import pandas as pd

df = pd.DataFrame(
{
'A': [1, 2],
'B': [['a','b'], ['c','d']],
'C': [['z','y'], ['x','w']]
}
)
print(df)
--------------
A B C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]
## list_cols are the columns to be exploded
list_cols = ['B', 'C']
## other_cols holds all the remaining column names in the df
other_cols = [col for col in df.columns if col not in list_cols]
## explode each of the list_cols separately using a loop
exploded = [df[col].explode() for col in list_cols]
## exploded is now a list of Series, one per column; print it to see the format
## zip pairs each column name with its exploded Series, and dict turns
## those pairs into a mapping from which a dataframe can be created
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))
## now merge the other_cols back in, matching on the (repeated) index
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)
## lastly, restore the original column order
df2 = df2.loc[:, df.columns]
print(df2)
------
A B C
------
1 a z
1 b y
2 c x
2 d w
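The same idea can be wrapped in a small reusable function. The sketch below is one possible way to do it, not part of the answer above: the name explode_together is made up for illustration, and it assumes the lists within each row all have the same length. It only uses Series.explode, so it should also work on pandas versions without multi-column explode.

```python
import pandas as pd

def explode_together(df, cols):
    """Hypothetical helper: explode several list columns in lockstep."""
    cols = list(cols)
    base = df.reset_index(drop=True)      # guarantee a unique 0..n-1 index
    first = base[cols[0]].explode()       # its index repeats once per element
    # Repeat the non-list columns once per exploded element.
    out = base.drop(columns=cols).loc[first.index].copy()
    for col in cols:
        values = base[col].explode().to_numpy()
        if len(values) != len(out):
            raise ValueError("column %r explodes to a different length" % col)
        out[col] = values
    # Restore the original column order and give the rows a fresh index.
    return out[list(df.columns)].reset_index(drop=True)

df = pd.DataFrame({'A': [1, 2],
                   'B': [['a', 'b'], ['c', 'd']],
                   'C': [['z', 'y'], ['x', 'w']]})
result = explode_together(df, ['B', 'C'])
print(result)
```

The length check only compares totals, so rows whose lists differ in length but happen to sum to the same total would be silently misaligned; validate the input first if that can occur.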