Splitting a cell in pandas into multiple rows

Question

It was a bit tricky to explain the problem. I want to split a cell containing multiple string values delimited by commas into different rows. The df below is a small example but the real dataset contains up to 15 columns and 15 rows and each cell has 5 to 6 non unique string values in it that I need to separate into different rows.

How can we split the original df to the transformed df?

Original df

import pandas as pd
df = pd.DataFrame({"Privileges":['Type1','Type2','Type3'],"Super_Admin":["A1,A2,A3,A4","A1,B1,B2,B3, A4","C1,B2,C2,C3"], "Admin":["A1,A2","A1,B1,B2","B2, C1,C2"])

Index	Privileges	Super_Admin	Admin
0	Type1	A1,A2,A3,A4	A1,A2
1	Type2	A1,B1,B2,B3, A4	A1,B1,B2
2	Type3	C1,B2,C2,C3	B2, C1,C2

Transformed df

df = pd.DataFrame({"Privileges":['Type1','Type1','Type1','Type1','Type2','Type2','Type2','Type2','Type2','Type3','Type3','Type3','Type3'],"Super_Admin":["A1","A2","A3","A4","A1", "B1","B2","B3", "A4","C1","B2","C2","C3"], "Admin":["A1","A2",'', '',"A1","B1","B2",'', '', "B2", "C1","C2", '']})

Index	Privileges	Super_Admin	Admin
0	Type1	A1	A1
1	Type1	A2	A2
2	Type1	A3	--
3	Type1	A4	--
4	Type2	A1	A1
5	Type2	B1	B1
6	Type2	B2	B2
7	Type2	B3	--
8	Type2	A4	--
9	Type3	C1	B2
10	Type3	B2	C1
11	Type3	C2	C2
12	Type3	C3	--

sammywemmy · Accepted Answer

Breakdown of the steps below:

zip all the columns

Use zip_longest to pair None with values that do not have a pair

Combine the two lists into one, with chain

Create dataframe and forward fill on the Privileges column

In [541]: step1 = zip(df.Privileges, df.Super_Admin, df.Admin)

In [542]: step2 = (zip_longest([first], second,last) 
                   for first, second, last 
                   in step1)

In [543]: step3 = chain.from_iterable(step2)

In [546]: (pd.DataFrame(step3, columns = df.columns)
             .assign(Privileges = lambda df: df.Privileges.ffill())
           )
Out[546]: 
   Privileges Super_Admin Admin
0       Type1          A1    A1
1       Type1          A2    A2
2       Type1          A3  None
3       Type1          A4  None
4       Type2          A1    A1
5       Type2          B1    B1
6       Type2          B2    B2
7       Type2          B3  None
8       Type2          A4  None
9       Type3          C1    B2
10      Type3          B2    C1
11      Type3          C2    C2
12      Type3          C3  None

To get more speed, you can move the split step into native python territory. pandas string methods are wrappers around python's string functions, as such they are not as fast as python's string functions.

mosc9575 · Answer

This is on option. First make Super_Admin and Admin to a list. This is useful to use pd.explode().

df['Super_Admin'] = df['Super_Admin'].apply(lambda x: x.split(','))
df['Admin'] = df['Admin'].apply(lambda x: x.split(','))

Then call explode on both columns and fill the missing values with an empty string.

a = df.explode('Super_Admin')
b = df.explode('Admin')
for i in range(3):
    short = b.loc[i,'Admin'].values
    long = a.loc[i,'Admin'].values
    a.loc[i,'Admin'] = np.concatenate((short, ['']*(len(long)-len(short))), axis=0)

The output looks like this:

>>> a
  Privileges Super_Admin Admin
0      Type1          A1    A1
0      Type1          A2    A2
0      Type1          A3      
0      Type1          A4      
1      Type2          A1    A1
1      Type2          B1    B1
1      Type2          B2    B2
1      Type2          B3      
1      Type2          A4      
2      Type3          C1    B2
2      Type3          B2    C1
2      Type3          C2    C2
2      Type3          C3

Splitting a cell in pandas into multiple rows

Tags:

python

pandas

dataframe

data-science

data-cleaning

user14318465

2 Answers

sammywemmy

mosc9575

Recent Activity

Donate For Us

Splitting a cell in pandas into multiple rows

Tags:

python

pandas

dataframe

data-science

data-cleaning

user14318465

2 Answers

sammywemmy

mosc9575

Related questions

Recent Activity

Donate For Us