
Creating new rows in dataframe based on string values in multiple columns

I ran into this problem where I have a dataframe that looks like the following (the values in the last 3 columns are usually 4-5 alphanumeric codes).

import pandas as pd

data = {'ID':['P39','S32'],
        'Name':['Pipe','Screw'],
        'Col3':['Test1, Test2, Test3','Test6, Test7'],
        'Col4':['','Test8, Test9'],
        'Col5':['Test4, Test5','Test10, Test11, Test12, Test13']
       }

df = pd.DataFrame(data)

    ID   Name                 Col3          Col4                            Col5
0  P39   Pipe  Test1, Test2, Test3                                  Test4, Test5
1  S32  Screw         Test6, Test7  Test8, Test9  Test10, Test11, Test12, Test13

I want to expand this dataframe (or create a new one) based on the values in the last 3 columns of each row. Each original row should become as many rows as the largest number of comma-separated values found in any of its last 3 columns. The first 2 columns should stay the same in all of the expanded rows, and each of the last 3 columns should hold only one value from the original cell per expanded row.

In the above example, the first row would indicate I need 3 total rows (Col3 has the most at 3 values), and the second row would indicate I need 4 total rows (Col5 has the most at 4 values). A desired output would be along the lines of:

    ID   Name   Col3   Col4    Col5
0  P39   Pipe  Test1          Test4
1  P39   Pipe  Test2          Test5
2  P39   Pipe  Test3
3  S32  Screw  Test6  Test8  Test10
4  S32  Screw  Test7  Test9  Test11
5  S32  Screw                Test12
6  S32  Screw                Test13

I first found a way to figure out the number of rows needed, and I had the idea of appending the values to a new dataframe in the same loop. However, I'm not sure how to separate the values in the last 3 columns and append them one by one to the rows. I know str.split() is useful for putting the values into a list. My only idea is to loop through each column separately and append each value to the correct row, but I'm not sure how to do that.

output1 = pd.DataFrame(
    columns = ['ID', 'Name', 'Col3', 'Col4', 'Col5'])

for index, row in df.iterrows():
    
    output2 = pd.DataFrame(
        columns = ['ID', 'Name', 'Col3', 'Col4', 'Col5'])

    col3counter = df.iloc[index, 2].count(',')
    col4counter = df.iloc[index, 3].count(',')
    col5counter = df.iloc[index, 4].count(',')
    
    numofnewcols = max(col3counter, col4counter, col5counter) + 1

    iter1 = df.iloc[index, 2].split(', ')
    iter2 = df.iloc[index, 3].split(', ')
    iter3 = df.iloc[index, 4].split(', ')

    #for q in iter1
        #output2.iloc[ , 2] = 
    

    output1 = pd.concat([output1, output2], ignore_index=True)
    del output2
PyGuy66 asked May 31 '26 04:05


1 Answer

Here is a way:

cols = ['Col3','Col4','Col5']

s = df[cols].stack().str.split(', ')
s2 = s.str.len().groupby(level=0).transform(lambda x: x.max() - x)
(df.loc[:, ~df.columns.isin(cols)]
 .join((s + s2.map(lambda x: x * [''])).unstack())
 .explode(cols)
 .reset_index(drop=True))
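
The chained expression above packs several steps together. Here is a minimal sketch that unpacks it into named intermediates, using the dataframe from the question. The names `pad`, `padded`, `wide` and `result` are introduced here only for illustration, and passing a list of columns to `explode` assumes pandas 1.3 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['P39', 'S32'],
    'Name': ['Pipe', 'Screw'],
    'Col3': ['Test1, Test2, Test3', 'Test6, Test7'],
    'Col4': ['', 'Test8, Test9'],
    'Col5': ['Test4, Test5', 'Test10, Test11, Test12, Test13'],
})
cols = ['Col3', 'Col4', 'Col5']

# Step 1: one list per cell; the stacked index is (row label, column name).
s = df[cols].stack().str.split(', ')

# Step 2: for each cell, how many items it is short of the longest list in its row.
pad = s.str.len().groupby(level=0).transform(lambda x: x.max() - x)

# Step 3: append that many empty strings so all lists in a row have equal length.
padded = s + pad.map(lambda n: [''] * int(n))

# Step 4: back to one column per original column, each cell holding a list.
wide = padded.unstack()

# Step 5: attach the untouched columns and explode the list columns together.
result = df.drop(columns=cols).join(wide).explode(cols).reset_index(drop=True)
print(result)
```

The padding in step 3 matters because exploding several columns at once requires the lists in each row to have matching lengths.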

Here is another way, using .stack() and str.split() and then building a new dataframe from the result:

cols = ['Col3','Col4','Col5']

s = df[cols].stack().str.split(', ')
(df[['ID','Name']].join(pd.DataFrame(s.tolist(), index=s.index)
.stack()
.unstack(level=1)
.droplevel(1)
.fillna(''))
.reset_index(drop=True))

Output:

    ID   Name   Col3   Col4    Col5
0  P39   Pipe  Test1          Test4
1  P39   Pipe  Test2          Test5
2  P39   Pipe  Test3               
3  S32  Screw  Test6  Test8  Test10
4  S32  Screw  Test7  Test9  Test11
5  S32  Screw                Test12
6  S32  Screw                Test13
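
For comparison, the loop-based idea from the question (split each cell, then fill rows one value at a time) can be sketched with `itertools.zip_longest`, which pads the shorter lists automatically. This is not part of the answer above. The helper name `expand_records` and its parameters are hypothetical, and the sketch works on plain dictionaries rather than on the dataframe directly.

```python
from itertools import zip_longest


def expand_records(records, list_cols, sep=', '):
    """Turn each record into one record per position in its longest split list."""
    expanded = []
    for rec in records:
        # Columns that are copied unchanged into every expanded row.
        fixed = {k: v for k, v in rec.items() if k not in list_cols}
        # One list of values per column to be split.
        pieces = [rec[c].split(sep) for c in list_cols]
        # zip_longest walks the lists in parallel, filling gaps with ''.
        for values in zip_longest(*pieces, fillvalue=''):
            expanded.append({**fixed, **dict(zip(list_cols, values))})
    return expanded


records = [
    {'ID': 'P39', 'Name': 'Pipe', 'Col3': 'Test1, Test2, Test3',
     'Col4': '', 'Col5': 'Test4, Test5'},
    {'ID': 'S32', 'Name': 'Screw', 'Col3': 'Test6, Test7',
     'Col4': 'Test8, Test9', 'Col5': 'Test10, Test11, Test12, Test13'},
]

rows = expand_records(records, ['Col3', 'Col4', 'Col5'])
for row in rows:
    print(row)
```

With pandas available, `df.to_dict('records')` would supply the input and `pd.DataFrame(rows)` would turn the result back into a dataframe. On large frames the vectorized approaches above are likely to be faster.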
rhug123 answered Jun 01 '26 18:06