Split columns base on conditions in Python

Question

I have a dataset where there are multiple values in one column, the problem is there might be some null values in these columns. And I need to create three different columns from this column where the number of characters is not fixed nor the position.

The data before:

df=pd.DataFrame({'Date':['2-18-2019','2-18-2019','2-19-2019','2-19-2019','2-20-2019','2-21-2019','2-21-2019','2-22-2019'],'Item':['NY01','Ld01','Du02','Du01','Ps55','L55','Du85','L85'],'SizeAgeQuantity':['13 3/8 5 846','4 1/2 557 85','9 5/8 47 4464','30 58','32 304 304','32 304 304 ','7 6588 685','4118 587','29']})


   Date    |    Item    |    SizeAgeQuantity
2-18-2019  |    NY01    |     13 3/8 5 846         
2-18-2019  |    Ld01    |     4 1/2 557 85        
2-19-2019  |    Du02    |     9 5/8 47 4464         
2-19-2019  |    Du01    |         30 58      
2-20-2019  |    Ps55    |     32 304 304      
2-21-2019  |    L55     |     7  6588 685  
2-21-2019  |    Du85    |        4118 587       
2-22-2019  |    L85     |        29

The result I am looking for is like this:

   Date    |    Item    |    Size    |    Age   |   Quantity
2-18-2019  |    NY01    |   13 3/8   |     5    |     846         
2-18-2019  |    Ld01    |    4 1/2   |    557   |     85        
2-19-2019  |    Du02    |    9 5/8   |    47    |     4464         
2-19-2019  |    Du01    |     30     |    58    |  
2-20-2019  |    Ps55    |     32     |    304   |     304      
2-21-2019  |    L55     |     7      |    6588  |     685  
2-21-2019  |    Du85    |            |    4118  |     587       
2-22-2019  |    L85     |            |    29    |

The only consistent thing is the column "Size" should only have a value from the following set ("4 1/2","7", "9 5/8", "13 3/8", "18", "30", "32")

I have tried the following code: df['Size'], df['FrakS'], df['Age'], df['Quantity'] = df['SizeAgeQuantity'].str.split(' ', 3).str

But the result comes as the following:

   Date    |    Item    |    Size    |   FrakS   |    Age   |   Quantity
2-18-2019  |    NY01    |     13     |    3/8    |     5    |     846         
2-18-2019  |    Ld01    |     4      |    1/2    |    557   |     85        
2-19-2019  |    Du02    |     9      |    5/8    |    47    |     4464         
2-19-2019  |    Du01    |     30     |    58     |          |  
2-20-2019  |    Ps55    |     32     |    304    |    304   |           
2-21-2019  |    L55     |     7      |    658    |    685   |       
2-21-2019  |    Du85    |    4118    |    587    |          |          
2-22-2019  |    L85     |     29     |           |          |

If anyone can help me I would really appreciate it

Timo · Accepted Answer

This should do the trick:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Date':['2-18-2019','2-18-2019','2-19-2019','2-19-2019','2-20-2019','2-21-2019','2-21-2019','2-22-2019'],
    'Item':['NY01','Ld01','Du02','Du01','Ps55','L55','Du85','L85'],
    'SizeAgeQuantity':['13 3/8 5 846','4 1/2 557 85','9 5/8 47 4464','30 58','32 304 304','7 6588 685','4118 587','29']})

size_list = ["4 1/2", "7", "9 5/8", "13 3/8", "18", "30", "32"]

columns = []

for row in df.SizeAgeQuantity:
    values = row.split()
    # if there's a "/" in the row, 
    # combine values 1 and 2
    if "/" in row:
        size = " ".join(values[:2])
        del values[0:2]
        values.insert(0, size)
    
    # add nan padding to the values list
    values = values + [np.nan] * (3-len(values))
    
    # if the 1st value is not size, shift list right
    if values[0] not in size_list:
        values = values[-1:] + values[:-1]
    
    columns.append(values)

saq = pd.DataFrame(columns, columns=["Size", "Age", "Quantity"])
out = pd.concat([df.drop("SizeAgeQuantity", axis=1), saq], axis=1)
print(out)

Out:

        Date  Item    Size   Age Quantity
0  2-18-2019  NY01  13 3/8     5      846
1  2-18-2019  Ld01   4 1/2   557       85
2  2-19-2019  Du02   9 5/8    47     4464
3  2-19-2019  Du01      30    58      NaN
4  2-20-2019  Ps55      32   304      304
5  2-21-2019   L55       7  6588      685
6  2-21-2019  Du85     NaN  4118      587
7  2-22-2019   L85     NaN    29      NaN

Split columns base on conditions in Python

Tags:

python

pandas

dataframe

SMO

1 Answers

Timo

Recent Activity

Donate For Us

Split columns base on conditions in Python

Tags:

python

pandas

dataframe

SMO

1 Answers

Timo

Related questions

Recent Activity

Donate For Us