I have a dataset where there are multiple values in one column, the problem is there might be some null values in these columns. And I need to create three different columns from this column where the number of characters is not fixed nor the position.
The data before:
df=pd.DataFrame({'Date':['2-18-2019','2-18-2019','2-19-2019','2-19-2019','2-20-2019','2-21-2019','2-21-2019','2-22-2019'],'Item':['NY01','Ld01','Du02','Du01','Ps55','L55','Du85','L85'],'SizeAgeQuantity':['13 3/8 5 846','4 1/2 557 85','9 5/8 47 4464','30 58','32 304 304','32 304 304 ','7 6588 685','4118 587','29']})
Date | Item | SizeAgeQuantity
2-18-2019 | NY01 | 13 3/8 5 846
2-18-2019 | Ld01 | 4 1/2 557 85
2-19-2019 | Du02 | 9 5/8 47 4464
2-19-2019 | Du01 | 30 58
2-20-2019 | Ps55 | 32 304 304
2-21-2019 | L55 | 7 6588 685
2-21-2019 | Du85 | 4118 587
2-22-2019 | L85 | 29
The result I am looking for is like this:
Date | Item | Size | Age | Quantity
2-18-2019 | NY01 | 13 3/8 | 5 | 846
2-18-2019 | Ld01 | 4 1/2 | 557 | 85
2-19-2019 | Du02 | 9 5/8 | 47 | 4464
2-19-2019 | Du01 | 30 | 58 |
2-20-2019 | Ps55 | 32 | 304 | 304
2-21-2019 | L55 | 7 | 6588 | 685
2-21-2019 | Du85 | | 4118 | 587
2-22-2019 | L85 | | 29 |
The only consistent thing is the column "Size" should only have a value from the following set ("4 1/2","7", "9 5/8", "13 3/8", "18", "30", "32")
I have tried the following code: df['Size'], df['FrakS'], df['Age'], df['Quantity'] = df['SizeAgeQuantity'].str.split(' ', 3).str
But the result comes as the following:
Date | Item | Size | FrakS | Age | Quantity
2-18-2019 | NY01 | 13 | 3/8 | 5 | 846
2-18-2019 | Ld01 | 4 | 1/2 | 557 | 85
2-19-2019 | Du02 | 9 | 5/8 | 47 | 4464
2-19-2019 | Du01 | 30 | 58 | |
2-20-2019 | Ps55 | 32 | 304 | 304 |
2-21-2019 | L55 | 7 | 658 | 685 |
2-21-2019 | Du85 | 4118 | 587 | |
2-22-2019 | L85 | 29 | | |
If anyone can help me I would really appreciate it
This should do the trick:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Date':['2-18-2019','2-18-2019','2-19-2019','2-19-2019','2-20-2019','2-21-2019','2-21-2019','2-22-2019'],
'Item':['NY01','Ld01','Du02','Du01','Ps55','L55','Du85','L85'],
'SizeAgeQuantity':['13 3/8 5 846','4 1/2 557 85','9 5/8 47 4464','30 58','32 304 304','7 6588 685','4118 587','29']})
size_list = ["4 1/2", "7", "9 5/8", "13 3/8", "18", "30", "32"]
columns = []
for row in df.SizeAgeQuantity:
values = row.split()
# if there's a "/" in the row,
# combine values 1 and 2
if "/" in row:
size = " ".join(values[:2])
del values[0:2]
values.insert(0, size)
# add nan padding to the values list
values = values + [np.nan] * (3-len(values))
# if the 1st value is not size, shift list right
if values[0] not in size_list:
values = values[-1:] + values[:-1]
columns.append(values)
saq = pd.DataFrame(columns, columns=["Size", "Age", "Quantity"])
out = pd.concat([df.drop("SizeAgeQuantity", axis=1), saq], axis=1)
print(out)
Out:
Date Item Size Age Quantity
0 2-18-2019 NY01 13 3/8 5 846
1 2-18-2019 Ld01 4 1/2 557 85
2 2-19-2019 Du02 9 5/8 47 4464
3 2-19-2019 Du01 30 58 NaN
4 2-20-2019 Ps55 32 304 304
5 2-21-2019 L55 7 6588 685
6 2-21-2019 Du85 NaN 4118 587
7 2-22-2019 L85 NaN 29 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With