
How to split list data and populate to a DataFrame?

Tags:

python

I have a list of items that I want to clean according to certain conditions, with the result stored in a DataFrame. Here's the list:

[
  "Onion per Pack|500 g|Rp18,100|Rp3,700 / 100 g|Add to cart",
  "Shallot per Pack|250 g|-|49%|Rp22,300|Rp11,300|Rp4,600 / 100 g|Add to cart",
  "Spring Onion per Pack|250 g|Rp7,000|Rp2,800 / 100 g|Add to cart",
  "Green Beans per Pack|250 g|Rp5,900|Rp2,400 / 100 g|Add to cart",
  ]

into

name                   unit   discount  price     unit price
Onion per Pack         500 g            Rp18,100  Rp3,700 / 100 g
Shallot per Pack       250 g  49%       Rp22,300  Rp11,300
Spring Onion per Pack  250 g            Rp7,000   Rp2,800 / 100 g
Green Beans per Pack   250 g            Rp5,900   Rp2,400 / 100 g

Currently my code is:

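import pandas as pd

rows = []
for i in item:
    parts = i.split("|")  # split once and reuse the fields
    if len(parts) == 5:
        # No discount: name, unit, price, unit price, "Add to cart"
        data = {"name": parts[0],
                "unit": parts[1],
                "discount": "",
                "price": parts[2],
                "unit price": parts[3]}
    else:
        # Discounted row (8 fields); indices follow the desired table above
        data = {"name": parts[0],
                "unit": parts[1],
                "discount": parts[3],
                "price": parts[4],
                "unit price": parts[5]}
    rows.append(data)

# DataFrame.append never modified the frame in place (and was removed in pandas 2.0),
# so collect the rows first and build the DataFrame once.
datas = pd.DataFrame(rows)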

Is there a more efficient way? A shorter way to achieve this?

Hal asked Jan 18 '26 07:01


1 Answer

Once the source data has been cleaned (preferably by the provider) and each field is defined, so that every row in the dataset has the same number of fields, the following very simple approach can be used to populate the DataFrame:

Data:

import pandas as pd

cols = ['name', 'unit', 'discount', 'price', 'unit_price', 'other']

# An empty field is marked by a doubled delimiter ('||').
items = ["Onion per Pack|500 g||Rp18,100|Rp3,700 / 100 g|Add to cart",
         "Shallot per Pack|250 g|49%|Rp22,300|Rp4,600 / 100 g|Add to cart",
         "Spring Onion per Pack|250 g||Rp7,000|Rp2,800 / 100 g|Add to cart",
         "Green Beans per Pack|250 g||Rp5,900|Rp2,400 / 100 g|Add to cart"]
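
If the provider cannot supply rows in this shape, the raw list from the question could be normalised first. The following is only a minimal sketch, assuming raw rows have either 5 fields (no discount) or 8 fields (discounted). The helper name `normalise`, the names `raw_items` and `cleaned`, and the index mapping for 8-field rows are assumptions, the last one inferred by comparing the question's raw Shallot row with the cleaned one above:

```python
def normalise(row, sep="|"):
    """Return a 6-field row: name, unit, discount, price, unit_price, other."""
    parts = row.split(sep)
    if len(parts) == 5:
        # No discount: insert an empty discount field after the unit.
        parts.insert(2, "")
    elif len(parts) == 8:
        # Discounted: drop the "-" placeholder (index 2) and the
        # discounted price (index 5), keeping the listed price.
        parts = [parts[0], parts[1], parts[3], parts[4], parts[6], parts[7]]
    else:
        # Anything else is unexpected, so fail loudly rather than guess.
        raise ValueError(f"unexpected field count {len(parts)}: {row!r}")
    return sep.join(parts)


raw_items = [
    "Onion per Pack|500 g|Rp18,100|Rp3,700 / 100 g|Add to cart",
    "Shallot per Pack|250 g|-|49%|Rp22,300|Rp11,300|Rp4,600 / 100 g|Add to cart",
]
cleaned = [normalise(r) for r in raw_items]
```

Raising on any other field count is a deliberate choice here: a silently misaligned row would be harder to spot once it is inside the DataFrame.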

Population:

The cleaned source data can be populated directly into the DataFrame via the data parameter. In the case below, a 'generator expression' is used to iterate the dataset efficiently and split on the field delimiter.

The next statement removes the extra 'other' column, which should not appear in the output.

df = pd.DataFrame(data=(i.split('|') for i in items), columns=cols)
df.drop('other', axis=1, inplace=True)

Output:

                    name   unit discount     price       unit_price
0         Onion per Pack  500 g           Rp18,100  Rp3,700 / 100 g
1       Shallot per Pack  250 g      49%  Rp22,300  Rp4,600 / 100 g
2  Spring Onion per Pack  250 g            Rp7,000  Rp2,800 / 100 g
3   Green Beans per Pack  250 g            Rp5,900  Rp2,400 / 100 g
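
As a possible variation, the throwaway column can be avoided by slicing each split row to the number of wanted columns, so no separate drop is needed. This is a sketch under the same assumption that every row has already been cleaned to the same layout; the two-row `items` list below is shortened for illustration:

```python
import pandas as pd

cols = ['name', 'unit', 'discount', 'price', 'unit_price']

items = ["Onion per Pack|500 g||Rp18,100|Rp3,700 / 100 g|Add to cart",
         "Shallot per Pack|250 g|49%|Rp22,300|Rp4,600 / 100 g|Add to cart"]

# Keep only the first len(cols) fields of each row, so the trailing
# 'Add to cart' field never becomes a column.
df = pd.DataFrame(data=(i.split('|')[:len(cols)] for i in items), columns=cols)
```

The trade-off is that slicing silently discards anything after the fifth field, whereas the explicit 'other' column makes the dropped data visible.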

S3DEV answered Jan 20 '26 21:01