How to split list data and populate to a DataFrame?

Question

I have a list of items that I want to clean according to certain conditions, with a DataFrame as the output. Here's the list:

[
  "Onion per Pack|500 g|Rp18,100|Rp3,700 / 100 g|Add to cart",
  "Shallot per Pack|250 g|-|49%|Rp22,300|Rp11,300|Rp4,600 / 100 g|Add to cart",
  "Spring Onion per Pack|250 g|Rp7,000|Rp2,800 / 100 g|Add to cart",
  "Green Beans per Pack|250 g|Rp5,900|Rp2,400 / 100 g|Add to cart",
  ]

into

name	unit	discount	price	unit price
Onion per Pack	500 g		Rp18,100	Rp3,700 / 100 g
Shallot per Pack	250 g	49%	Rp22,300	Rp11,300
Spring Onion per Pack	250 g		Rp7,000	Rp2,800 / 100 g
Green Beans per Pack	250 g		Rp5,900	Rp2,400 / 100 g

Currently my code is:

import pandas as pd

rows = []
for i in item:
    fields = i.split("|")
    if len(fields) == 5:
        # No discount: name, unit, price, unit price, button text
        rows.append({"name": fields[0],
                     "unit": fields[1],
                     "discount": "",
                     "price": fields[2],
                     "unit price": fields[3]})
    else:
        # With discount: name, unit, "-", discount, price, second price, ...
        rows.append({"name": fields[0],
                     "unit": fields[1],
                     "discount": fields[3],
                     "price": fields[4],
                     "unit price": fields[5]})

# Build the DataFrame once from the collected rows.
datas = pd.DataFrame(rows)

Is there a more efficient way? A shorter way to achieve this?

S3DEV · Accepted Answer

Once the source data has been cleaned (preferably by the provider) and each field is defined, ensuring an equal number of fields throughout the dataset, the following simple approach can be used to populate the DataFrame:

Data:

cols = ['name', 'unit', 'discount', 'price', 'unit_price', 'other']

# Fields are defined by placing a 'double delimiter' indicating empty fields.
items = ["Onion per Pack|500 g||Rp18,100|Rp3,700 / 100 g|Add to cart",
         "Shallot per Pack|250 g|49%|Rp22,300|Rp4,600 / 100 g|Add to cart",
         "Spring Onion per Pack|250 g||Rp7,000|Rp2,800 / 100 g|Add to cart",
         "Green Beans per Pack|250 g||Rp5,900|Rp2,400 / 100 g|Add to cart"]

Population:

The cleaned source data can be populated directly into the DataFrame via the data parameter. In the case below, a 'generator expression' is used to iterate the dataset efficiently and split on the field delimiter.

The next statement removes the additional column, which is not to be included in the output.

import pandas as pd

df = pd.DataFrame(data=(i.split('|') for i in items), columns=cols)
df.drop('other', axis=1, inplace=True)

Output:

                    name   unit discount     price       unit_price
0         Onion per Pack  500 g           Rp18,100  Rp3,700 / 100 g
1       Shallot per Pack  250 g      49%  Rp22,300  Rp4,600 / 100 g
2  Spring Onion per Pack  250 g            Rp7,000  Rp2,800 / 100 g
3   Green Beans per Pack  250 g            Rp5,900  Rp2,400 / 100 g
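
The approach above assumes the data has already been cleaned. If the provider cannot deliver it in that form, the raw records from the question could be normalized first. The following is only a minimal sketch. It assumes that just two layouts occur (five fields without a discount, eight fields with one) and that in the eight-field layout the second price is the discounted price, which is dropped here to match the cleaned data above. The names normalize, raw_items and cleaned_items are illustrative and not part of the original answer.

```python
def normalize(record, sep="|"):
    """Return the record with exactly six fields:
    name, unit, discount, price, unit_price, other."""
    fields = record.split(sep)
    if len(fields) == 5:
        # No discount: add an empty discount field after the unit.
        name, unit, price, unit_price, other = fields
        discount = ""
    elif len(fields) == 8:
        # Discounted: drop the "-" placeholder and the second price
        # (assumed to be the discounted price).
        name, unit, _dash, discount, price, _second_price, unit_price, other = fields
    else:
        # Any other layout is unexpected, so fail loudly instead of guessing.
        raise ValueError(f"unexpected number of fields ({len(fields)}): {record!r}")
    return sep.join([name, unit, discount, price, unit_price, other])


raw_items = [
    "Onion per Pack|500 g|Rp18,100|Rp3,700 / 100 g|Add to cart",
    "Shallot per Pack|250 g|-|49%|Rp22,300|Rp11,300|Rp4,600 / 100 g|Add to cart",
    "Spring Onion per Pack|250 g|Rp7,000|Rp2,800 / 100 g|Add to cart",
    "Green Beans per Pack|250 g|Rp5,900|Rp2,400 / 100 g|Add to cart",
]

cleaned_items = [normalize(r) for r in raw_items]
```

cleaned_items then has the same six-field shape as items above and could be passed to pd.DataFrame in its place.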

Tags:

python

Hal
