
How to extract numbers from a multi-line string in Python 3

Tags:

python-3.x

I have been struggling to get all the numbers out of a multi-line string called price (the product name is fine).

I'm using Python to scrape a website for product names and prices, which produces the output below, written to the file exactly like this:

Master C141,"

6

                    999

                        .
                        -
                "
Master 220,"

6

                    499

                        .
                        -
                "
Master C170,"

12

                    499
                        .
                        -
                "

I have tried many different code examples from Stack Overflow and several other sites, but none have worked. What I would like is output like this:

Master C141, 6999

Master 220, 6499

Master C170, 12499

Here is the code:

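import re

import pandas as pd
from bs4 import BeautifulSoup

# driver is the browser driver (presumably a Selenium WebDriver) set up earlier
content = driver.page_source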

products=[] #List to store name of the product
prices=[] #List to store price of the product

soup = BeautifulSoup(content,"html.parser")
for a in soup.findAll('div', attrs={'class':'c-product-listing__col'}):
    name=a.find('h2', attrs={'class':'c-product-card__heading'})
    price=a.find('div', attrs={'class':'c-price-tag__price'})

    # Raw string so that \d is not treated as a string escape sequence
    print(re.findall(r"\d+", price.text))
    
    products.append(name.text)
    prices.append(price.text)

df = pd.DataFrame({'Product Name':products,'Price':prices}) 
df.to_csv('products.txt', index=False, encoding='utf-8')
Jiess asked Mar 27 '26 01:03


2 Answers

This answer assumes we are starting with the text in your question:

import re
# text holds the file contents shown in the question
output = re.sub(r'\b(Master \w+,).*?(\d+).*?(\d+).*?(?=\bMaster|$)', r'\1 \2\3\n', text, flags=re.S).strip()
print(output)

This prints:

Master C141, 6999
Master 220, 6499
Master C170, 12499

Here we capture the Master term along with the two numbers (runs of digits) that follow it, then combine them to produce the output you want. Note the dot-all flag (re.S), which lets . match newlines so that a single match can span several lines.
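To try the substitution end to end, here is a minimal sketch. The sample string is rebuilt from the file contents in the question, and the helper name flatten_listing is hypothetical. It assumes every record starts with "Master" and that the price is always split into exactly two groups of digits.

```python
import re

# Sample text rebuilt from the file contents shown in the question.
sample = (
    'Master C141,"\n\n6\n\n                    999\n\n'
    '                        .\n                        -\n                "\n'
    'Master 220,"\n\n6\n\n                    499\n\n'
    '                        .\n                        -\n                "\n'
    'Master C170,"\n\n12\n\n                    499\n'
    '                        .\n                        -\n                "\n'
)

# Same pattern as in the answer; re.S lets "." match newlines.
PATTERN = re.compile(
    r'\b(Master \w+,).*?(\d+).*?(\d+).*?(?=\bMaster|$)',
    flags=re.S,
)


def flatten_listing(text):
    """Collapse each multi-line record into 'name, price' on one line."""
    return PATTERN.sub(r'\1 \2\3\n', text).strip()


print(flatten_listing(sample))
```

If the product names do not all begin with "Master", or a price has more than two digit groups, the pattern would need adjusting; cleaning each price individually before writing the file avoids that dependency.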

Tim Biegeleisen answered Apr 02 '26 22:04


OK, problem solved. Thanks to those who pointed me in the right direction. The code might not be optimal, but it works :)

.....
.....
content = driver.page_source

products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product

soup = BeautifulSoup(content,"html.parser")
for a in soup.findAll('div', attrs={'class':'c-product-listing__col'}):
    name=a.find('h2', attrs={'class':'c-product-card__heading'})
    price=a.find('div', attrs={'class':'c-price-tag__price'})
    strProduct = name.text
    strPrice = price.text
    
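    # Keep only the text before the first comma in the product name
    strProduct = re.match(r'[^,]+', strProduct)[0]
    # Drop every non-digit character, so "6" and "999" join into "6999"
    strPrice = re.sub(r'\D', '', strPrice)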
    
    products.append(strProduct)
    prices.append(strPrice)

df = pd.DataFrame({'Product Name':products,'Price':prices}) 
df.to_csv('products.csv', index=False, encoding='utf-8')
driver.quit()
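The two cleaning steps can be checked without a browser or a live page. The sketch below wraps them in two helpers; the names clean_name and clean_price are hypothetical, and the strip() call and the empty-string fallback are additions not present in the code above.

```python
import re


def clean_name(raw):
    """Return the text before the first comma, without surrounding whitespace."""
    match = re.match(r'[^,]+', raw)
    # Fall back to an empty string if the text is empty or starts with a comma.
    return match[0].strip() if match else ''


def clean_price(raw):
    """Remove every non-digit character, e.g. whitespace, '.', and '-'."""
    return re.sub(r'\D', '', raw)


# Raw price text shaped like the scraped output in the question.
raw_price = (
    '\n\n12\n\n                    499\n'
    '                        .\n                        -\n                '
)

print(clean_name('Master C170'), clean_price(raw_price))
```

clean_price returns a string; wrapping the result in int() would give a number, provided at least one digit was found.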
Jiess answered Apr 02 '26 22:04


