I have written code for web scraping in Python. The code extracts MacBook data from Amazon using Selenium. Now I want to store these values in Excel or MySQL. Each product row has several HTML/CSS classes, plus one parent class that contains all of the product's parameters. To be precise, the code is:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import xlwt
from xlwt import Workbook

option = webdriver.ChromeOptions()
option.add_argument("--incognito")
browser = webdriver.Chrome(executable_path='/home/mukesh/Desktop/backup/Programminghub/whatsapp_python_scripts/chromedriver_linux64/chromedriver', chrome_options=option)

# go to website of interest
browser.get("https://www.amazon.in/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=macbook")

# wait up to 10 seconds for page to load
timeout = 10
try:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//img[@class='s-access-image cfMarker']")))
except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()

titles_element = browser.find_elements_by_xpath("//div[@class='s-item-container']")
titles = []
for x in titles_element:
    value = x.text
    value = value.encode('ascii', 'ignore')
    titles.append(value)
print(titles)
The output I get is highly unstructured and contains parameters that exist only on certain products. For instance, "Maximum Resolution" or "CPU Model Manufacturer" appear only on some laptops and not on all. I don't want such parameters. I want only these parameters, which are present on every laptop: product name (the title of the row), price, operating system, CPU model family, computer memory size and display size. I am unable to split the titles list into these sub-lists. I also tried a naive approach where I split the products by accessing the individual class of every parameter, but the values didn't match up: the price of one laptop was shown against a different laptop, and sponsored ads caused problems as well. I just want these parameters, either in my list, in Excel, or in a MySQL database: product name (title of the row), price, operating system, CPU model family, computer memory size and display size (6 columns).
Well, you have two different problems here as I see it: extracting the fields you want from each product row, and then saving them in a structured form. Let's assume all you're interested in getting about a product is its name and price (just for the sake of explanation); we'll create a simple class called Product:
class Product(object):
    def __init__(self, name, price):
        self.name = name
        self.price = price
And then, for every item we find, we will get its name and price and create an instance of Product:
from selenium.common.exceptions import WebDriverException

titles_element = browser.find_elements_by_xpath("//div[@class='s-item-container']")
products = []
for x in titles_element:
    try:
        name = x.find_element_by_class_name("s-access-title").text
        price = x.find_element_by_class_name("s-price").text
        products.append(Product(name, price))
    except WebDriverException:
        # NoSuchElementException (a subclass of WebDriverException) is raised
        # when a row is missing one of the fields, e.g. a sponsored ad block.
        pass
You can, of course, get any other data you are interested in using the right CSS/XPath selector, or even regular expressions.
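For instance, the six columns from the question could be modelled by widening the class. This is only a sketch: the field names are placeholders I chose, and the per-field extraction selectors would still need to be worked out against Amazon's current markup inside the scraping loop.

```python
# A wider Product covering the six columns asked for in the question.
# The scraping loop stays the same; it just fills in more fields.
class Product(object):
    FIELDS = ['name', 'price', 'operating_system', 'cpu_family',
              'memory_size', 'display_size']

    def __init__(self, name, price, operating_system=None,
                 cpu_family=None, memory_size=None, display_size=None):
        self.name = name
        self.price = price
        self.operating_system = operating_system
        self.cpu_family = cpu_family
        self.memory_size = memory_size
        self.display_size = display_size

    def as_dict(self):
        # Handy for csv.DictWriter or building a DataFrame later.
        return {field: getattr(self, field) for field in self.FIELDS}
```

Fields a given row doesn't expose simply stay None, which keeps every row at a fixed six columns when you export.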
After that, you will have the data you want, and it will be much easier to save it to a DB, JSON, CSV or any other kind of storage you'd like. Let's take a look at saving that data to a CSV file, for example:
import csv

def save_products_to_csv_file(product_list, file_name):
    # In Python 3, open CSV files in text mode with 'w' and newline=''
    # (the old 'wb' mode is a Python 2 idiom and breaks under Python 3).
    with open(file_name, 'w', newline='') as csvfile:
        fieldnames = ['name', 'price']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for product in product_list:
            writer.writerow({'name': product.name, 'price': product.price})
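As a quick sanity check, here is a self-contained round trip through that CSV approach, with hard-coded sample rows standing in for scraped data (the names and prices below are made up for the demo):

```python
import csv
import os
import tempfile

# Minimal stand-in for the scraped objects, just for this round-trip demo.
class Product(object):
    def __init__(self, name, price):
        self.name = name
        self.price = price

def save_products_to_csv_file(product_list, file_name):
    with open(file_name, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=['name', 'price'])
        writer.writeheader()
        for product in product_list:
            writer.writerow({'name': product.name, 'price': product.price})

products = [Product("MacBook Air", "92,990"), Product("MacBook Pro", "1,29,990")]
path = os.path.join(tempfile.gettempdir(), "products_demo.csv")
save_products_to_csv_file(products, path)

# Read it back to confirm the structure survived the trip.
with open(path, newline='') as f:
    rows = list(csv.DictReader(f))
print(rows[0]['name'])  # → MacBook Air
```

The resulting CSV opens directly in Excel, which also covers the "Excel" half of the question without any extra library.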
And here is another example of storing your data into a SQLite DB using SQLAlchemy:
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

# Creating a DB model class that represents our Product object.
class Product(Base):
    __tablename__ = 'products'

    # Here we define columns for the product
    id = Column(Integer, primary_key=True)
    name = Column(String)
    price = Column(String)

engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
titles_element = browser.find_elements_by_xpath("//div[@class='s-item-container']")
for x in titles_element:
    try:
        name = x.find_element_by_class_name("s-access-title").text
        price = x.find_element_by_class_name("s-price").text
        new_product = Product(name=name, price=price)
        session.add(new_product)
    except WebDriverException:
        pass
session.commit()
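If you'd rather avoid the SQLAlchemy dependency, the standard-library sqlite3 module covers the same insert-and-query flow. This sketch uses hard-coded sample rows in place of the scraped values; the same SQL shape carries over to MySQL via a connector library, just with its own placeholder syntax.

```python
import sqlite3

# In-memory DB for the demo; pass a file path (or switch to a MySQL
# connector) to persist the data for real.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        price TEXT
    )
""")

# Sample rows standing in for the scraped (name, price) pairs.
rows = [("MacBook Air", "92,990"), ("MacBook Pro", "1,29,990")]
conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)", rows)
conn.commit()

for name, price in conn.execute("SELECT name, price FROM products ORDER BY id"):
    print(name, price)
```

Parameterised queries (the `?` placeholders) also keep you safe if a scraped title happens to contain quotes or other SQL-significant characters.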