I am trying to scrape an e-commerce site that uses AJAX calls to load its next pages.
I can scrape the data on page 1, but page 2 is loaded automatically through an AJAX call when I scroll page 1 to the bottom.
My code:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as ureq

my_url = 'http://www.shopclues.com/mobiles-smartphones.html'
page = ureq(my_url).read()
page_soup = soup(page, "html.parser")
containers = page_soup.findAll("div", {"class": "column col3"})
for container in containers:
    name = container.h3.text
    price = container.find("span", {'class': 'p_price'}).text
    print("Name : " + name.replace(",", " "))
    print("Price : " + price)
for i in range(2, 7):
    my_url = "http://www.shopclues.com/ajaxCall/moreProducts?catId=1431&filters=&pageType=c&brandName=&start=" + str(36 * (i - 1)) + "&columns=4&fl_cal=1&page=" + str(i)
    page = ureq(my_url).read()
    print(page)
    page_soup = soup(page, "html.parser")
    containers = page_soup.findAll("div", {"class": "column col3"})
    for container in containers:
        name = container.h3.text
        price = container.find("span", {'class': 'p_price'}).text
        print("Name : " + name.replace(",", " "))
        print("Price : " + price)
To check whether I can open the AJAX page at all, I printed what ureq reads back, and the output of print(page) is just b' ' (an empty byte string).
Please suggest how I can scrape the remaining data.
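An empty b' ' response often means the server is rejecting the request because it does not look like a browser's AJAX call. A minimal sketch of sending browser-like headers (the exact header values here are my assumptions, and the ajax_url helper is just a convenience wrapper around the URL from the question, not part of the site's documented API):

```python
from urllib.request import Request, urlopen

BASE = ("http://www.shopclues.com/ajaxCall/moreProducts"
        "?catId=1431&filters=&pageType=c&brandName="
        "&start={start}&columns=4&fl_cal=1&page={page}")

def ajax_url(page_no, per_page=36):
    # Page N starts at offset 36*(N-1), matching the loop in the question.
    return BASE.format(start=per_page * (page_no - 1), page=page_no)

def fetch(url):
    # Mark the request as a browser AJAX call; many endpoints
    # return empty bodies without these headers.
    req = Request(url, headers={
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
    })
    return urlopen(req).read()

# for i in range(2, 7):
#     page = fetch(ajax_url(i))   # then parse with BeautifulSoup as before
```

If the endpoint still returns an empty body with these headers, it may also check cookies or a referer, which is easier to inspect in the browser's network tab.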
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as soup
import random
import time

chrome_options = webdriver.ChromeOptions()
# Suppress notification pop-ups
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", prefs)

# A randomized delay between 5 and 10 seconds
seconds = 5 + (random.random() * 5)

# Create a new Chrome session
driver = webdriver.Chrome(options=chrome_options)
driver.implicitly_wait(30)
# driver.maximize_window()

# Navigate to the application home page
driver.get("http://www.shopclues.com/mobiles-smartphones.html")
time.sleep(seconds)

# Widen the range to load more phones
for i in range(1):
    element = driver.find_element(By.ID, "moreProduct")
    driver.execute_script("arguments[0].click();", element)
    time.sleep(seconds)

html = driver.page_source
page_soup = soup(html, "html.parser")
containers = page_soup.findAll("div", {"class": "column col3"})
for container in containers:
    # Skip containers missing a name or price
    try:
        name = container.h3.text
        price = container.find("span", {'class': 'p_price'}).text
        print("Name : " + name.replace(",", " "))
        print("Price : " + price)
    except AttributeError:
        continue

driver.quit()
I used Selenium to load the website and click the "load more" button, then fed the resulting HTML into your parsing code.
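For reference, the extraction step itself does not depend on Selenium or BeautifulSoup: once you have the page source, the same name/price selectors can be applied with the standard library's html.parser. A stdlib-only sketch (the class names match the question's selectors; this scans any h3 / span.p_price pair rather than scoping strictly to each col3 div):

```python
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collect (name, price) pairs from h3 / span.p_price elements."""
    def __init__(self):
        super().__init__()
        self.products = []
        self._in_name = False
        self._in_price = False
        self._name = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "h3":
            self._in_name = True
        elif tag == "span" and "p_price" in cls:
            self._in_price = True

    def handle_data(self, data):
        if self._in_name and data.strip():
            self._name = data.strip().replace(",", " ")
        elif self._in_price and data.strip() and self._name:
            self.products.append((self._name, data.strip()))
            self._name = None

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_name = False
        elif tag == "span":
            self._in_price = False

sample = '<div class="column col3"><h3>Phone A</h3><span class="p_price">Rs.999</span></div>'
p = ProductParser()
p.feed(sample)
# p.products == [("Phone A", "Rs.999")]
```

This mirrors the try/except in the answer above: a product block missing its name or price simply produces no pair instead of raising.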