
How can I activate each item and parse their information?

I ran into an unusual problem while scraping a webpage with Python. When an image is clicked, information about its flavors appears under the image. My goal is to parse all the flavors connected to each image. My script can parse the flavors of the currently active image, but it breaks after clicking on a new image. A little tweak to my loop should point me in the right direction.

I've tried with:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.optigura.com/uk/product/gold-standard-100-whey/")
wait = WebDriverWait(driver, 10)

while True:
    items = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='colright']//ul[@class='opt2']//label")))
    for item in items.find_elements_by_xpath("//div[@class='colright']//ul[@class='opt2']//label"):
        print(item.text)

    try:
        links = driver.find_elements_by_xpath("//span[@class='img']/img")
        for link in links:
            link.click()
    except:
        break

driver.quit() 

The picture below may clarify what I could not:

enter image description here

SIM asked Oct 17 '22 08:10


2 Answers

I tweaked the code so that it clicks each image and then checks that the text of the clicked item matches the text of the currently active item. Once they match, you can parse the flavors safely, without parsing the same item over and over. Here you go:

from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.optigura.com/uk/product/gold-standard-100-whey/")
wait = WebDriverWait(driver, 10)
links = driver.find_elements(By.XPATH, "//span[@class='img']/img")

for idx, link in enumerate(links):
    while True:
        try:
            link.click()
            # Keep clicking until the active item's text matches the size we clicked.
            while driver.find_elements(By.XPATH, "//span[@class='size']")[idx].text != driver.find_elements(By.XPATH, "//div[@class='colright']//li[@class='active']//span")[1].text:
                link.click()
            print(driver.find_elements(By.XPATH, "//span[@class='size']")[idx].text)
            # Wait for the flavor labels to be present, then print each one.
            items = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='colright']//ul[@class='opt2']//label")))
            for item in items:
                print(item.text)
        except StaleElementReferenceException:
            # The page re-rendered while we were reading it; retry this image.
            continue
        break
driver.quit()
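The inner `while` loop above clicks until the two texts agree, which can spin forever if they never do. One possible variation is a bounded poll, sketched below in plain Python. The helper name `wait_for_match` and its parameters are my own and are not part of Selenium; the Selenium wiring shown in the comments is only an assumption about how it could be plugged into the script.

```python
import time


def wait_for_match(get_expected, get_actual, retry, timeout=10.0, interval=0.2):
    """Poll until get_expected() == get_actual(), calling retry() between checks.

    Hypothetical helper (not a Selenium API). Returns True on a match,
    False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_expected() == get_actual():
            return True
        retry()               # e.g. click the image again
        time.sleep(interval)  # give the page time to update
    return False


# Possible wiring with the script above (assumes `driver`, `link`, `idx`, `By`):
# matched = wait_for_match(
#     lambda: driver.find_elements(By.XPATH, "//span[@class='size']")[idx].text,
#     lambda: driver.find_elements(
#         By.XPATH, "//div[@class='colright']//li[@class='active']//span")[1].text,
#     link.click,
# )
```

If the helper returns `False`, the script can skip that image or report it instead of hanging.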
Tehscript answered Oct 21 '22 05:10


I don't think this has much to do with Python; the page relies heavily on JavaScript and AJAX.

enter image description here

The relevant JavaScript is:

$(document).on("click", ".product-details .custom-radio input:not(.active input)", function() {
    var elm = $(this);
    var root = elm.closest(".product-details");
    var option = elm.closest(".custom-radio");
    var opt, opt1, opt2, ip, ipr;
    elm.closest("ul").find("li").removeClass("active");
    elm.closest("li").addClass("active");
    if (option.hasClass("options1")) {
        ip = root.find(".options1").data("ip");
        opt = root.find(".options2").data("opt");
        opt1 = root.find(".options1 li.active input").val();
        opt2 = root.find(".options2 li.active input").data("opt-sel");
    } else
        ipr = root.find(".options2 input:checked").val();
    $.ajax({
        type: "POST",
        url: "/product/ajax/details.php",
        data: {
            opt: opt,
            opt1: opt1,
            opt2: opt2,
            ip: ip,
            ipr: ipr
        },

So you can construct the request parameters yourself (CSS selectors are a better fit than XPath for reading them in this case), POST them to that endpoint, and parse the JSON response.
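A minimal sketch of that idea, using only the standard library. The endpoint path and field names come from the JavaScript above; the function names are my own, the values must be read from the page's `data-*` attributes and inputs, and the shape of the JSON response is unknown, so it is returned as-is. Sending missing fields as empty strings and adding the `X-Requested-With` header are assumptions about what jQuery sends and what the server expects.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.optigura.com"   # site from the question
ENDPOINT = "/product/ajax/details.php"  # path from the click handler


def build_payload(clicked_group, ip="", opt="", opt1="", opt2="", ipr=""):
    """Mirror the if/else in the click handler.

    Only the fields set in the matching branch carry values; the rest are
    sent as empty strings (assumption about jQuery's serialisation).
    """
    if clicked_group == "options1":
        return {"opt": opt, "opt1": opt1, "opt2": opt2, "ip": ip, "ipr": ""}
    return {"opt": "", "opt1": "", "opt2": "", "ip": "", "ipr": ipr}


def post_details(payload, base_url=BASE_URL):
    """POST the form-encoded payload and decode the JSON reply."""
    body = urllib.parse.urlencode(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url + ENDPOINT,
        data=body,  # supplying data makes this a POST request
        headers={"X-Requested-With": "XMLHttpRequest"},  # assumption
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_details(build_payload("options1", ip=..., opt=..., opt1=..., opt2=...))` once per size option would then avoid driving a browser at all, provided the server accepts requests made this way.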

aristotll answered Oct 21 '22 05:10