I came across a different type of problem while scraping a webpage using Python. When an image is clicked, new information about its flavor appears under the image. My goal is to parse all the flavors connected to each image. My script can parse the flavors of the currently active image, but it breaks after clicking on a new image. A small tweak to my loop should point me in the right direction.
I've tried with:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.optigura.com/uk/product/gold-standard-100-whey/")
wait = WebDriverWait(driver, 10)

while True:
    items = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='colright']//ul[@class='opt2']//label")))
    for item in items.find_elements_by_xpath("//div[@class='colright']//ul[@class='opt2']//label"):
        print(item.text)
    try:
        links = driver.find_elements_by_xpath("//span[@class='img']/img")
        for link in links:
            link.click()
    except:
        break

driver.quit()
The picture underneath may clarify what I could not:
I tweaked the code to click on the links properly and to check whether the current listed item's text matches the active listed item's text. If they match, you can safely go on parsing without worrying that you are parsing the same thing over and over again. Here you go:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

driver = webdriver.Chrome()
driver.get("https://www.optigura.com/uk/product/gold-standard-100-whey/")
wait = WebDriverWait(driver, 10)

links = driver.find_elements_by_xpath("//span[@class='img']/img")
for idx, link in enumerate(links):
    while True:
        try:
            link.click()
            # keep clicking until the active option's text matches the flavor we picked
            while driver.find_elements_by_xpath("//span[@class='size']")[idx].text != driver.find_elements_by_xpath("//div[@class='colright']//li[@class='active']//span")[1].text:
                link.click()
            print(driver.find_elements_by_xpath("//span[@class='size']")[idx].text)
            items = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='colright']//ul[@class='opt2']//label")))
            for item in items.find_elements_by_xpath("//div[@class='colright']//ul[@class='opt2']//label"):
                print(item.text)
        except StaleElementReferenceException:
            continue
        break

driver.quit()
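For clarity, here is a rough variant of the same idea that re-locates the thumbnails on every pass and replaces the inner while loop with an explicit wait. The XPaths are copied from the code above; the exact flow is a sketch under the assumption that the markup stays the same after each AJAX update.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.optigura.com/uk/product/gold-standard-100-whey/")
wait = WebDriverWait(driver, 10)

# count the flavor thumbnails once, then re-locate them on every pass so a
# re-rendered DOM cannot leave us holding a stale reference
count = len(driver.find_elements_by_xpath("//span[@class='img']/img"))
for idx in range(count):
    link = driver.find_elements_by_xpath("//span[@class='img']/img")[idx]
    expected = driver.find_elements_by_xpath("//span[@class='size']")[idx].text
    link.click()
    # wait until the active option shows the flavor we just clicked
    wait.until(lambda d: d.find_elements_by_xpath(
        "//div[@class='colright']//li[@class='active']//span")[1].text == expected)
    for label in driver.find_elements_by_xpath("//div[@class='colright']//ul[@class='opt2']//label"):
        print(label.text)

driver.quit()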
I do not think this has much to do with Python; it is mostly about the JavaScript and AJAX behind the page.
The JavaScript part is:
$(document).on("click", ".product-details .custom-radio input:not(.active input)", function() {
var elm = $(this);
var root = elm.closest(".product-details");
var option = elm.closest(".custom-radio");
var opt, opt1, opt2, ip, ipr;
elm.closest("ul").find("li").removeClass("active");
elm.closest("li").addClass("active");
if (option.hasClass("options1")) {
ip = root.find(".options1").data("ip");
opt = root.find(".options2").data("opt");
opt1 = root.find(".options1 li.active input").val();
opt2 = root.find(".options2 li.active input").data("opt-sel");
} else
ipr = root.find(".options2 input:checked").val();
$.ajax({
type: "POST",
url: "/product/ajax/details.php",
data: {
opt: opt,
opt1: opt1,
opt2: opt2,
ip: ip,
ipr: ipr
},
So you can just construct the params yourself (a CSS selector will work better than XPath in this case), POST them, and parse the JSON results.
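A minimal sketch of that request-based approach, assuming the attribute names match the jQuery handler above and that /product/ajax/details.php returns JSON (neither is verified against the live site):

import requests
from bs4 import BeautifulSoup

base = "https://www.optigura.com"
session = requests.Session()
html = session.get(base + "/uk/product/gold-standard-100-whey/").text
soup = BeautifulSoup(html, "html.parser")

root = soup.select_one(".product-details")
options1 = root.select_one(".options1")
options2 = root.select_one(".options2")

# build the same payload the page's click handler would send
payload = {
    "ip": options1.get("data-ip"),
    "opt": options2.get("data-opt"),
    "opt1": options1.select_one("li.active input").get("value"),
    "opt2": options2.select_one("li.active input").get("data-opt-sel"),
}

resp = session.post(base + "/product/ajax/details.php", data=payload)
print(resp.json())  # parse the flavor details out of the returned JSON

To cover every flavor you would loop over the option inputs, swap the relevant value into the payload each time, and repeat the POST.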