I am trying to scrape all the links from a web page. I am using Selenium WebDriver to scroll the page and click its "load more" button. This is the code I am trying:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import ElementNotVisibleException
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup

def fetch_links(url):
    chrome_path = r"D:\nishant_pc_d_drive\nishant_pc\d_drive\update_engine\myntra_update\chromedriver.exe"
    driver = webdriver.Chrome(chrome_path)
    driver.get(url)
    while True:
        try:
            scrollcount=1
            while scrollcount<5:
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                WebDriverWait(driver, 5)
                scrollcount+=1
            WebDriverWait(driver, 10).until(EC.presence_of_element_located(driver.find_elements_by_css_selector('.load_more .sbt-button, .load_more_order .sbt-button')))
            driver.find_element_by_id("loadmore").click()
        except (ElementNotVisibleException,NoSuchElementException) as e:
            print "done"
    x = driver.page_source
    soup2 = BeautifulSoup(x, 'html.parser')
    linkcount=0
    for each in soup2.find_all('a',attrs={"class":"thumb searchUrlClass"}):
        print "https://www.shoppersstop.com/"+each.get('href')
        linkcount+=1
    print linkcount
    # thumb searchUrlClass

fetch_links("https://www.shoppersstop.com/women-westernwear-tops-tees/c-A206020")
But unfortunately it is giving me an error, as shown below:
Traceback (most recent call last):
  File "D:/INVENTORY/shopperstop/fetch_link.py", line 36, in <module>
    fetch_links("https://www.shoppersstop.com/women-westernwear-tops-tees/c-A206020")
  File "D:/INVENTORY/shopperstop/fetch_link.py", line 21, in fetch_links
    WebDriverWait(driver, 10).until(EC.presence_of_element_located(driver.find_element_by_class_name('sbt-button')))
  File "C:\Python27\lib\site-packages\selenium\webdriver\support\wait.py", line 71, in until
    value = method(self._driver)
  File "C:\Python27\lib\site-packages\selenium\webdriver\support\expected_conditions.py", line 63, in __call__
    return _find_element(driver, self.locator)
  File "C:\Python27\lib\site-packages\selenium\webdriver\support\expected_conditions.py", line 328, in _find_element
    return driver.find_element(*by)
TypeError: find_element() argument after * must be an iterable, not WebElement
How can I fix this error? Thanks!
The error text is legitimately confusing.
Basically, some Expected Conditions (EC) methods use locators, while some use elements. The one you used only accepts a locator, but you provided an element instead.
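To see where the message comes from, here is a rough sketch that uses stand-in classes rather than Selenium itself. FakeElement, FakeDriver, and this simplified presence_of_element_located are made up for illustration; they only imitate the `driver.find_element(*by)` call shown on the last frame of the traceback.

```python
class FakeElement(object):
    """Stand-in for Selenium's WebElement: a plain, non-iterable object."""


class FakeDriver(object):
    """Stand-in for a WebDriver exposing find_element(by, value)."""

    def find_element(self, by, value):
        return "found %s=%s" % (by, value)


def presence_of_element_located(locator):
    """Simplified imitation of the expected condition: it star-unpacks the locator."""
    def check(driver):
        return driver.find_element(*locator)
    return check


driver = FakeDriver()

# A (by, value) tuple unpacks cleanly into the two arguments.
ok = presence_of_element_located(("class name", "sbt-button"))(driver)

# An element object is not iterable, so the star-unpacking raises TypeError.
try:
    presence_of_element_located(FakeElement())(driver)
    error = None
except TypeError as exc:
    error = str(exc)
```

The failure happens inside the condition, not at the line where the element was looked up, which is why the traceback points into expected_conditions.py.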
The difference is sort of explained in the Selenium API docs here:
element is a WebElement object.
locator is a tuple of (by, path).
A practical example of a locator is (By.ID, 'someid').
(You'll need to import Selenium's "By" class: from selenium.webdriver.common.by import By)
So, here's the initial code that incorrectly provides an element:
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(driver.find_element_by_class_name('sbt-button'))
)
It should be updated to provide a locator instead:
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'sbt-button'))
)
Notice the double parens. That's a tuple being passed to the EC method.
Note: In your case, it also looks like you want multiple elements, so you also need to use EC.presence_of_all_elements_located() instead of EC.presence_of_element_located().