I'm using Selenium with Python 2.7 to retrieve the contents of a search box on a webpage. The search box dynamically retrieves the results and displays them in the box itself.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
import re
from time import sleep
driver = webdriver.Firefox()
driver.get(url)
df = pd.read_csv("read.csv")
def crawl(isin):
    searchkey = driver.find_element_by_name("searchkey")
    searchkey.clear()
    searchkey.send_keys(isin)
    sleep(11)
    search_result = driver.find_element_by_class_name("ac_results")
    names = re.match(r"^.*(?=(\())", search_result.text).group().encode("utf-8")
    product_id = re.findall(r"((?<=\()[0-9]*)", search_result.text)
    return pd.Series([product_id, names])
df[["insref", "name"]] = df["ISIN"].apply(crawl)
print df
The relevant part of the code is in def crawl(isin): it enters the search term into searchkey, sleep()s while waiting for the content to show in the search box dropdown field ac_results, and then extracts insrefs and names with regex. Instead of calling sleep(), I would like it to wait until the content in the WebElement ac_results has loaded.
Since it will continuously use the search box to get new data by entering new search terms from a list, one could perhaps use regex to identify when there is new content in ac_results that is not identical to the previous content.
Is there a method for this? It is important to note that the content in the search box is dynamically loaded, so the function would have to recognise that something has changed in the WebElement.
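For reference, the two regexes in crawl() pull the name and the numeric id out of a result entry shaped like "name (id)". A quick standalone check against a made-up sample string (no browser needed):

```python
import re

sample = "Company A (12345)"  # hypothetical ac_results entry

# Greedy match up to the opening parenthesis gives the name part.
name = re.match(r"^.*(?=(\())", sample).group()
# Lookbehind for "(" followed by digits gives the product id.
product_id = re.findall(r"((?<=\()[0-9]*)", sample)

print(repr(name))   # 'Company A ' (note the trailing space before the parenthesis)
print(product_id)   # ['12345']
```

The trailing space in the name is a side effect of the greedy match; it may be worth stripping before storing the value in the DataFrame.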
You need to apply the Explicit Wait concept. E.g. wait for an element to become visible:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'searchbox')))
Here, it would wait up to 10 seconds checking the visibility of the element every 500 ms.
There is a set of built-in Expected Conditions to wait for, and it is also easy to write your own custom Expected Condition.
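Under the hood, WebDriverWait simply polls the condition callable until it returns a truthy value or the timeout expires. A minimal stdlib-only sketch of that polling loop (Python 3, hypothetical names, no Selenium required) makes the mechanism clear:

```python
import time

def wait_until(condition, timeout=10, poll_frequency=0.5):
    # Poll the condition until it returns a truthy value; raise on timeout.
    # This mirrors what WebDriverWait.until does with an Expected Condition.
    end_time = time.time() + timeout
    while time.time() < end_time:
        value = condition()
        if value:
            return value
        time.sleep(poll_frequency)
    raise TimeoutError("condition not met within %s seconds" % timeout)

# Example: wait until a (simulated) element's text is no longer empty.
state = {"text": ""}

def text_has_appeared():
    return state["text"] != ""

state["text"] = "Company A (12345)"
print(wait_until(text_has_appeared, timeout=1, poll_frequency=0.1))  # True
```

WebDriverWait accepts the same knobs: the timeout is its second argument and the poll interval is the poll_frequency keyword (default 0.5 seconds).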
FYI, here is how we approached it after brainstorming it in the chat. We've introduced a custom Expected Condition that would wait for the element text to change. It helped us to identify when the new search results appear:
import re
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import _find_element
class text_to_change(object):
    def __init__(self, locator, text):
        self.locator = locator
        self.text = text

    def __call__(self, driver):
        actual_text = _find_element(driver, self.locator).text
        return actual_text != self.text
#Load URL
driver = webdriver.Firefox()
driver.get(url)
#Load DataFrame of terms to search for
df = pd.read_csv("searchkey.csv")
#Crawling function
def crawl(searchkey):
    try:
        text_before = driver.find_element_by_class_name("ac_results").text
    except NoSuchElementException:
        text_before = ""
    searchbox = driver.find_element_by_name("searchbox")
    searchbox.clear()
    searchbox.send_keys(searchkey)
    print "\nSearching for %s ..." % searchkey
    WebDriverWait(driver, 10).until(
        text_to_change((By.CLASS_NAME, "ac_results"), text_before)
    )
    search_result = driver.find_element_by_class_name("ac_results")
    if search_result.text != "none":
        names = re.match(r"^.*(?=(\())", search_result.text).group().encode("utf-8")
        insrefs = re.findall(r"((?<=\()[0-9]*)", search_result.text)
    else:
        names = re.match(r"^.*(?=(\())", search_result.text)
        insrefs = re.findall(r"((?<=\()[0-9]*)", search_result.text)
    return pd.Series([insrefs, names])
#Run crawl
df[["Insref", "Name"]] = df["ISIN"].apply(crawl)
#Print DataFrame
print df
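The text_to_change logic can be exercised without a browser. Here is a small self-contained re-implementation of the condition (FakeDriver and FakeElement are stand-ins for Selenium's objects, and the public find_element call replaces the private _find_element helper) showing it returning False while the text is unchanged and True once new results appear:

```python
class FakeElement(object):
    def __init__(self, text):
        self.text = text

class FakeDriver(object):
    # Minimal stand-in for a WebDriver: always resolves the locator
    # to one known element.
    def __init__(self, element):
        self.element = element

    def find_element(self, by, value):
        return self.element

class text_to_change(object):
    # Same idea as the custom Expected Condition above: truthy once the
    # element's current text differs from the remembered text.
    def __init__(self, locator, text):
        self.locator = locator
        self.text = text

    def __call__(self, driver):
        actual_text = driver.find_element(*self.locator).text
        return actual_text != self.text

element = FakeElement("")
driver = FakeDriver(element)
condition = text_to_change(("class name", "ac_results"), "")

print(condition(driver))   # False: text has not changed yet
element.text = "Company A (12345)"
print(condition(driver))   # True: new search results appeared
```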
I suggest using one of the built-in Expected Conditions with WebDriverWait. Note that text_to_be_present_in_element checks for a literal substring of the element's text, not a regex, so pass plain text you expect to appear, for example the opening parenthesis from the "name (id)" results:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(
    EC.text_to_be_present_in_element((By.CLASS_NAME, "searchbox"), "(")
)
or, to check the element's value attribute instead of its text:
WebDriverWait(driver, 10).until(
    EC.text_to_be_present_in_element_value((By.CLASS_NAME, "searchbox"), "(")
)