I am using Python with BeautifulSoup4 and I need to retrieve visible links on the page. Given this code:
soup = BeautifulSoup(html)
links = soup('a')
I would like to create a method is_visible that checks whether or not a link is displayed on the page.
Since I am working also with Selenium I know that there exist the following solution:
from selenium.webdriver import Firefox
firefox = Firefox()
firefox.get('https://google.com')
links = firefox.find_elements_by_tag_name('a')
for link in links:
if link.is_displayed():
print('{} => Visible'.format(link.text))
else:
print('{} => Hidden'.format(link.text))
firefox.quit()
Unfortunately the is_displayed method and getting the text attribute perform a http request to retrieve such informations. Therefore things can get really slow when there are many links on a page or when you have to do this multiple times.
On the other hand BeautifulSoup can perform these parsing operations in zero time once you get the page source. But I can't figure out how to do this.
Mostly the hidden elements are defined by the CSS property style="display:none;". In case an element is a part of the form tag, it can be hidden by setting the attribute type to the value hidden. Selenium by default cannot handle hidden elements and throws ElementNotVisibleException while working with them.
We can verify whether an element is present or visible in a page with Selenium webdriver. To check the presence of an element, we can use the method – findElements. The method findElements returns a list of matching elements. Then, we have to use the method size to get the number of items in the list.
verifyElementPresent is a command which is used to check the presence of a certain element.
Below is the syntax which is used for checking that an element with text is either invisible or not present on the DOM. WebDriverWait wait = new WebDriverWait(driver, waitTime); wait. until(ExpectedConditions. invisibilityOfElementWithText(by, strText));
AFAIK, BeautifulSoup will only help you parse the actual markup of the HTML document anyway. If that's all you need, then you can do it in a manner like so (yes, I already know it's not perfect):
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc)
def is_visible_1(link):
#do whatever in this function you can to determine your markup is correct
try:
style = link.get('style')
if 'display' in style and 'none' in style:#or use a regular expression
return False
except Exception:
return False
return True
def is_visible_2(**kwargs):
try:
soup = kwargs.get('soup', None)
del kwargs['soup']
#Exception thrown if element can't be found using kwargs
link = soup.find_all(**kwargs)[0]
style = link.get('style')
if 'display' in style and 'none' in style:#or use a regular expression
return False
except Exception:
return False
return True
#checks links that already exist, not *if* they exist
for link in soup.find_all('a'):
print(str(is_visible_1(link)))
#checks if an element exists
print(str(is_visible_2(soup=soup,id='someID')))
BeautifulSoup doesn't take into account other parties that will tell you that the element is_visible or not, like: CSS, Scripts, and dynamic DOM changes. Selenium, on the other hand, does tell you that an element is actually being rendered or not and generally does so through accessibility APIs in the given browser. You must decide if sacrificing accuracy for speed is worth pursuing. Good luck! :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With