 

Beautiful Soup find elements having hidden style

My need is simple: how do I find elements that are not currently visible on the webpage? I am guessing that style="visibility:hidden" or style="display:none" are simple ways to hide an element, but BeautifulSoup doesn't know whether it's hidden or not.

For example, HTML is:

Textbox_Invisible1: <input id="tbi1" type="text" style="visibility:hidden">
Textbox_Invisible2: <input id="tbi2" type="text" class="hidden_elements">
Textbox1: <input id="tb1" type="text">

So my first concern is that BeautifulSoup cannot find out if any of the above textboxes are hidden:

# Python 2.7
>>> from bs4 import BeautifulSoup
>>> source = """Textbox_Invisible1: <input id="tbi1" type="text" style="visibility:hidden">
...  Textbox_Invisible2: <input id="tbi2" type="text" class="hidden_elements">
...  Textbox1: <input id="tb1" type="text">"""
>>> soup1 = BeautifulSoup(source)
>>> soup1.find(id='tb1').hidden
False
>>> soup1.find(id='tbi1').hidden
False
>>> soup1.find(id='tbi2').hidden
False

My only question is: is there a way to find out which elements are hidden? (This also has to work for complex HTML, where an element may be hidden because one of its containing elements is hidden.)

asked Oct 09 '22 04:10 by amulllb



2 Answers

BeautifulSoup is an HTML parser, not a browser. It doesn't know anything about how the page is supposed to be rendered, computed DOM attributes, etc.; it only checks where the angle brackets begin and end.

If you need to work with the DOM at runtime, you'd be better off with a browser automation package, i.e. something that starts a browser, lets the browser consume the page, and then exposes browser controls and the computed DOM. Depending on the platform, you have different options. Have a look at this page on the Python Wiki for ideas, and check the section Python Wrappers around Web "Libraries" and Browser Technology.
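A minimal sketch of the browser-automation route, assuming Selenium is the package chosen (it is one such package, not the only one) and that a browser driver is installed. The helper name `hidden_input_ids` is hypothetical; it relies on Selenium's `WebElement.is_displayed()`, which asks the browser whether the element is rendered, so styles coming from classes, stylesheets and hidden ancestors are all taken into account.

```python
# Sketch (hypothetical helper): collect the ids of <input> elements that the
# browser reports as not displayed.
# `driver` is assumed to be a Selenium WebDriver that has already loaded the page.
def hidden_input_ids(driver):
    hidden = []
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    for element in driver.find_elements("css selector", "input"):
        # is_displayed() is evaluated by the browser against the computed style
        if not element.is_displayed():
            hidden.append(element.get_attribute("id"))
    return hidden

# Example usage (assumes selenium plus a local Firefox/geckodriver;
# the file path below is a placeholder):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("file:///path/to/page.html")
#   print(hidden_input_ids(driver))
#   driver.quit()
```

Taking the driver as a parameter keeps the helper independent of which browser is launched.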

answered Oct 13 '22 18:10 by Giacomo Lacava



With BeautifulSoup, I'm afraid you'll need to explicitly check the attributes used to make the elements hidden:

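from bs4 import BeautifulSoup

soup = BeautifulSoup(source, 'html.parser')
tbi1 = soup.find(id='tbi1')
tbi2 = soup.find(id='tbi2')
# exact match on the inline style string
print(tbi1.get('style') == 'visibility:hidden')
# bs4 returns 'class' as a list of class names, not a single string
print('hidden_elements' in tbi2.get('class', []))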
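The same idea can be generalized to cover the "complex HTML" case from the question by also walking up through the element's ancestors. This is a heuristic sketch, not a rendering engine: the name `looks_hidden` is hypothetical, and because BeautifulSoup cannot read stylesheets, the caller has to supply the class names known to hide elements (here `hidden_elements`, taken from the question). It assumes bs4 on Python 3.

```python
import re
from bs4 import BeautifulSoup

# Matches inline declarations such as "display:none" or "visibility: hidden"
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)

def looks_hidden(tag, hidden_classes=frozenset()):
    """Heuristic (hypothetical helper): True if `tag` or any ancestor is hidden
    by an inline style, the HTML `hidden` attribute, type="hidden", or one of
    the caller-supplied class names. Stylesheet rules are NOT evaluated."""
    for node in [tag] + list(tag.parents):
        if HIDDEN_STYLE.search(node.get("style", "")):
            return True
        if node.has_attr("hidden"):
            return True
        if node.name == "input" and node.get("type", "").lower() == "hidden":
            return True
        # 'class' is a list in bs4, so intersect it with the known hiding classes
        if set(hidden_classes).intersection(node.get("class", [])):
            return True
    return False

source = """Textbox_Invisible1: <input id="tbi1" type="text" style="visibility:hidden">
Textbox_Invisible2: <input id="tbi2" type="text" class="hidden_elements">
Textbox1: <input id="tb1" type="text">"""
soup = BeautifulSoup(source, "html.parser")
hidden = [t["id"] for t in soup.find_all("input") if looks_hidden(t, {"hidden_elements"})]
print(hidden)  # ['tbi1', 'tbi2']
```

Anything hidden through a stylesheet rule the caller didn't list, or through JavaScript, will still be reported as visible, which is why a real browser is the more reliable option.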
answered Oct 13 '22 18:10 by jcollado
