Get HTML table body in Python using Selenium

I am scraping the following page: https://proximity.niceic.com/mainform.aspx

First, enter '%%' in the country textbox to display all contractors in the area. Once the results are shown, inspecting the HTML in Chrome DevTools gives the following:

[Screenshot: Chrome DevTools]

I want to extract all the information from the selected table. The problem is that when I scrape the page with Selenium, I can find the table, but I can't access its body or its children.

Here is my Python code:

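from selenium.webdriver.common.by import By

# Grab every <table> on the page, then the first <table> nested inside the fourth one
main_table = driver.find_elements(By.TAG_NAME, 'table')
outer_table = main_table[3].find_element(By.TAG_NAME, 'table')
print(outer_table.get_attribute('innerHTML'))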

The code above outputs the following:

<table cellspacing="0" rules="all" bordercolor="Silver" border="1" id="dvContractorDetail" style="background-color:White;border-color:Silver;border-width:1px;border-style:Solid;height:200px;width:400px;border-collapse:collapse;">

</table>

As you can see, I only get the table tag itself, without any of its contents, such as the tbody or the tr tags inside it.

What can I do?

Asked Oct 16 '22 23:10 by Ian Spitz


1 Answer

What is happening here is that the table is populated by JavaScript after the page has loaded, so at the moment your code reads it, it is still empty. You have to wait until the table has loaded, using one of Selenium's waits (implicit or explicit).
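The idea behind an explicit wait can be sketched in plain Python: call a condition repeatedly until it returns something truthy, or give up once a timeout expires. This is a simplified illustration, not Selenium's implementation; `wait_until` and `WaitTimeout` are hypothetical names. Selenium's `WebDriverWait` plays this role for real, polling every 0.5 seconds by default and also ignoring `NoSuchElementException` between polls, which this sketch does not do.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises WaitTimeout if `timeout` seconds pass first. Mirrors the idea
    behind WebDriverWait.until, not its exact behaviour.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate a table that only "appears" on the third poll, the way
# JavaScript fills in the results some time after the page loads.
attempts = {"count": 0}


def table_loaded():
    attempts["count"] += 1
    return "<tr><td>row</td></tr>" if attempts["count"] >= 3 else None


html = wait_until(table_loaded, timeout=5, poll_interval=0.01)
print(html, "after", attempts["count"], "polls")  # prints: <tr><td>row</td></tr> after 3 polls
```

Reading the table immediately corresponds to calling `table_loaded()` once and getting nothing back, which is what the question's code runs into.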

I recommend using an explicit wait.

First, add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

Then change

main_table = driver.find_elements(By.TAG_NAME, 'table')
outer_table = main_table[3].find_element(By.TAG_NAME, 'table')
print(outer_table.get_attribute('innerHTML'))

to

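try:
    # Wait up to 10 seconds for the JS-rendered results table to appear in the DOM;
    # until() returns the located element
    table_element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'gvContractors')))
except TimeoutException:
    print('Results table did not load within 10 seconds')
    raise  # Handle the timeout here instead if you prefer
table = table_element.get_attribute('innerHTML')
print(table)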

This gives you the required output. I'm not posting it here because it is too large, but you can verify it with

print('Company/Address' in table)

which prints True
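Once you have the table's `innerHTML` as a string, one way to turn it into rows of cell text is Python's built-in `html.parser`. This is a sketch that goes beyond the original answer: `TableRowParser` and `parse_rows` are hypothetical names, the sample HTML is made up, and it assumes the table contains no nested tables (the real `gvContractors` markup may differ).

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows: lists of cell text
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Fragments split by tags such as <br> are joined with a space,
            # then runs of whitespace are collapsed
            self._row.append(" ".join(" ".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rows(inner_html):
    parser = TableRowParser()
    parser.feed(inner_html)
    parser.close()
    return parser.rows


# Made-up sample markup, shaped like a typical ASP.NET GridView
sample = """
<tbody>
  <tr><th scope="col">Company/Address</th><th scope="col">Phone</th></tr>
  <tr><td>Example Electrical Ltd<br>1 High Street</td><td>01234 567890</td></tr>
</tbody>
"""
for row in parse_rows(sample):
    print(row)
# prints:
# ['Company/Address', 'Phone']
# ['Example Electrical Ltd 1 High Street', '01234 567890']
```

On the live page you would pass the `table` string returned by `get_attribute('innerHTML')` to `parse_rows`. Parsing the string once avoids a separate Selenium round trip for every cell.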

Note:
Instead of finding the tables one by one by tag name, you can locate the table you want directly by its ID (here the table has id="gvContractors").

Answered Oct 21 '22 08:10 by Keyur Potdar