I am scraping the following page: https://proximity.niceic.com/mainform.aspx
First, please enter '%%' in the country textbox to display all contractors in the area. Once I am in, if I inspect the HTML in the devtools I get the following:
I want to extract all the information from the selected table. The problem is that when I scrape it using Selenium, I do find the table, but I can't access its body or children.
Here is my Python code:
main_table = driver.find_elements_by_tag_name('table')
outer_table = main_table[3].find_element_by_tag_name('table')
print outer_table.get_attribute('innerHTML')
The code above outputs the following:
<table cellspacing="0" rules="all" bordercolor="Silver" border="1" id="dvContractorDetail" style="background-color:White;border-color:Silver;border-width:1px;border-style:Solid;height:200px;width:400px;border-collapse:collapse;">
</table>
As you can see, I only get the table tag itself, with none of its contents such as the tbody or the tr tags inside it.
What can I do?
What is happening here is that the table is filled in by JavaScript after the page loads, so you have to wait until the table has loaded. To do that, use one of the waits that Selenium provides.
I recommend using an explicit wait. You can do this:
First, you'll need to add the following imports.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
Then change
main_table = driver.find_elements_by_tag_name('table')
outer_table = main_table[3].find_element_by_tag_name('table')
print outer_table.get_attribute('innerHTML')
to
try:
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'gvContractors')))
except TimeoutException:
    pass  # Handle the exception here
table = driver.find_element_by_id('gvContractors').get_attribute('innerHTML')
print(table)
It'll give you the required output. I'm not posting the output here since it is too large, but you can verify it by doing this:
print('Company/Address' in table)
which prints True.
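Since the goal in the question is to extract the table's contents rather than its raw HTML, here is a minimal sketch of one way to turn that innerHTML string into rows of cell text using only the standard library. The class name TableRowParser and the sample_html string are made up for illustration; the real markup on the page may differ, and this sketch does not handle tables nested inside cells.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(' '.join(' '.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for the string returned by get_attribute('innerHTML').
sample_html = """
<tbody>
  <tr><th>Company/Address</th><th>Telephone</th></tr>
  <tr><td>Example Electrical Ltd</td><td>01234 567890</td></tr>
</tbody>
"""

parser = TableRowParser()
parser.feed(sample_html)
for row in parser.rows:
    print(row)
```

In the real script you would pass the table variable from above to parser.feed() instead of sample_html.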
Note:
Instead of finding the tables one by one using _by_tag_name, you can directly use _by_id to find the table you want. (Here the table has id="gvContractors".)
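As background on why waiting fixes the empty table: an explicit wait is essentially a polling loop that keeps re-checking a condition until it holds or a timeout expires. The sketch below illustrates that idea in plain Python. It is not Selenium's implementation, and wait_until, table_html, and the timing values are hypothetical names and numbers chosen for the example.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose table is filled in by JavaScript
# shortly after the initial load.
page_ready_at = time.monotonic() + 0.2


def table_html():
    if time.monotonic() >= page_ready_at:
        return '<tbody><tr><td>Company/Address</td></tr></tbody>'
    return ''  # the table element exists but is still empty


html = wait_until(table_html, timeout=2.0, poll_interval=0.05)
print('Company/Address' in html)
```

Reading the table immediately, as in the question, corresponds to calling table_html() once before the rows exist, which is why only the empty table tag came back.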