I am trying to pull the href for each row of each table from this website:
http://www.epa.gov/region4/superfund/sites/sites.html#KY
I can pull the table contents with =IMPORTHTML(A1,"table",1) for each of the 7 tables, but I also need the href that links each row to the page with the detailed information.
Using =IMPORTXML(A1,"//div[@class='box']") I can pull the information I need from a detail page like:
http://www.epa.gov/region4/superfund/sites/fedfacs/alarmyaplal.html
but I need to extract the fedfacs/alarmyaplal.html portion for each row on the original page.
I've tried using //@href, but it does not return any results. I suspect that is because the data is structured in a table, but I'm stuck on where to go from here.
I'm not sure about any of the Google Spreadsheet functionality, but here's an XPath to select all href attributes of the Kentucky sites (since your first link included the 'ky' anchor):
//body//a[@id='ky']/following-sibling::table[1]/tbody/tr/td[1]/strong/a/@href
This is very specific to the Kentucky table: following-sibling::table[1] selects the first table element that comes after a[@id='ky'] and shares the same parent. From there, td[1]/strong/a/@href takes the link in the first cell of each row.
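To make the selection logic concrete, here is a minimal Python sketch of what that XPath does. It is not the Google Spreadsheet solution, only an illustration under several assumptions: the markup in SAMPLE is a simplified, hypothetical stand-in for the real page, the site names and example/ paths are invented, and the helper hrefs_after_anchor is a made-up name. The standard library's ElementTree does not support the following-sibling axis or selecting attributes, so the sibling step is emulated by hand and the href is read from each matched element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, simplified stand-in for the page markup (not the real EPA HTML).
SAMPLE = """
<html><body>
  <a id="al"></a>
  <table><tbody>
    <tr><td><strong><a href="fedfacs/alarmyaplal.html">Site A</a></strong></td><td>AL</td></tr>
  </tbody></table>
  <a id="ky"></a>
  <table><tbody>
    <tr><td><strong><a href="example/site-one.html">Site One</a></strong></td><td>KY</td></tr>
    <tr><td><strong><a href="example/site-two.html">Site Two</a></strong></td><td>KY</td></tr>
  </tbody></table>
  <table><tbody>
    <tr><td><strong><a href="example/other.html">Other</a></strong></td><td>XX</td></tr>
  </tbody></table>
</body></html>
"""


def hrefs_after_anchor(root, anchor_id):
    """Emulate //a[@id=ID]/following-sibling::table[1]/tbody/tr/td[1]/strong/a/@href."""
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag == "a" and child.get("id") == anchor_id:
                # following-sibling::table[1]: first later sibling that is a table.
                table = next(
                    (c for c in children[index + 1:] if c.tag == "table"), None
                )
                if table is None:
                    return []
                # td[1] keeps only the first cell of each row.
                links = table.findall("./tbody/tr/td[1]/strong/a")
                return [a.get("href") for a in links]
    return []


root = ET.fromstring(SAMPLE.strip())
ky_hrefs = hrefs_after_anchor(root, "ky")
print(ky_hrefs)

# The hrefs are relative, so resolve them against the listing page's URL.
BASE = "http://www.epa.gov/region4/superfund/sites/sites.html"
print([urljoin(BASE, href) for href in ky_hrefs])
```

Only the table directly after the ky anchor contributes links; the third table is ignored, which is the effect of the [1] predicate. Resolving a relative href such as fedfacs/alarmyaplal.html against the listing page's URL gives the full detail-page address from the question.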