Extract href in table with importxml in Google spreadsheet

Question

I am trying to pull the href for each row of each table from this website:

http://www.epa.gov/region4/superfund/sites/sites.html#KY

I can pull the table contents with =IMPORTHTML(A1,"table",1), and likewise for all 7 tables, but I also need each row's href, which points to the page with the site's detailed information.

Using =IMPORTXML(A1,"//div[@class='box']"), I can pull the information I need from a detail page such as:

http://www.epa.gov/region4/superfund/sites/fedfacs/alarmyaplal.html

but to do that for every site, I need to extract the fedfacs/alarmyaplal.html portion (the relative link) from each row on the original page.
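As an aside, the extracted value is a relative link, so it has to be resolved against the listing page's URL to get a detail-page URL like the one above. A minimal sketch using Python's standard urllib.parse.urljoin shows that resolution; the function name detail_url is hypothetical. In the sheet itself, concatenating the directory prefix, e.g. ="http://www.epa.gov/region4/superfund/sites/"&B2 with B2 being a hypothetical cell holding the href, should give the same result for links of this form.

```python
from urllib.parse import urljoin

# Listing page from the question (fragment omitted).
LISTING_URL = "http://www.epa.gov/region4/superfund/sites/sites.html"


def detail_url(relative_href: str) -> str:
    """Resolve a relative href found on the listing page into an absolute URL.

    urljoin drops the last path segment of the base (sites.html) and appends
    the relative path, as a browser would when following the link.
    """
    return urljoin(LISTING_URL, relative_href)


print(detail_url("fedfacs/alarmyaplal.html"))
# prints http://www.epa.gov/region4/superfund/sites/fedfacs/alarmyaplal.html
```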

I've tried //@href, but it returns no results. I suspect this is because the data is inside a table, but I'm stuck on where to go from here.

Zach Young · Accepted Answer

I'm not familiar with Google Spreadsheet's functions, but here's an XPath that selects all the href attributes for the Kentucky sites (since your first link includes the 'ky' anchor):

//body//a[@id='ky']/following-sibling::table[1]/tbody/tr/td[1]/strong/a/@href

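This is very specific to the Kentucky table: following-sibling::table[1] selects the first table element that comes after a[@id='ky'] at the same level (i.e. sharing its parent). The rest of the path then takes, from each row, the href of the link inside the strong element in the first cell.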
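To make each step of that path concrete, here is a small Python sketch that walks a hypothetical, simplified copy of the page structure by hand. It assumes the real page nests its links as the path implies, which has not been checked. Python's xml.etree.ElementTree supports only a subset of XPath (no following-sibling axis, no selecting attributes), so those two steps are emulated in plain code; this is an illustration of what the path matches, not how Google Sheets evaluates it. The function name kentucky_hrefs and the sample markup are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the listing page; site names and hrefs are made up.
SAMPLE_PAGE = """
<html><body>
  <a id="al"></a>
  <table><tbody>
    <tr><td><strong><a href="fedfacs/example0.html">Site Zero</a></strong></td><td>AL</td></tr>
  </tbody></table>
  <a id="ky"></a>
  <table><tbody>
    <tr><td><strong><a href="fedfacs/example1.html">Site One</a></strong></td><td>KY</td></tr>
    <tr><td><strong><a href="npl/example2.html">Site Two</a></strong></td><td>KY</td></tr>
  </tbody></table>
  <table><tbody>
    <tr><td><strong><a href="other/example3.html">Not KY</a></strong></td></tr>
  </tbody></table>
</body></html>
"""


def kentucky_hrefs(page: str, anchor_id: str = "ky") -> list:
    """Emulate //a[@id='ky']/following-sibling::table[1]/tbody/tr/td[1]/strong/a/@href."""
    root = ET.fromstring(page.strip())
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            # //a[@id='ky']
            if child.tag == "a" and child.get("id") == anchor_id:
                # following-sibling::table[1]: first <table> after the anchor, same parent
                table = next((c for c in children[i + 1:] if c.tag == "table"), None)
                if table is None:
                    return []
                hrefs = []
                for tr in table.findall("tbody/tr"):  # /tbody/tr
                    first_td = tr.find("td")  # /td[1]
                    if first_td is None:
                        continue
                    for a in first_td.findall("strong/a"):  # /strong/a
                        href = a.get("href")  # /@href
                        if href is not None:
                            hrefs.append(href)
                return hrefs
    return []


print(kentucky_hrefs(SAMPLE_PAGE))
# prints ['fedfacs/example1.html', 'npl/example2.html']
```

In the sheet, the path itself would go straight into IMPORTXML, e.g. =IMPORTXML(A1,"//a[@id='ky']/following-sibling::table[1]//tr/td[1]/strong/a/@href"). Writing //tr instead of /tbody/tr there is a precaution, not something verified against this page: browser developer tools show a tbody element even when the HTML source omits it, so a path copied from the inspector may not match the source that IMPORTXML fetches.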

Tags: google-sheets, xpath

Asked by: Slocke04
