I am new to import.io and am trying to create a crawler that extracts attribute data from supplier websites, so that I can audit it against our internal attribute database. I watched a bunch of videos and my syntax seems to be right, but my manual XPath override isn't returning attribute values. I have the following sample HTML:
<table>
<tbody><tr class="oddRow">
<td class="label"> Adhesive Type‎</td><td> Epoxy‎
</td>
</tr>
<tr>
<td class="label"> Applications‎</td><td> Hard Disk Drive Component Assembly‎
</td>
</tr>
<tr class="oddRow">
<td class="label"> Brand‎</td><td> Scotch-Weld‎
</td>
</tr>
<tr>
<td class="label"> Capabilities‎</td><td> Sustainability‎
</td>
</tr>
<tr class="oddRow">
<td class="label"> Color‎</td><td> Clear Amber‎
</td>
I am trying to write an xpath following sibling statement to grab "Color" through an import.io crawler. The xpath code when I select "Color" is:
//*[@id="attributeList"]/table/tbody/tr[5]/td[1]
I've tried to use:
//*[@id="attributeList"]/table/tbody/tr/td[.="Color"]/following-sibling::td
But it isn't grabbing the color attribute value from the table. I'm not sure if it has something to do with the odd and even row classes? When I look at the html, it seems to make logical sense; color is "Color" and the attribute value is in the following td bracket.
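The text in the selected td node contains more than just "Color". It is " Color‎", with a leading space and what appears to be a trailing invisible character, so the exact comparison .="Color" never matches. Instead, you could select td nodes whose text contains the string "Color":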
'//*[@id="attributeList"]/table/tbody/tr/td[contains(text(), "Color")]/following-sibling::td/text()'
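To see why the exact match fails while the contains match succeeds, here is a minimal Python sketch of the same comparison. It uses only the standard library, so it mimics the two predicates with plain string tests rather than running them through a real XPath engine. The markup is a trimmed copy of the sample table. The wrapping div, the helper name value_for, and the "\u200e" (left-to-right mark) after each label and value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample table. "\u200e" is the invisible
# left-to-right mark assumed to follow each label and value.
SAMPLE = (
    '<div id="attributeList"><table><tbody>'
    '<tr><td class="label"> Capabilities\u200e</td>'
    '<td> Sustainability\u200e\n</td></tr>'
    '<tr class="oddRow"><td class="label"> Color\u200e</td>'
    '<td> Clear Amber\u200e\n</td></tr>'
    '</tbody></table></div>'
)


def value_for(root, label, exact):
    """Return the cleaned text of the cell after the label cell, or None.

    exact=True mimics td[.="Color"]; exact=False mimics
    td[contains(text(), "Color")]. Hypothetical helper, not an import.io API.
    """
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 2:
            continue
        text = cells[0].text or ""
        matched = (text == label) if exact else (label in text)
        if matched:
            # Strip the padding, newline, and invisible mark around the value.
            return (cells[1].text or "").strip(" \n\u200e")
    return None


root = ET.fromstring(SAMPLE)
exact_result = value_for(root, "Color", exact=True)
contains_result = value_for(root, "Color", exact=False)
print(exact_result)     # None
print(contains_result)  # Clear Amber
```

The exact comparison returns nothing because the cell text is " Color" plus the invisible mark, not "Color". The substring test finds the row regardless of the oddRow class, which plays no part in the match.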