How to select following sibling/XML tag using XPath

Tags: xml, xpath, lxml

I have an HTML file (from Newegg) whose HTML is organized like the samples below. Every value in their specifications table is in a td with class 'desc', while the title of each row is in a td with class 'name'. Below are two examples of data from Newegg pages.

<tr>
    <td class="name">Brand</td>
    <td class="desc">Intel</td>
</tr>
<tr>
    <td class="name">Series</td>
    <td class="desc">Core i5</td>
</tr>
<tr>
    <td class="name">Cores</td>
    <td class="desc">4</td>
</tr>
<tr>
    <td class="name">Socket</td>
    <td class="desc">LGA 1156</td>
</tr>

<tr>
    <td class="name">Brand</td>
    <td class="desc">AMD</td>
</tr>
<tr>
    <td class="name">Series</td>
    <td class="desc">Phenom II X4</td>
</tr>
<tr>
    <td class="name">Cores</td>
    <td class="desc">4</td>
</tr>
<tr>
    <td class="name">Socket</td>
    <td class="desc">Socket AM3</td>
</tr>

In the end I would like to have a CPU class (which is already set up) with Brand, Series, Cores, and Socket fields to store each of these values. This is the only way I can think of to go about doing this:

if parsedDocument.xpath('tr/td[@class="name"]') == 'Brand':
    CPU.brand = parsedDocument.xpath('tr/td[@class="name"]/nextsibling?').text

And doing this for the rest of the values. How would I accomplish the nextsibling and is there an easier way of doing this?


asked Jun 29 '10 09:06

Corey Farwell

3 Answers

How would I accomplish the nextsibling and is there an easier way of doing this?

You may use:

tr/td[@class='name']/following-sibling::td

but I'd rather select the value directly:

tr[td[@class='name'] ='Brand']/td[@class='desc']

This assumes that:

1. The context node, against which the XPath expression is evaluated, is the parent of all tr elements (not shown in your question).
2. Each tr element has only one td with class attribute valued 'name' and only one td with class attribute valued 'desc'.
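As a rough illustration of the direct expression with lxml, here is a minimal sketch. The CPU dataclass, the spec_value helper, and the SPEC_TABLE sample are hypothetical stand-ins, not the asker's actual code. The expression is prefixed with .// so it does not depend on the context node being the direct parent of the tr elements, and the row title is passed as an XPath variable ($name) rather than pasted into the string.

```python
from dataclasses import dataclass

from lxml import html

# Sample rows copied from the question, wrapped in a <table> for parsing.
SPEC_TABLE = """
<table>
  <tr><td class="name">Brand</td><td class="desc">Intel</td></tr>
  <tr><td class="name">Series</td><td class="desc">Core i5</td></tr>
  <tr><td class="name">Cores</td><td class="desc">4</td></tr>
  <tr><td class="name">Socket</td><td class="desc">LGA 1156</td></tr>
</table>
"""


@dataclass
class CPU:
    """Hypothetical stand-in for the asker's existing CPU class."""
    brand: str
    series: str
    cores: int
    socket: str


def spec_value(root, name):
    """Return the 'desc' text of the row whose 'name' cell equals `name`."""
    cells = root.xpath(
        ".//tr[td[@class='name'] = $name]/td[@class='desc']", name=name
    )
    if not cells:
        raise KeyError(name)
    return cells[0].text_content().strip()


root = html.fromstring(SPEC_TABLE)
cpu = CPU(
    brand=spec_value(root, "Brand"),
    series=spec_value(root, "Series"),
    cores=int(spec_value(root, "Cores")),
    socket=spec_value(root, "Socket"),
)
print(cpu)
```

One lookup per field avoids the if/else chain from the question, at the cost of evaluating the expression once per row title.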

answered Oct 16 '22 20:10

Dimitre Novatchev

Try the following-sibling axis (following-sibling::td).
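A minimal sketch of the axis in lxml, assuming the rows from the question have been wrapped in a table; the specs dictionary is a hypothetical way of collecting the pairs. Each 'name' cell is used as the context node, and following-sibling::td[1] picks the first td that comes after it in the same row.

```python
from lxml import html

# Second sample from the question, wrapped in a <table> for parsing.
root = html.fromstring("""
<table>
  <tr><td class="name">Brand</td><td class="desc">AMD</td></tr>
  <tr><td class="name">Series</td><td class="desc">Phenom II X4</td></tr>
  <tr><td class="name">Cores</td><td class="desc">4</td></tr>
  <tr><td class="name">Socket</td><td class="desc">Socket AM3</td></tr>
</table>
""")

specs = {}
for name_cell in root.xpath(".//td[@class='name']"):
    # The axis is evaluated relative to the current 'name' cell.
    siblings = name_cell.xpath("following-sibling::td[1]")
    if siblings:
        key = name_cell.text_content().strip()
        specs[key] = siblings[0].text_content().strip()

print(specs)
```

For the simple case of "the very next element", lxml elements also offer a getnext() method, which avoids XPath altogether.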

answered Oct 16 '22 18:10

Philipp

For completeness, adding to the accepted answer above: if you are interested in any following sibling regardless of its element type, you can use this variation:

following-sibling::*
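To show where the wildcard matters, here is a small sketch using a made-up definition list (not Newegg's markup), where the value sits in a dd element rather than a td. A td-specific step finds nothing, while the wildcard step returns the next element whatever its tag.

```python
from lxml import etree

# Hypothetical markup: titles in <dt>, values in <dd>.
root = etree.fromstring(
    "<dl><dt>Brand</dt><dd>Intel</dd><dt>Cores</dt><dd>4</dd></dl>"
)

pairs = {}
td_matches = []
for title in root.xpath("dt"):
    # No <td> siblings exist here, so this always returns an empty list.
    td_matches.extend(title.xpath("following-sibling::td"))
    # '*' matches any element; [1] keeps only the nearest following sibling.
    nearest = title.xpath("following-sibling::*[1]")
    if nearest:
        pairs[title.text] = nearest[0].text

print(pairs)
```

Without the [1] predicate, following-sibling::* would return every later sibling, including the next title, so the position filter is usually worth keeping.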

answered Oct 16 '22 18:10

Milan
