Scrapy XPath: choose the ancestor node

Tags: python, xpath, scrapy

I have a question about XPath. Given this HTML:

    <div id="A" >
    <div class="B">
        <div class="C">
            <div class="item">
                <div class="area">
                    <div class="sec">USA</div>
                    <table>
                        <tbody> 
                            <tr>
                                <td><a href="">D1</a></td>
                                <td>D2</td>
                            </tr>
                            <tr class="even">
                                <td><a href="">E1</a></td>
                                <td>E2</td>
                            </tr>
                        </tbody>
                    </table>
                </div>
                <div class="area">
                    <div class="sec">UK</div>
                    <table>
                        <tbody> 
                            <tr>
                                <td><a href="">F1</a></td>
                                <td>F2</td>
                            </tr>
                        </tbody>
                    </table>
                </div>
            </div>
        </div>
    </div>
    </div>

My code is:

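from scrapy.selector import Selector

# Runs inside the spider callback, where `response` is the downloaded page.
sel = Selector(response)
# Select every table row under each "area" block.
group = sel.xpath("//div[@id='A']/div[@class='B']/div[@class='C']/div[@class='item']/div[@class='area']/table/tbody/tr")
for g in group:
    # section = g.xpath("").extract()  # ancestor???
    context = g.xpath("./td[1]/a/text()").extract()  # link text in the first cell
    brief = g.xpath("./td[2]/text()").extract()      # text of the second cell
    # print(section[0])
    print(context[0])
    print(brief[0])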

It prints:

D1
D2
E1
E2
F1
F2

But I want it to print:

USA
D1
D2
USA
E1
E2
UK
F1
F2

So for each row I need to reach up to an ancestor node and read the section name from it, so that I get USA and UK.
I have been stuck on this for a while.
Please help, thank you!

310

asked Oct 23 '14 08:10

user2492364

1 Answer

In XPath, you can step up the tree with .. (each .. moves to the parent of the current node), so a selector like this could work for you:

section = g.xpath('../../../div[@class="sec"]/text()').extract()

Although this would work, it depends heavily on the exact document structure: it breaks as soon as a wrapper element is added or removed between the row and the area div. If you need more flexibility, for example to tolerate minor structural changes to the document, you can search upward for an ancestor instead:

section = g.xpath('ancestor::div[@class="area"]/div[@class="sec"]/text()').extract()

169

answered Sep 20 '22 21:09

andrean
