How to get immediate parent node with scrapy in python?

Question

I am new to Scrapy and want to crawl some data from the web. The HTML documents I get look like the ones below.

dom style1:
<div class="user-info">
    <p class="user-name">
        something in p tag
    </p>
    text data I want
</div>

dom style2:
<div class="user-info">
    <div>
        <p class="user-img">
            something in p tag
        </p>
        something in div tag
    </div>
    <div>
        <p class="user-name">
            something in p tag
        </p>
        text data I want
    </div>
</div>

I want to get the text "text data I want". Right now I can get it with a CSS or XPath selector by checking whether it exists, but I would like to know a better way. For example, I could select p.user-name first, then get its parent, and then get that parent's text(). The data I want is always the text() of the immediate parent div of p.user-name. So the question is: how can I get the immediate parent of p.user-name?

Granitosaurus · Accepted Answer

With XPath you can traverse the XML tree in every direction (parent, sibling, child, etc.), whereas CSS selectors have no way to step up from a node to its parent.
For your case, you can get a node's parent with XPath's .. parent notation:

//p[@class='user-name']/../text()

Explanation:
//p[@class='user-name'] - find <p> nodes whose class attribute is exactly user-name.
/.. - select each matched node's parent.
/text() - select the text nodes that are direct children of that parent.

This XPath should work for both of the DOM styles you described.

Tags:

python

xpath

scrapy

web-crawler

parent-child

Simon
