
Scrapy scraping nested text using css selectors

Tags: python, css, scrapy

I have the following html code:

<div class='article'>
<p>Lorem <strong>ipsum</strong> si ammet</p>
</div>

I want to extract the text as: Lorem ipsum si ammet, so I tried:

response.css('div.article > p::text').extract()

But I only get Lorem and si ammet; the text inside <strong> is missing.

How can I get both <p> and <strong> texts using CSS selectors?

Yurii asked Feb 23 '26 18:02


1 Answer

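One-liner solution:

"".join(response.css("div.article *::text").extract()).strip()

div.article * matches every element inside div.article, so *::text selects the text nodes of both <p> and <strong>. The fragments are joined as-is, because stripping each one first would remove the spaces between words (giving Loremipsumsi ammet).

Or, written as an explicit loop:

text = ""
for a in response.css("div.article *::text").extract():
    text += a
text = text.strip()

Both approaches are equivalent.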
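
The joining step can be sketched on its own, without a live response. This assumes the selector returns the text nodes in document order as ['Lorem ', 'ipsum', ' si ammet'] for the HTML in the question. join_text_fragments is a hypothetical helper name, not part of Scrapy. It joins the fragments unchanged and then collapses runs of whitespace, so the words around <strong> stay separated.

```python
def join_text_fragments(fragments):
    """Join text nodes in document order, then collapse runs of whitespace.

    Hypothetical helper for illustration; `fragments` is assumed to be the
    list of strings returned by a `::text` selector.
    """
    joined = "".join(fragments)    # keep the original spacing between nodes
    return " ".join(joined.split())  # collapse whitespace, trim both ends


# Assumed selector output for the sample HTML in the question
fragments = ["Lorem ", "ipsum", " si ammet"]
text = join_text_fragments(fragments)
print(text)
```

In a spider, the same helper could be called as join_text_fragments(response.css("div.article *::text").extract()).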

Umair Ayub answered Feb 25 '26 08:02


