 

Python Scrapy can't extract text from class

Please look at this HTML code:

<header class="online">
    <img src="http://static.flv.com/themes/h5/img/iconos/online.png"> <span>online</span>
    <img src="http://static.flv.com/themes/h5/img/iconos/ojo16.png"> 428 <p>xxfantasia</p>
</header>

I want to get the text inside the header (428, in this case). I tried this:

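        from scrapy.selector import Selector  # at the top of the spider module

        def parse(self, response):  # spider callback
            sel = Selector(response)
            cams = sel.css('header.online')
            for cam in cams:
                # expected to print the header's text, but prints []
                print(cam.css('text').extract())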

I think I used the correct CSS selector, but I get an empty result.

Any help?

asked Feb 05 '14 11:02 by buly


1 Answer

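Standard CSS selectors have no syntax for extracting text content. In cam.css('text'), the word text is treated as a type selector, so it looks for <text> elements; there are none in your HTML, hence the empty result.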

However, Scrapy extends CSS selectors with the ::text pseudo-element, so you want cam.css('::text').extract(), which should give you the same text nodes as cam.xpath('.//text()').extract().

Note: Scrapy also adds the ::attr(attribute_name) functional pseudo-element to extract attribute values (also not possible with standard CSS selectors).

answered Oct 29 '22 17:10 by paul trmbrth