Scrapy: Why are extracted strings in this format?

Question

I'm doing

item['desc'] = site.select('a/text()').extract()

but it gets printed like this:

[u'
                    A mano libera
                  ']

What must I do to trim it and remove the strange characters like [u', the trailing spaces and ']?

I cannot trim (strip) it:

exceptions.AttributeError: 'list' object has no attribute 'strip'

and if I convert it to a string and then strip it, the result is still the string above, which I suppose is in UTF-8.

Capi Etheriel · Accepted Answer

There's a nice solution to this using Item Loaders. Item Loaders are objects that get data from responses, process the data and build Items for you. Here's an example of an Item Loader that will strip the strings and return the first value that matches the XPath, if any:

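# Import paths for the older Scrapy releases this answer was written against;
# newer releases moved these to scrapy.loader (ItemLoader) and itemloaders.processors.
from scrapy.contrib.loader import XPathItemLoader
from scrapy.contrib.loader.processor import MapCompose, TakeFirst

class MyItemLoader(XPathItemLoader):
    default_item_class = MyItem  # your own Item subclass, defined or imported elsewhere
    # strip surrounding whitespace from every extracted string
    default_input_processor = MapCompose(lambda string: string.strip())
    # return the first non-empty value instead of a list
    default_output_processor = TakeFirst()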

And you use it like this:

def parse(self, response):
    loader = MyItemLoader(response=response)
    loader.add_xpath('desc', 'a/text()')
    return loader.load_item()
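
If it helps to see what those two processors do to the extracted list, here is a rough plain-Python sketch that runs without Scrapy. The helper names strip_each and take_first are made up for illustration and only approximate MapCompose and TakeFirst; they are not Scrapy's own code.

```python
def strip_each(values):
    """Rough stand-in for MapCompose(strip): strip every extracted string."""
    return [value.strip() for value in values]


def take_first(values):
    """Rough stand-in for TakeFirst(): first value that is not None or ''."""
    for value in values:
        if value is not None and value != '':
            return value
    return None


# The list that extract() returned in the question (Python 3 literal).
extracted = ['\n                    A mano libera\n                  ']

desc = take_first(strip_each(extracted))
print(desc)  # A mano libera
```

The input step works on every string in the list, and the output step collapses the list to a single value, which is why item['desc'] ends up as a plain string rather than a one-element list.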

icecrime · Answer

The HTML page may very well contain these whitespace characters.

What you retrieve is a list of unicode strings, which is why you can't simply call strip on it. If you want to strip the whitespace characters from each string in the list, you can run the following:

>>> [s.strip() for s in [u'\n                    A mano libera\n                  ']]
[u'A mano libera']

If only the first element matters to you, then simply do:

>>> [u'\n                    A mano libera\n                  '][0].strip()
u'A mano libera'
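
As for why the output looks like that: the [u' and '] are not part of the data. They are just how Python displays a list holding one unicode string, so converting the whole list to a string and stripping it cannot remove them. A minimal sketch in Python 3 (where the u prefix is no longer shown); the variable names are only for illustration, and the conditional is an added guard for the case where the XPath matches nothing and [0] would raise IndexError:

```python
# The list that extract() returned in the question (Python 3 literal).
extracted = ['\n                    A mano libera\n                  ']

# Converting the list itself to a string keeps the brackets and quotes,
# so strip() has no leading or trailing whitespace to remove.
as_list = str(extracted).strip()

# Indexing into the list first gives the actual string, which can be stripped.
as_text = extracted[0].strip() if extracted else ''

print(as_list)
print(as_text)
```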

Tags: python, scrapy

realtebo
