I am learning scrapy with the tutorial: http://doc.scrapy.org/en/1.0/intro/tutorial.html

When I ran the following example script from the tutorial, I found that even though it was already looping through the selector list, the title I got from sel.xpath('a/text()').extract() was still a list containing one string: [u'Python 3 Object Oriented Programming'] rather than u'Python 3 Object Oriented Programming'. In a later example the list is assigned to the item as item['title'] = sel.xpath('a/text()').extract(), which I think is not logically correct.

import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [

    def parse(self, response):
        for sel in response.xpath('//ul/li'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print title, link, desc

However, if I use the following code:

import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [

    def parse(self, response):
        for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
            link = href.extract()

the link is a string rather than a list.

Is this a bug or intended?

.xpath().extract() and .css().extract() return a list because .xpath() and .css() return SelectorList objects.

See https://parsel.readthedocs.org/en/v1.0.1/usage.html#parsel.selector.SelectorList.extract

(SelectorList) .extract():

Call the .extract() method for each element in this list and return their results flattened, as a list of unicode strings.

.extract_first() is what you are looking for (it is poorly documented).

Taken from http://doc.scrapy.org/en/latest/topics/selectors.html :

If you want to extract only first matched element, you can call the selector .extract_first()

>>> response.xpath('//div[@id="images"]/a/text()').extract_first()
u'Name: My image 1 '
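
One reason to prefer .extract_first() over indexing with .extract()[0] is the no-match case. The sketch below mimics that on a plain list in Python 3. first_or_default is a hypothetical helper, not part of Scrapy or parsel, and it assumes that .extract_first() gives back None when nothing matches:

```python
def first_or_default(values, default=None):
    """Return the first item of `values`, or `default` if it is empty.

    Hypothetical helper: it only imitates, on a plain list, what
    .extract_first() is assumed to do on a selector list.
    """
    return next(iter(values), default)


# Shape of what .extract() returns with one hit and with no hit.
matched = ["Python 3 Object Oriented Programming"]
no_match = []

print(first_or_default(matched))               # the bare string, not a list
print(first_or_default(no_match))              # None instead of an IndexError
print(first_or_default(no_match, default=""))  # caller-chosen fallback
```

Indexing with no_match[0] would raise IndexError here, which is why taking the first element through a helper tends to be the safer pattern inside a parse loop.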

In your other example:

def parse(self, response):
    for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
        link = href.extract()

each href in the loop will be a Selector object. Calling .extract() on it will get you a single Unicode string back:

$ scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/"
2016-02-26 12:11:36 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
In [1]: response.css("ul.directory.dir-col > li > a::attr('href')")
[<Selector xpath=u"descendant-or-self::ul[@class and contains(concat(' ', normalize-space(@class), ' '), ' directory ') and (@class and contains(concat(' ', normalize-space(@class), ' '), ' dir-col '))]/li/a/@href" data=u'/Computers/Programming/Languages/Python/'>,
 <Selector xpath=u"descendant-or-self::ul[@class and contains(concat(' ', normalize-space(@class), ' '), ' directory ') and (@class and contains(concat(' ', normalize-space(@class), ' '), ' dir-col '))]/li/a/@href" data=u'/Computers/Programming/Languages/Python/'>,
 <Selector xpath=u"descendant-or-self::ul[@class and contains(concat(' ', normalize-space(@class), ' '), ' directory ') and (@class and contains(concat(' ', normalize-space(@class), ' '), ' dir-col '))]/li/a/@href" data=u'/Computers/Programming/Languages/Python/'>]

so .css() on the response returns a SelectorList:

In [2]: type(response.css("ul.directory.dir-col > li > a::attr('href')"))
Out[2]: scrapy.selector.unified.SelectorList

Looping on that object gives you Selector instances:

In [5]: for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
   ...:     print href
<Selector xpath=u"descendant-or-self::ul[@class and contains(concat(' ', normalize-space(@class), ' '), ' directory ') and (@class and contains(concat(' ', normalize-space(@class), ' '), ' dir-col '))]/li/a/@href" data=u'/Computers/Programming/Languages/Python/'>
<Selector xpath=u"descendant-or-self::ul[@class and contains(concat(' ', normalize-space(@class), ' '), ' directory ') and (@class and contains(concat(' ', normalize-space(@class), ' '), ' dir-col '))]/li/a/@href" data=u'/Computers/Programming/Languages/Python/'>
<Selector xpath=u"descendant-or-self::ul[@class and contains(concat(' ', normalize-space(@class), ' '), ' directory ') and (@class and contains(concat(' ', normalize-space(@class), ' '), ' dir-col '))]/li/a/@href" data=u'/Computers/Programming/Languages/Python/'>

And calling .extract() gives you a single Unicode string:

In [6]: for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
   ...:     print type(href.extract())
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
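
The difference seen in the session above can be modelled with two small stand-in classes. This is only a sketch in plain Python 3, not parsel's or Scrapy's actual implementation: ToySelector and ToySelectorList are hypothetical names, the href values are made up, and real selectors evaluate XPath/CSS against a document rather than wrapping a ready-made string.

```python
class ToySelector:
    """Hypothetical stand-in for a single Selector: wraps one matched node."""

    def __init__(self, data):
        self.data = data

    def extract(self):
        # One node in, one string out.
        return self.data


class ToySelectorList(list):
    """Hypothetical stand-in for SelectorList: a list of ToySelector objects."""

    def extract(self):
        # Call extract() on every element and collect the results in a list.
        return [sel.extract() for sel in self]


# Made-up href values, used only to give the toy objects some content.
links = ToySelectorList([ToySelector("/a/"), ToySelector("/b/")])

# Calling extract() on the whole list always yields a list of strings.
print(links.extract())

# Iterating yields the individual selectors, and each one extracts to a string.
for href in links:
    print(type(href.extract()).__name__, href.extract())
```

In this model the list-versus-string difference comes only from which object .extract() is called on: the collection or one of its elements.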

Note: .extract() on Selector is wrongly documented as returning a list of strings. I'll open an issue on parsel (which is the same as Scrapy selectors, and is used under the hood in Scrapy 1.1+).

