Scrapy BaseSpider: How does it work?

Tags: python, scrapy, web-crawler

This is the BaseSpider example from the Scrapy tutorial:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from dmoz.items import DmozItem

class DmozSpider(BaseSpider):
   domain_name = "dmoz.org"
   start_urls = [
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
   ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//ul[2]/li')
       items = []
       for site in sites:
           item = DmozItem()
           item['title'] = site.select('a/text()').extract()
           item['link'] = site.select('a/@href').extract()
           item['desc'] = site.select('text()').extract()
           items.append(item)
       return items

SPIDER = DmozSpider()

I adapted it for my project:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item
from firm.items import FirmItem

class Spider1(CrawlSpider):
    domain_name = 'wc2'
    start_urls = ['http://www.whitecase.com/Attorneys/List.aspx?LastName=A']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//td[@class="altRow"][1]/a/@href').re('/.a\w+')
        items = []
        for site in sites:
            item = FirmItem
            item['school'] = hxs.select('//td[@class="mainColumnTDa"]').re('(JD)(.*?)(\d+)')
            items.append(item)
        return items

SPIDER = Spider1()

and I get this error:

[wc2] ERROR: Spider exception caught while processing   
<http://www.whitecase.com/Attorneys/List.aspx?LastName=A> (referer: <None>): 
[Failure instance: Traceback: <type 'exceptions.TypeError'>: 
'ItemMeta' object does not support item assignment

I would greatly appreciate it if the experts here could take a look at the code and give me a clue about where I am going wrong.

Thank you

asked Nov 27 '09 00:11

Zeynel

1 Answer

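You probably meant item = FirmItem() rather than item = FirmItem. Without the parentheses, item is the FirmItem class itself rather than an instance of it, and a class (an 'ItemMeta' object) does not support item assignment, which is exactly what the TypeError reports.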
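To see the difference in isolation, here is a minimal sketch that does not need Scrapy at all. The ItemMeta, Item and FirmItem classes below are simplified stand-ins written for illustration (assumptions, not Scrapy's real implementation), the 'school' values are placeholders, and the syntax is Python 3 rather than the Python 2 of the question:

```python
# Simplified stand-ins for Scrapy's item classes (assumptions for illustration only).
class ItemMeta(type):
    """Plays the role of the metaclass named in the traceback."""


class Item(dict, metaclass=ItemMeta):
    """Dict-like base class, so instances accept item['key'] = value."""


class FirmItem(Item):
    """Stand-in for firm.items.FirmItem."""


# Bug: no parentheses, so `item` is the class itself, not an instance.
item = FirmItem
try:
    item['school'] = ['JD', '1999']  # placeholder values
except TypeError as exc:
    error_message = str(exc)

# Fix: call the class to create a fresh instance for each result.
item = FirmItem()
item['school'] = ['JD', '1999']

print(error_message)
print(item)
```

The first assignment fails with the same message as in the question, because the object being indexed is a class whose type is ItemMeta. The second succeeds because FirmItem() returns an instance. In the spider, the instance should be created inside the for loop so that each result gets its own item.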

answered Sep 25 '22 01:09

Denis Otkidach
