Python,Scrapy, Pipeline: function "process_item" not getting called

I have very simple code, shown below. Scraping works: I can see all the print statements generating correct data. In the pipeline, initialization works fine, but the process_item function never gets called, as the print statement at the start of that function is never executed.

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re


class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):  
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'


    def parse_location(self, response):
        print 'in parse_location'       
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item

Items file:

import scrapy
from scrapy.item import Item, Field

class ComoShamLocation(Item):
    address = Field()
    pin = Field()
    phone = Field()
    fax = Field()
    email = Field()
    category = Field()

Pipeline file:

import csv

class ComoShamPipeline(object):
    def __init__(self):
        self.locationdump = csv.writer(open('./scraped data/ComoSham/ComoshamLocation.csv','wb'))
        self.locationdump.writerow(['Address','Pin','Phone','Fax','Email'])


    def process_item(self,item,spider):
        print 'processing item now'
        if item['category'] == 'location':
            print item['address'],item['pin'],item['phone'],item['fax'],item['email']
            self.locationdump.writerow([item['address'],item['pin'],item['phone'],item['fax'],item['email']])
        else:
            pass
Tuhina Singh asked Jul 10 '15 02:07

People also ask

How to use item pipeline in scrapy?

You activate an Item Pipeline component by adding its class to the ITEM_PIPELINES setting. Each class is assigned an integer value that determines the order in which the pipelines run (lower-valued classes run before higher-valued ones); values are conventionally in the 0-1000 range.
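For example, a minimal settings.py sketch for the question's project (the module path activityadvisor.pipelines.ComoShamPipeline is inferred from the question's imports and is an assumption):

```python
# settings.py -- enable the pipeline; the order value (0-1000) controls
# when it runs relative to other pipelines, lower values running first
ITEM_PIPELINES = {
    "activityadvisor.pipelines.ComoShamPipeline": 300,
}
```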

How does a scrapy pipeline work?

Scrapy is a web scraping framework used to scrape, parse, and collect web data. Scraped items are handled in a pipelines.py file by components (classes) that are executed sequentially, each receiving the item from the previous one.
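That contract can be sketched in plain Python (run_pipelines below is a hypothetical stand-in for the chaining Scrapy's engine does for you; the component classes are made-up examples): each component's process_item receives the item and must return it so the next component sees it.

```python
class StripWhitespacePipeline:
    # hypothetical component: normalizes a field
    def process_item(self, item, spider):
        item["address"] = item["address"].strip()
        return item  # returning the item passes it to the next component

class UppercasePipeline:
    # hypothetical component: uppercases a field
    def process_item(self, item, spider):
        item["address"] = item["address"].upper()
        return item

def run_pipelines(item, components, spider=None):
    # stand-in for Scrapy's chaining: components run sequentially,
    # each receiving the previous component's return value
    for component in components:
        item = component.process_item(item, spider)
    return item

item = run_pipelines({"address": "  singapore  "},
                     [StripWhitespacePipeline(), UppercasePipeline()])
print(item)  # {'address': 'SINGAPORE'}
```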

How do you close a spider?

To force a spider to close, you can raise the CloseSpider exception as described in the Scrapy docs. Just be sure to return/yield your items before you raise the exception.


2 Answers

Your problem is that you never actually yield the item: parse_location returns an item to parse, but parse never yields that item to the engine.

The solution would be to replace:

self.parse_location(response)

with

yield self.parse_location(response)

More specifically, process_item never gets called if no items are yielded.
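The difference can be sketched in plain Python, no Scrapy required (engine here is a hypothetical stand-in for Scrapy's crawl engine): calling the helper discards the returned item, while yielding it hands the item to the engine, which is what triggers the pipelines.

```python
def parse_location(response):
    # stand-in for the spider's parse_location: builds and returns an item
    return {"category": "location", "address": response}

def parse_broken(response):
    # calling the helper discards its return value; nothing is yielded,
    # so no item ever reaches the engine
    parse_location(response)

def parse_fixed(response):
    # yielding the helper's return value hands the item to the engine
    yield parse_location(response)

def engine(callback, response):
    # hypothetical stand-in for Scrapy's engine: only items the callback
    # returns/yields are collected and sent through the item pipelines
    result = callback(response)
    return list(result) if result is not None else []

print(engine(parse_broken, "page"))  # [] -> process_item never called
print(engine(parse_fixed, "page"))   # [{'category': 'location', 'address': 'page'}]
```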

rocktheartsm4l answered Nov 16 '22 04:11


Enable the pipeline with ITEM_PIPELINES in settings.py. Since Scrapy 0.24 this setting is a dict mapping each pipeline's class path to an order value (0-1000, lower runs first):

ITEM_PIPELINES = {'project_name.pipelines.pipeline_class': 300}
Ganesh answered Nov 16 '22 04:11