
Scrapy pass extra data from csv file into parse

Tags: python, csv, scrapy

My Scrapy spider reads a CSV file and builds start_urls from the address in each row, like so:

from csv import DictReader

with open('addresses.csv') as rows:
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+') for row in DictReader(rows)]

But the CSV file also contains emails and other information. How can I pass this extra information into parse so that it is added to the new file? This is what I have so far:

import scrapy
from csv import DictReader

with open('addresses.csv') as rows:
  names=[row["Name"].replace(',','') for row in DictReader(rows)]
  emails=[row["Email"].replace(',','') for row in DictReader(rows)]
  start_urls=['http://www.example.com/search/?where='+row["Address"].replace(',','').replace(' ','+') for row in DictReader(rows)]

def parse(self, response):
    yield {
        'name': ...,     # FROM CSV
        'email': ...,    # FROM CSV
        'address': ...,  # FROM SCRAPING
        'city': ...,     # FROM SCRAPING
    }

Maciek Semik asked Mar 01 '26 10:03



1 Answer

import scrapy
from csv import DictReader


class MySpider(scrapy.Spider):
    name = 'myspider'  # a spider needs a name; this one is a placeholder

    def start_requests(self):
        # Read the CSV once and issue one request per row
        with open('addresses.csv') as rows:
            for row in DictReader(rows):
                name = row["Name"].replace(',', '')
                email = row["Email"].replace(',', '')
                link = 'http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')

                # meta travels with the request and comes back on the response
                yield scrapy.Request(
                    url=link,
                    callback=self.parse,
                    method="GET",
                    meta={'name': name, 'email': email},
                )

    def parse(self, response):
        yield {
            'name': response.meta['name'],
            'email': response.meta['email'],
            'address': None,  # FROM SCRAPING
            'city': None,     # FROM SCRAPING
        }

  • Open your CSV file.
  • Iterate over its rows inside the start_requests method.
  • Pass the extra values to the callback through the request's meta argument, which accepts a Python dictionary. They are then available as response.meta inside parse.
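
If you want to check the row-to-request mapping before running a spider, the steps above can be sketched with the standard library alone. This is only an illustration: rows_to_request_args is a hypothetical helper name, and the sample row is made up to stand in for addresses.csv. Each (url, meta) pair is what you would hand to the request as url and meta.

```python
import io
from csv import DictReader

# Search URL prefix taken from the question
BASE_URL = 'http://www.example.com/search/?where='


def rows_to_request_args(csv_file):
    """Yield one (url, meta) pair per CSV row (hypothetical helper)."""
    for row in DictReader(csv_file):
        # Same cleaning as in the spider: drop commas, turn spaces into '+'
        address = row["Address"].replace(',', '').replace(' ', '+')
        meta = {
            'name': row["Name"].replace(',', ''),
            'email': row["Email"].replace(',', ''),
        }
        yield BASE_URL + address, meta


# Made-up sample data standing in for addresses.csv
sample = io.StringIO(
    'Name,Email,Address\n'
    'Ann Lee,ann@example.com,"1 Main St, Springfield"\n'
)

pairs = list(rows_to_request_args(sample))
for url, meta in pairs:
    print(url, meta)
```

Keeping the CSV handling in a plain generator like this makes it easy to test on its own; start_requests then only has to wrap each pair in a request.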

Note: start_requests is not a custom method of mine. It is a method of Scrapy's Spider class that you override in your own spider. See https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
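
A side remark on the snippet in the question, which is one more reason to loop over the rows once: creating several DictReader objects on the same open file does not restart the file. The first pass reads it to the end, so the later list comprehensions come back empty. A small sketch with made-up in-memory data (the variable names are only for illustration):

```python
import io
from csv import DictReader

# Made-up data standing in for addresses.csv
rows = io.StringIO('Name,Email\nAnn Lee,ann@example.com\n')

# The first reader consumes the whole file
names_first = [row["Name"] for row in DictReader(rows)]
# The second reader starts at end-of-file and yields nothing
emails_second = [row["Email"] for row in DictReader(rows)]
print(names_first, emails_second)

# Reading the rows once and reusing them avoids the problem
rows.seek(0)
records = list(DictReader(rows))
names_fixed = [r["Name"] for r in records]
emails_fixed = [r["Email"] for r in records]
print(names_fixed, emails_fixed)
```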

Umair Ayub answered Mar 02 '26 23:03
