
Modifying CSV export in Scrapy

Tags: python, csv, scrapy

I seem to be missing something very simple. All I want to do is use ; as the delimiter in the CSV exporter instead of ,.

I know the CSV exporter passes its kwargs on to the csv writer, but I can't figure out how to pass the delimiter to the exporter in the first place.

I am calling my spider like so:

scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv 

asked Mar 09 '11 11:03 by zsquare


2 Answers

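In contrib/feedexport.py, the exporter is instantiated with only the output file, so there is no hook for passing extra keyword arguments such as the delimiter: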

class FeedExporter(object):

    ...

    def open_spider(self, spider):
        file = TemporaryFile(prefix='feed-')
        exp = self._get_exporter(file)  # <-- this is where the exporter is instantiated
        exp.start_exporting()
        self.slots[spider] = SpiderSlot(file, exp)

    def _get_exporter(self, *a, **kw):
        return self.exporters[self.format](*a, **kw)  # <-- not passed in :(
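
To make the consequence concrete, here is a toy sketch of the same situation using only the standard library. It is not Scrapy's code: BaseExporter, SemicolonExporter, make_exporter and export_with are hypothetical names. The factory hands the class nothing but the file, so the only place left to set the delimiter is inside the class itself.

```python
import csv
import io


class BaseExporter:
    """Toy stand-in for an exporter that forwards extra kwargs to csv.writer."""

    def __init__(self, file, **kwargs):
        self.writer = csv.writer(file, **kwargs)

    def export_row(self, row):
        self.writer.writerow(row)


class SemicolonExporter(BaseExporter):
    """Subclass that sets the delimiter itself, because the factory passes none."""

    def __init__(self, file, **kwargs):
        kwargs.setdefault("delimiter", ";")
        super().__init__(file, **kwargs)


def make_exporter(exporter_cls, file):
    # Mirrors the snippet above: only the file reaches the exporter class.
    return exporter_cls(file)


def export_with(exporter_cls, row):
    """Export a single row through the factory and return the text written."""
    buffer = io.StringIO()
    make_exporter(exporter_cls, buffer).export_row(row)
    return buffer.getvalue()


print(repr(export_with(BaseExporter, ["a", "b"])))
print(repr(export_with(SemicolonExporter, ["a", "b"])))
```

The subclass-based answer that follows applies the same idea to the real CsvItemExporter.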

You will need to write your own exporter that sets the delimiter itself. Here's an example:

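# Import paths as used by the Scrapy release this answer was written for;
# newer releases expose CsvItemExporter from scrapy.exporters.
from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter


class CsvOptionRespectingItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # CSV_DELIMITER is a custom setting; fall back to a comma if it is unset.
        delimiter = settings.get('CSV_DELIMITER', ',')
        # CsvItemExporter hands the extra kwargs on to the csv writer.
        kwargs['delimiter'] = delimiter
        super(CsvOptionRespectingItemExporter, self).__init__(*args, **kwargs)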
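
Since the delimiter keyword ends up at Python's csv writer, it may help to see what it changes there. The following is a standalone sketch with made-up rows and a hypothetical helper render_rows; it uses only the standard library and does not involve Scrapy. Note that with the default minimal quoting, the writer quotes any field that contains the active delimiter.

```python
import csv
import io


def render_rows(rows, **writer_kwargs):
    """Write rows with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, **writer_kwargs)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data: one field contains a semicolon, another a comma.
rows = [
    ["name", "note"],
    ["widget", "red; large"],
    ["gadget", "blue, small"],
]

default_output = render_rows(rows)                    # comma-delimited
semicolon_output = render_rows(rows, delimiter=";")   # semicolon-delimited

print(default_output)
print(semicolon_output)
```

With ; as the delimiter, the "red; large" field is the one that gets quoted, while "blue, small" is written bare.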

In the settings.py file of your crawler directory, add this:

FEED_EXPORTERS = {
    'csv': 'importable.path.to.CsvOptionRespectingItemExporter',
}

Now, you can execute your spider as follows:

scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv --set CSV_DELIMITER=';'

HTH.

answered Oct 06 '22 00:10 by Mahmoud Abdelkader


scraper/exporters.py

from scrapy.exporters import CsvItemExporter
from scraper.settings import CSV_SEP


class CsvCustomSeperator(CsvItemExporter):
    def __init__(self, *args, **kwargs):
        kwargs['encoding'] = 'utf-8'
        kwargs['delimiter'] = CSV_SEP
        super(CsvCustomSeperator, self).__init__(*args, **kwargs)

scraper/settings.py

CSV_SEP = '|'
FEED_EXPORTERS = {
    'csv': 'scraper.exporters.CsvCustomSeperator'
}

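In terminal (the exporter imports CSV_SEP from scraper/settings.py directly, so set the delimiter there; a -s CSV_SEP=... override on the command line does not reach that import):

$ scrapy crawl spider -o file.csv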
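
Whatever consumes the file afterwards has to be told about the same separator. A minimal sketch of reading such a file back with the standard library, assuming '|' as in the settings above; the sample text and the helper read_rows are hypothetical and stand in for a real export:

```python
import csv
import io

# Hypothetical contents standing in for a '|'-delimited export.
exported_text = "title|price\r\nBlue, small gadget|9.99\r\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    stream = io.StringIO(text, newline="")
    return list(csv.DictReader(stream, delimiter=delimiter))


rows = read_rows(exported_text, "|")
print(rows)
```

Reading the same text with the default comma would split the title at its embedded comma instead of at the pipe.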

answered Oct 06 '22 00:10 by 0x01h