
Scrapy overwrite json files instead of appending the file

Tags: python, scrapy

Is there a way to overwrite the output file instead of appending to it?

Example:

scrapy crawl myspider -o "/path/to/json/my.json" -t json
scrapy crawl myspider -o "/path/to/json/my.json" -t json

The second run appends to my.json instead of overwriting it.
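
To illustrate why appending is a problem for the json format in particular, here is a minimal sketch using only the standard library. The item contents are made up for illustration, and the string concatenation only approximates what two appended runs leave in the file: two JSON arrays back to back, which is not a valid JSON document.

```python
import json

# Hypothetical items standing in for the output of two separate crawls.
first_run = json.dumps([{"title": "a"}])
second_run = json.dumps([{"title": "b"}])

# Approximation of the file after the second run has been appended.
appended = first_run + second_run


def is_valid_json(text):
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(first_run))
print(is_valid_json(appended))
```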

hooliooo asked Oct 15 '15 05:10


2 Answers

Scrapy has an option that overwrites the output file (available since Scrapy 2.4). Pass the file path with -O (capital O) instead of -o:

scrapy crawl myspider -O /path/to/json/my.json
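
The same behaviour can also be set in the project settings rather than on the command line. A minimal sketch, assuming Scrapy 2.4 or later, where each entry of the FEEDS setting accepts an overwrite key. The path is the one from the question, and whether this lives in settings.py or in a spider's custom_settings is up to the project:

```python
# settings.py (or the value of a spider's custom_settings["FEEDS"])
FEEDS = {
    "/path/to/json/my.json": {
        "format": "json",
        # True replaces an existing file; False appends to it.
        "overwrite": True,
    },
}
```

With this in place, a plain `scrapy crawl myspider` should write the feed without needing -o or -O.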

More information:

$ scrapy crawl --help
Usage
=====
  scrapy crawl [options] <spider>

Run a spider

Options
=======
--help, -h              show this help message and exit
-a NAME=VALUE           set spider argument (may be repeated)
--output=FILE, -o FILE  append scraped items to the end of FILE (use - for
                        stdout)
--overwrite-output=FILE, -O FILE
                        dump scraped items into FILE, overwriting any existing
                        file
--output-format=FORMAT, -t FORMAT
                        format to use for dumping items

Global Options
--------------
--logfile=FILE          log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                        log level (default: DEBUG)
--nolog                 disable logging completely
--profile=FILE          write python cProfile stats to FILE
--pidfile=FILE          write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                        set/override setting (may be repeated)
--pdb                   enable pdb on failure

Ismail answered Sep 18 '22 02:09


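Alternatively, write the items to stdout (-o -) and let the shell redirection (>) truncate the file on each run:

scrapy crawl myspider -t json --nolog -o - > "/path/to/json/my.json"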

eLRuLL answered Sep 19 '22 02:09