
Write to a csv file scrapy

Tags:

python

csv

scrapy

I want to write to a CSV file in Scrapy:

    for rss in rsslinks:
        item = AppleItem()
        item['reference_link'] = response.url
        base_url = get_base_url(response)
        item['rss_link'] = urljoin_rfc(base_url, rss)
        #item['rss_link'] = rss
        items.append(item)
        #items.append("\n")
    f = open(filename, 'a+')    # filename is apple.com.csv
    for item in items:
        f.write("%s\n" % item)

My output is this:

{'reference_link': 'http://www.apple.com/'
 'rss_link': 'http://www.apple.com/rss '
{'reference_link': 'http://www.apple.com/rss/'
 'rss_link':   'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=10/rss.xml'}
{'reference_link': 'http://www.apple.com/rss/'
 'rss_link':  'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=25/rss.xml'}

What I want is this format:

reference_link               rss_link  
http://www.apple.com/     http://www.apple.com/rss/
asked Dec 21 '13 by blackmamba


People also ask

How do you save a scrapy data in a CSV file?

Saving CSV files via the command line: the first and simplest way to create a CSV file of the data you have scraped is to define an output path when starting your spider on the command line. To save to a CSV file, add the -o flag to the scrapy crawl command along with the file path you want to save the file to.

What is callback function scrapy?

In the callback function, you parse the response (web page) and return Item objects, Request objects, or an iterable of both. Those requests will also carry a callback (possibly the same one); Scrapy will download them and then hand each response to its specified callback.
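The callback pattern can be illustrated without Scrapy at all; here is a minimal plain-Python sketch (the request/response dicts and the `download` helper are stand-ins for illustration, not Scrapy APIs):

```python
# Plain-Python illustration of the callback idea: a request carries a
# callback that is invoked later with the downloaded response.
def make_request(url, callback):
    return {'url': url, 'callback': callback}

def download(request):
    # Stand-in for Scrapy's downloader: fabricate a response for the URL.
    response = {'url': request['url'], 'body': '<html>...</html>'}
    return request['callback'](response)

def parse(response):
    # A callback parses the response and returns items (or more requests).
    return [{'reference_link': response['url']}]

results = download(make_request('http://www.apple.com/', parse))
print(results)  # → [{'reference_link': 'http://www.apple.com/'}]
```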

How do you run a scrapy in a script?

The key to running Scrapy from a Python script is the CrawlerProcess class, found in the Crawler module. It provides the engine that runs Scrapy within a Python script; under the hood, the CrawlerProcess code imports Python's Twisted framework.


3 Answers

simply crawl with -o csv, like:

scrapy crawl <spider name> -o file.csv -t csv
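The same export can also be configured in the project's settings.py instead of on the command line; a sketch using the FEEDS setting (available in Scrapy 2.1+):

```python
# settings.py — equivalent of `-o file.csv -t csv` (Scrapy >= 2.1)
FEEDS = {
    'file.csv': {'format': 'csv'},
}
```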
answered Oct 18 '22 by Guy Gavriely


This is what worked for me using Python3:

scrapy runspider spidername.py -o file.csv -t csv
answered Oct 18 '22 by jwalman


The best approach to solve this problem is to use Python's built-in csv module.

import csv

fieldnames = ['reference_link', 'rss_link']    # header row
base_url = get_base_url(response)

# newline='' avoids blank lines on Windows; the with block closes the file.
with open('Output_file.csv', 'w', newline='') as f:    # Output_file.csv is the output file
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for rss in rsslinks:
        writer.writerow({'reference_link': response.url,
                         'rss_link': urljoin_rfc(base_url, rss)})
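To see what DictWriter produces without running a spider, here is a self-contained sketch that writes to an in-memory buffer, with a hypothetical sample row standing in for the scraped items:

```python
import csv
import io

# Hypothetical sample row standing in for the scraped items.
rows = [
    {'reference_link': 'http://www.apple.com/',
     'rss_link': 'http://www.apple.com/rss/'},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['reference_link', 'rss_link'])
writer.writeheader()    # emits the header row
writer.writerows(rows)  # emits one CSV row per dict

print(buf.getvalue())
# reference_link,rss_link
# http://www.apple.com/,http://www.apple.com/rss/
```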
answered Oct 18 '22 by Anurag Misra