here is the spider:
import scrapy
from danmurphys.items import DanmurphysItem
class MySpider(scrapy.Spider):
name = 'danmurphys'
allowed_domains = ['danmurphys.com.au']
start_urls = ['https://www.danmurphys.com.au/dm/navigation/navigation_results_gallery.jsp?params=fh_location%3D%2F%2Fcatalog01%2Fen_AU%2Fcategories%3C%7Bcatalog01_2534374302084767_2534374302027742%7D%26fh_view_size%3D120%26fh_sort%3D-sales_value_30_days%26fh_modification%3D&resetnav=false&storeExclusivePage=false']
def parse(self, response):
urls = response.xpath('//h2/a/@href').extract()
for url in urls:
request = scrapy.Request(url , callback=self.parse_page)
yield request
def parse_page(self , response):
item = DanmurphysItem()
item['brand'] = response.xpath('//span[@itemprop="brand"]/text()').extract_first().strip()
item['name'] = response.xpath('//span[@itemprop="name"]/text()').extract_first().strip()
item['url'] = response.url
return item
and here is the items :
import scrapy
class DanmurphysItem(scrapy.Item):
brand = scrapy.Field()
name = scrapy.Field()
url = scrapy.Field()
when I run the spider with this command :
scrapy crawl danmurphys -o output.csv
the output is like this :
To fix this in Scrapy 1.3, you can patch it by adding newline=''
as parameter to io.TextIOWrapper
in the __init__
method of the CsvItemExporter
class in scrapy.exporters
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With