I've written a script in Python Scrapy to download some images from a website. When I run my script, I can see the links of the images (all of them are in .jpg format) in the console. However, when I open the folder in which the images are supposed to be saved once the download is done, there is nothing in there. Where am I making mistakes?
This is my spider (I'm running it from the Sublime Text editor):
import scrapy
from scrapy.crawler import CrawlerProcess

class YifyTorrentSpider(scrapy.Spider):
    name = "yifytorrent"

    start_urls = ['https://www.yify-torrent.org/search/1080p/']

    def parse(self, response):
        for q in response.css("article.img-item .poster-thumb"):
            image = response.urljoin(q.css("::attr(src)").extract_first())
            yield {'': image}

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(YifyTorrentSpider)
c.start()
This is what I've defined in settings.py for the images to be saved:

ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

IMAGES_STORE = "/Desktop/torrentspider/torrentspider/spiders/Images"
To make things clearer:

1. The folder in which I want the images to be saved is named Images, which I've placed in the spiders folder under the project torrentspider.
2. The actual address of the Images folder is C:\Users\WCS\Desktop\torrentspider\torrentspider\spiders.
3. It's not about running the script successfully with the help of an items.py file. So, any solution that makes the download happen with the use of an items.py file is not what I'm looking for.
The item you are yielding does not follow the Scrapy documentation. As detailed in the media pipeline documentation, the item should have a field called image_urls. You should change your parse method to something similar to this:
def parse(self, response):
    images = []
    for q in response.css("article.img-item .poster-thumb"):
        image = response.urljoin(q.css("::attr(src)").extract_first())
        images.append(image)
    yield {'image_urls': images}
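
If you'd rather keep yielding one item per image inside the loop, that works too, since the pipeline reads the image_urls list of every item it receives. A minimal variation under the same selector assumptions:

def parse(self, response):
    for q in response.css("article.img-item .poster-thumb"):
        # each item carries its own one-element image_urls list
        yield {'image_urls': [response.urljoin(q.css("::attr(src)").extract_first())]}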
I just tested this and it works. Additionally, as Pruthvi Kumar commented, IMAGES_STORE should just be a plain folder name, for example:

IMAGES_STORE = 'Images'
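
Since you start the spider from a script with CrawlerProcess rather than with scrapy crawl, your project's settings.py may not be loaded at all unless you pass it in explicitly (for example via get_project_settings). One sketch that sidesteps the issue is to hand the same settings straight to CrawlerProcess; it assumes Pillow is installed (the ImagesPipeline needs it) and uses an Images folder relative to where the script runs:

import scrapy
from scrapy.crawler import CrawlerProcess

class YifyTorrentSpider(scrapy.Spider):
    name = "yifytorrent"
    start_urls = ['https://www.yify-torrent.org/search/1080p/']

    def parse(self, response):
        images = []
        for q in response.css("article.img-item .poster-thumb"):
            images.append(response.urljoin(q.css("::attr(src)").extract_first()))
        yield {'image_urls': images}

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # enable the images pipeline and point it at the output folder directly,
    # since settings.py is not read automatically when running from a script
    'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
    'IMAGES_STORE': 'Images',
})
c.crawl(YifyTorrentSpider)
c.start()

Note that the pipeline saves files under a full/ subfolder with SHA-1 based names, so the downloads should appear as Images/full/<hash>.jpg.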