Scrapy: Images Pipeline, download images

Following Scrapy's tutorial, I made a simple image crawler that scrapes images of Bugattis. It is shown below under EXAMPLE.

However, following the guide has left me with a non-functioning crawler! It finds all of the URLs, but it does not download the images.

I found a duct-tape solution: replace ITEM_PIPELINES and IMAGES_STORE such that

ITEM_PIPELINES['scrapy.pipelines.files.FilesPipeline'] = 1 and

IMAGES_STORE -> FILES_STORE

But I do not know why this works. I would like to use the ImagesPipeline as documented by Scrapy.

EXAMPLE

settings.py

BOT_NAME = 'imagespider'
SPIDER_MODULES = ['imagespider.spiders']
NEWSPIDER_MODULE = 'imagespider.spiders'
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = "/home/user/Desktop/imagespider/output"

items.py

import scrapy

class ImageItem(scrapy.Item):
    file_urls = scrapy.Field()
    files = scrapy.Field()

imagespider.py

from imagespider.items import ImageItem
import scrapy


class ImageSpider(scrapy.Spider):
    name = "imagespider"

    start_urls = (
        "https://www.find.com/search=bugatti+veyron",
    )

    def parse(self, response):
        for elem in response.xpath("//img"):
            img_url = elem.xpath("@src").extract_first()
            yield ImageItem(file_urls=[img_url])

asked Jul 26 '16 11:07

Alexander R Johansen

1 Answer

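The item your spider returns must contain the field "file_urls" for the FilesPipeline and/or "image_urls" for the ImagesPipeline. Your settings enable the ImagesPipeline, but your spider returns the URLs in "file_urls", so the pipeline finds nothing to download. This is also why your workaround works: the FilesPipeline reads "file_urls", which is the field your item actually fills.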
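The pairing between pipeline, store setting and item fields can be sketched as a settings.py fragment. This is only an illustration under the assumption that the default field names are what you want; the output path is the one from the question, and the ImagesPipeline additionally needs Pillow installed.

```python
# settings.py (sketch): each pipeline reads its own URL field and store setting.

# Option A: ImagesPipeline, as in the question.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}
IMAGES_STORE = "/home/user/Desktop/imagespider/output"
IMAGES_URLS_FIELD = "image_urls"   # Scrapy's default input field
IMAGES_RESULT_FIELD = "images"     # Scrapy's default output field

# Option B: FilesPipeline, which is what the workaround switched to.
# ITEM_PIPELINES = {"scrapy.pipelines.files.FilesPipeline": 1}
# FILES_STORE = "/home/user/Desktop/imagespider/output"
# FILES_URLS_FIELD = "file_urls"
# FILES_RESULT_FIELD = "files"
```

If you would rather keep the original ImageItem untouched, pointing IMAGES_URLS_FIELD at "file_urls" and IMAGES_RESULT_FIELD at "files" should have the same effect as renaming the fields.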

Simply change this line:

yield ImageItem(file_urls=[img_url])
# to
yield {'image_urls': [img_url]}

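* A Scrapy spider can yield plain dictionaries instead of Item objects, which saves time when you only have one or two fields. If you prefer to keep ImageItem, rename its fields to image_urls and images instead.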
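One more thing worth checking: the pipeline downloads each entry of "image_urls" as a request, so the URLs need to be absolute, and `<img>` tags without a `src` yield `None`. Below is a minimal sketch of building such an item in plain Python. The helper name `build_image_item` and the example image paths are hypothetical, made up for illustration; inside a spider, `response.urljoin(src)` does the same joining.

```python
from urllib.parse import urljoin


def build_image_item(page_url, srcs):
    """Return a dict item with absolute URLs under 'image_urls'.

    Hypothetical helper: skips missing src values and resolves
    relative paths against the page URL.
    """
    urls = [urljoin(page_url, src) for src in srcs if src]
    return {"image_urls": urls}


# Example values are made up for illustration.
item = build_image_item(
    "https://www.find.com/search=bugatti+veyron",
    ["/img/veyron.jpg", None, "https://cdn.example.com/chiron.png"],
)
print(item)
```

In the spider from the question, the equivalent would be to collect the `@src` values in `parse` and yield the resulting dict.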

answered Sep 28 '22 20:09

Granitosaurus

Tags: python, scrapy, scraper, scrapy-spider
