
Error 302 Downloading File in Scrapy

Tags: python, scrapy

Why am I receiving this error?

[scrapy] WARNING: File (code: 302): Error downloading file from <GET <url> referred in <None>

The URL seems to download without any problems in my browser and a 302 is simply a redirect. Why wouldn't scrapy simply follow the redirect to download the file?

from scrapy.crawler import CrawlerProcess

# MySpider is the spider class defined elsewhere in the script.
process = CrawlerProcess({
    'FILES_STORE': 'C:\\Users\\User\\Downloads\\Scrapy',
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'ITEM_PIPELINES': {'scrapy.pipelines.files.FilesPipeline': 1},
})

process.crawl(MySpider)
process.start()  # the script will block here until the crawling is finished
xaav asked Dec 13 '25 08:12


2 Answers

My solution is to send an HTTP request with requests first and, based on the status_code, choose which URL to download. You can then put that URL in file_urls or in a field with a custom name.

import requests

def check_redirect(url):
    response = requests.head(url)
    if response.status_code == 302:
        url = response.headers["Location"]
    return url
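The check above only handles a 302 with an absolute Location header. A minimal sketch of a slightly more general resolver follows, assuming the other common redirect codes should be treated the same way and that Location may be relative. The names `resolve_redirect` and `REDIRECT_CODES` are hypothetical and not part of requests or Scrapy.

```python
from urllib.parse import urljoin

# Status codes that signal a redirect carrying a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirect(url, status_code, headers):
    """Return the redirect target for `url`, or `url` itself if there is none."""
    location = headers.get("Location")
    if status_code in REDIRECT_CODES and location:
        # urljoin leaves absolute targets untouched and resolves relative ones.
        return urljoin(url, location)
    return url
```

With requests, this could be called as `resolve_redirect(url, response.status_code, response.headers)` after `response = requests.head(url)`. It resolves a single hop only, so a chain of redirects would need repeated calls.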

Or you could use a custom FilesPipeline:

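import requests
import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.files import FilesPipeline


class MyFilesPipeline(FilesPipeline):

    def handle_redirect(self, file_url):
        # HEAD requests do not follow redirects by default, so a 302 is visible here.
        response = requests.head(file_url)
        if response.status_code == 302:
            file_url = response.headers["Location"]
        return file_url

    def get_media_requests(self, item, info):
        # Only the first URL in file_urls is resolved and downloaded.
        redirect_url = self.handle_redirect(item["file_urls"][0])
        yield scrapy.Request(redirect_url)

    def item_completed(self, results, item, info):
        # Keep the storage paths of the downloads that succeeded.
        file_paths = [x['path'] for ok, x in results if ok]
        if not file_paths:
            raise DropItem("Item contains no files")
        item['file_urls'] = file_paths
        return item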

I used another solution, described in "Scrapy i/o block when downloading files".

Windsooon answered Dec 14 '25 22:12


If redirection is the problem, add the following to your settings.py:

MEDIA_ALLOW_REDIRECTS = True

Source: Allowing redirections in Scrapy
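The question runs the spider from a script with CrawlerProcess rather than from a project with a settings.py, so the same setting could go into the settings dict passed to CrawlerProcess. Scrapy's media pipelines ignore redirects by default and treat a redirected response as a failed download, which matches the 302 warning. A minimal sketch, reusing the settings from the question; `SETTINGS` and `run` are hypothetical names chosen for illustration.

```python
# Settings from the question, plus the redirect switch for media pipelines.
SETTINGS = {
    "FILES_STORE": "C:\\Users\\User\\Downloads\\Scrapy",
    "USER_AGENT": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "ITEM_PIPELINES": {"scrapy.pipelines.files.FilesPipeline": 1},
    "MEDIA_ALLOW_REDIRECTS": True,  # off by default, so redirects count as failures
}


def run(spider_cls):
    """Run `spider_cls` with the settings above (blocks until the crawl ends)."""
    # Imported here so the settings dict can be inspected without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(SETTINGS)
    process.crawl(spider_cls)
    process.start()
```

Calling `run(MySpider)` would then replace the three `process` lines in the question.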

appsdownload answered Dec 14 '25 20:12


