Scrapy is not downloading files properly. I have the URLs of my items, so I figured I could use wget to download the files.
How can I use wget inside Scrapy's process_item
function? Alternatively, is there another way to download the files?
from scrapy import Request
from scrapy.contrib.pipeline.images import ImagesPipeline

class MyImagesPipeline(ImagesPipeline):
    # Name the downloaded file after the last segment of its URL
    def image_key(self, url):
        image_guid = url.split('/')[-1]
        return 'full/%s' % image_guid

    def get_media_requests(self, item, info):
        if item['image_urls']:
            for image_url in item['image_urls']:
                # wget -nH image_url -P images/
                yield Request(image_url)
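Since the question also asks for another way to download files: Scrapy ships a built-in FilesPipeline that downloads every URL listed in an item's file_urls field, with no need to shell out to wget. A minimal settings sketch follows; note that the module path is for newer Scrapy releases, while older versions (the ones that used image_key, as in the question's code) ship the pipeline under scrapy.contrib.pipeline instead.

```
# settings.py -- enable Scrapy's built-in file downloading.
# Module path assumes a newer Scrapy; older releases use
# scrapy.contrib.pipeline.files.FilesPipeline.
ITEM_PIPELINES = {'scrapy.pipelines.files.FilesPipeline': 1}
FILES_STORE = 'files'  # directory where downloaded files are stored
```

With this enabled, any item exposing a file_urls list has its URLs downloaded automatically, and the results are recorded in the item's files field.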
This will execute wget; you can replace your comment with the following lines:
import subprocess
...
subprocess.call(['wget', '-nH', image_url, '-P', 'images/'])
Note that each flag and its value must be a separate list element; passing '-P images/' as a single string would make wget see one unknown option. You can read about subprocess.call here: http://docs.python.org/2/library/subprocess.html
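To make that call easier to reuse from process_item, here is a small sketch; the helper names wget_command and download_image are hypothetical, and it assumes a wget binary is on the PATH and that images/ is the desired output directory.

```python
import subprocess

def wget_command(image_url, out_dir='images/'):
    # Each flag and its value is a separate list element;
    # '-P images/' as one string would be an unknown option to wget.
    return ['wget', '-nH', image_url, '-P', out_dir]

def download_image(image_url, out_dir='images/'):
    # Runs wget and returns its exit status (0 on success).
    return subprocess.call(wget_command(image_url, out_dir))
```

Building the argument list in one place also makes the command easy to unit-test without hitting the network.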