 

Processing images without downloading using Scrapy Spiders

I'm trying to use a Scrapy Spider to solve a problem (a programming question from HackThisSite):

(1) I have to log in to a website with a username and a password (already done)

(2) After that, I have to access an image at a given URL (the image is only accessible to logged-in users)

(3) Then, without saving the image to the hard disk, I have to read its contents into an in-memory buffer

(4) Finally, the result of processing the image is used to fill in a form and send the data to the website's server (I already know how to do this step)
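Step (3) does not need a temporary file: bytes received over HTTP can be wrapped in Python's `io.BytesIO` and read like a file. The minimal sketch below assumes the image is a PNG (the URL in the question ends in `PNG`); `png_header()` is a hypothetical helper name, and it only reads the header fields from the first (IHDR) chunk to show the in-memory approach.

```python
import io
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def png_header(image_bytes):
    """Read width, height, bit depth and colour type from PNG bytes in memory."""
    buf = io.BytesIO(image_bytes)  # file-like wrapper; nothing is written to disk
    if buf.read(8) != PNG_SIGNATURE:
        raise ValueError('not a PNG file')
    # The IHDR chunk always comes first: 4-byte length, 4-byte type, then its data.
    _length, chunk_type = struct.unpack('>I4s', buf.read(8))
    if chunk_type != b'IHDR':
        raise ValueError('IHDR chunk missing')
    width, height, bit_depth, color_type = struct.unpack('>IIBB', buf.read(10))
    return width, height, bit_depth, color_type
```

In a Scrapy callback this would be called on the response body, e.g. `png_header(response.body)`.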

So the question can be summarized as: is it possible, using a spider, to read an image that is only accessible to logged-in users and process it in the spider code?

I have researched different methods, but using item pipelines is not a good approach here, because I don't want to download the file.

The code that I already have is:

from scrapy import Spider
from scrapy.http import FormRequest, Request
from scrapy.utils.response import open_in_browser

class ProgrammingQuestion2(Spider):

    name = 'p2'
    start_urls = ['https://www.hackthissite.org/']

    def parse(self, response):

        # Replace the placeholders with real credentials
        formdata_hts = {'username': '<MY_USER_NAME>',
                'password': '<MY_PASSWORD>',
                'btn_submit': 'Login'}

        return FormRequest.from_response(response,
                formdata=formdata_hts, callback=self.redirect_to_page)

    def redirect_to_page(self, response):

        yield Request(url='https://www.hackthissite.org/missions/prog/2/',
                callback=self.solve_question_2)

    def solve_question_2(self, response):

        open_in_browser(response)
        img_url = 'https://www.hackthissite.org/missions/prog/2/PNG'
        # What can I do here?

I'd like to solve this within Scrapy itself; otherwise I would have to log in to the website again (resending the form data).

Asked by G.Dantas on Dec 30 '25 at 18:12



1 Answer

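You can make a Scrapy request for the image itself and handle the response in a separate callback. Scrapy's cookie middleware (enabled by default) keeps the session cookies from the login, so the image request is made as the logged-in user, and the image bytes are available in memory as `response.body`: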

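from scrapy import Request

# Methods of the spider class from the question
def parse_page(self, response):
    img_url = 'https://www.hackthissite.org/missions/prog/2/PNG'
    # The session cookies from the login are sent along automatically
    yield Request(img_url, callback=self.parse_image)

def parse_image(self, response):
    # Raw image bytes, held in memory; nothing is saved to disk
    image_bytes = response.body
    # form_from_image() is a placeholder for your own processing function
    form_data = form_from_image(image_bytes)
    # then submit form_data with a FormRequest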
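`form_from_image()` above is left undefined. As one possible starting point for it, here is a minimal sketch that decodes the pixel data straight from the in-memory bytes using only the standard library (`struct` and `zlib`) rather than an imaging library. It assumes an 8-bit, non-interlaced, non-palette PNG; `decode_png()` is a hypothetical name, the CRC checks are skipped, and turning the pixels into the challenge's answer is not shown.

```python
import struct
import zlib

# PNG colour type -> samples per pixel (palette type 3 is not handled)
CHANNELS = {0: 1, 2: 3, 4: 2, 6: 4}


def _paeth(a, b, c):
    """Paeth predictor from the PNG specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def decode_png(data):
    """Decode an 8-bit, non-interlaced PNG held in memory.

    Returns (width, height, rows), where rows[y][x] is a tuple of channel values.
    """
    if data[:8] != b'\x89PNG\r\n\x1a\n':
        raise ValueError('not a PNG file')
    pos, idat, header = 8, bytearray(), None
    while pos + 8 <= len(data):
        length, ctype = struct.unpack('>I4s', data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC (CRC not checked here)
        if ctype == b'IHDR':
            header = struct.unpack('>IIBBBBB', body)
        elif ctype == b'IDAT':
            idat += body  # image data may be split across several IDAT chunks
        elif ctype == b'IEND':
            break
    if header is None:
        raise ValueError('IHDR chunk missing')
    width, height, depth, color, _, _, interlace = header
    if depth != 8 or color not in CHANNELS or interlace != 0:
        raise ValueError('only 8-bit, non-interlaced, non-palette PNGs are handled')
    bpp = CHANNELS[color]
    stride = width * bpp
    raw = zlib.decompress(bytes(idat))
    rows, prev = [], bytearray(stride)  # the row "above" the first row is all zeros
    for y in range(height):
        start = y * (stride + 1)  # each scanline starts with one filter-type byte
        ftype = raw[start]
        line = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            left = line[i - bpp] if i >= bpp else 0
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 1:    # Sub
                line[i] = (line[i] + left) & 0xFF
            elif ftype == 2:  # Up
                line[i] = (line[i] + up) & 0xFF
            elif ftype == 3:  # Average
                line[i] = (line[i] + (left + up) // 2) & 0xFF
            elif ftype == 4:  # Paeth
                line[i] = (line[i] + _paeth(left, up, up_left)) & 0xFF
            elif ftype != 0:
                raise ValueError(f'unknown filter type {ftype}')
        rows.append([tuple(line[x:x + bpp]) for x in range(0, stride, bpp)])
        prev = line
    return width, height, rows
```

Inside `parse_image` this would be used as `width, height, rows = decode_png(response.body)`. If installing Pillow is an option, `Image.open(io.BytesIO(response.body))` handles every PNG variant and is usually the more practical choice; the hand-written decoder only shows that no file on disk is needed.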
Answered by Granitosaurus on Jan 01 '26 at 22:01



