Scrapy

Question

When searching online for how to solve a captcha with Scrapy, I can't find a good example to even start from.

I've created a very basic captcha page: http://145.100.108.148/login3/

Does anyone have a working example that solves this, or at least a Scrapy setup that makes a decent attempt at solving it?

Tomáš Linhart · Accepted Answer

Solving the captcha itself is easy using Pillow and Python Tesseract. The hard part was realizing how to handle cookies (PHPSESSID). Here's a complete working example for your case (using Python 2):

# -*- coding: utf-8 -*-
import io
import urllib2

from PIL import Image
import pytesseract
import scrapy


class CaptchaSpider(scrapy.Spider):
    name = 'captcha'

    def start_requests(self):
        # Pin the session id so the captcha image can be fetched in the same session.
        yield scrapy.Request('http://145.100.108.148/login3/',
                             cookies={'PHPSESSID': 'xyz'})

    def parse(self, response):
        img_url = response.urljoin(response.xpath('//img/@src').extract_first())

        # Download the captcha image outside Scrapy, sending the same session cookie.
        url_opener = urllib2.build_opener()
        url_opener.addheaders.append(('Cookie', 'PHPSESSID=xyz'))
        img_bytes = url_opener.open(img_url).read()
        img = Image.open(io.BytesIO(img_bytes))

        # OCR the image; strip the trailing whitespace Tesseract tends to append.
        captcha = pytesseract.image_to_string(img).strip()
        print 'Captcha solved:', captcha

        # Submit the login form with the recognized text filled in.
        return scrapy.FormRequest.from_response(
            response, formdata={'captcha': captcha},
            callback=self.after_captcha)

    def after_captcha(self, response):
        print 'Result:', response.body
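
For readers on Python 3, where `urllib2` no longer exists, the image download and OCR cleanup steps of the spider above might be sketched roughly as follows. This is not part of the original answer: `build_captcha_request`, `fetch_captcha_bytes`, and `normalize_captcha` are hypothetical helper names, `xyz` is the placeholder session id from the spider, and the assumption that the captcha contains only ASCII letters and digits is a guess about this particular page. The point it illustrates is that the image request should carry the same PHPSESSID as the page request, so the server can tie the captcha to the session that submits the form.

```python
import re
import urllib.request


def build_captcha_request(img_url, session_id):
    """Build a request for the captcha image that carries the session cookie.

    The image should be fetched with the same PHPSESSID as the login page;
    otherwise the server would likely tie the captcha to a different session.
    """
    return urllib.request.Request(
        img_url, headers={'Cookie': 'PHPSESSID={}'.format(session_id)})


def fetch_captcha_bytes(img_url, session_id, timeout=10):
    """Download the raw image bytes (performs network I/O)."""
    request = build_captcha_request(img_url, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()


def normalize_captcha(ocr_text):
    """Drop whitespace and stray punctuation that OCR output often contains.

    Assumption: the captcha consists only of ASCII letters and digits.
    """
    return re.sub(r'[^A-Za-z0-9]', '', ocr_text)
```

In the spider, `fetch_captcha_bytes(img_url, 'xyz')` would take the place of the three `urllib2` lines, and `normalize_captcha(pytesseract.image_to_string(img))` would take the place of the plain OCR call. The `print` statements would also need to become `print(...)` calls under Python 3.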

Scrapy - simple captcha solving example

Tags: python, captcha

Kevin C
