Check that a Tor connection is established before running Scrapy

Question

I'd like to check Tor before I start crawling with Python Scrapy. I am using Polipo/Tor/Scrapy on Linux.

With this setup, Scrapy correctly uses Tor for its crawls. The way I check that Scrapy is using Tor is to crawl this page in my spider:

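import logging
import scrapy

class mySpider(scrapy.Spider):
    def start_requests(self):
        yield scrapy.Request('https://check.torproject.org/', self.parse)

    def parse(self, response):
        # .get() returns the heading text rather than the repr of a SelectorList
        logging.info("Check tor page: %s", response.css('.content h1::text').get())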

However, I think there might be a better, cleaner way of doing it. I know I can check the Tor service status or check the IP address, but I want to check whether the Tor connection is actually established.

drew010 · Accepted Answer

A somewhat definitive way to do this is to connect to Tor's control port and issue GETINFO status/circuit-established.

If Tor has an active circuit built, it will return:

250-status/circuit-established=1
250 OK

If Tor hasn't been used for a while, this could be 0. You can also call GETINFO dormant which would yield 250-dormant=1. Most likely when you then try to use Tor, it will build a circuit and dormant will become 0 and circuit-established will be 1 barring any major network issues.

In either case, dormant=0 or circuit-established=1 should be enough to tell you that you can use Tor.

It's a simple protocol so you can just open a socket to the control port, authenticate, and issue commands, or use Controller from Stem.
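
As a rough illustration of the raw-socket route, here is a minimal sketch using only the standard library. It assumes the control port is enabled in torrc on 127.0.0.1:9051 and uses password authentication (cookie authentication is not handled). The names parse_getinfo, read_reply and tor_circuit_established are made up for this example; they are not part of Tor, Stem or Scrapy.

```python
import socket


def parse_getinfo(reply):
    """Collect key=value pairs from the '250-key=value' lines of a GETINFO reply."""
    info = {}
    for line in reply.splitlines():
        if line.startswith("250-") and "=" in line:
            key, value = line[4:].split("=", 1)
            info[key] = value.strip()
    return info


def read_reply(reader):
    """Read lines until the final one, which has a space after the 3-digit code."""
    lines = []
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("control port closed the connection")
        lines.append(line)
        if len(line) > 3 and line[3] == " ":
            return "".join(lines)


def tor_circuit_established(host="127.0.0.1", port=9051, password="", timeout=5.0):
    """Hypothetical helper: True if Tor reports status/circuit-established=1."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="ascii", newline="") as reader:
            # Assumption: password auth; a password containing quotes or
            # backslashes would need escaping before being sent like this.
            sock.sendall('AUTHENTICATE "{}"\r\n'.format(password).encode("ascii"))
            if not read_reply(reader).startswith("250"):
                raise PermissionError("Tor control port rejected authentication")
            sock.sendall(b"GETINFO status/circuit-established\r\n")
            info = parse_getinfo(read_reply(reader))
            sock.sendall(b"QUIT\r\n")
    return info.get("status/circuit-established") == "1"
```

You could call tor_circuit_established() before starting the crawl and abort if it returns False. The same read_reply/parse_getinfo pair should also work for GETINFO dormant by swapping the key being requested.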

See the control spec for more info.
