 

Check that the Tor connection is established before running Scrapy

Tags: python, tor, scrapy

I'd like to check Tor before I start crawling with Python Scrapy. I am using Polipo/Tor/Scrapy on Linux.

With this setup, Scrapy correctly uses Tor for its crawls. The way I check whether Scrapy is using Tor correctly is to crawl this page in my spider:

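import logging
import scrapy

class MySpider(scrapy.Spider):
    name = "torcheck"  # Scrapy refuses to run a spider without a name

    def start_requests(self):
        yield scrapy.Request('https://check.torproject.org/', callback=self.parse)

    def parse(self, response):
        # .get() returns the heading text itself, not the repr of a SelectorList
        logging.info("Check tor page: %s", response.css('.content h1::text').get())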

However, I think there might be a better, cleaner way of doing it. I know I can check the Tor service status or check the IP address, but I want to actually check whether the Tor connection is correctly established.

asked Sep 06 '25 03:09 by PHA


1 Answer

A somewhat definitive way to do this is to connect to Tor's control port and issue GETINFO status/circuit-established.

If Tor has an active circuit built, it will return:

250-status/circuit-established=1
250 OK
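Each data line in that reply has the form 250-key=value, and the final 250 OK line ends a successful reply. A minimal sketch of turning such a reply into a dictionary, assuming only single-line values like the ones shown here (multi-line 250+ values are not handled); the function name parse_getinfo_reply is hypothetical:

```python
def parse_getinfo_reply(reply: str) -> dict:
    """Parse a GETINFO reply made of single-line '250-key=value' entries."""
    values = {}
    for line in reply.splitlines():
        if line.startswith("250-") and "=" in line:
            key, _, value = line[4:].partition("=")
            values[key] = value
        elif line.startswith("250 "):
            break  # '250 OK' ends a successful reply
        elif line:
            # Error codes (e.g. 552 for an unknown key) and multi-line '250+'
            # values are not handled in this sketch
            raise ValueError(f"unexpected control-port reply line: {line!r}")
    return values


print(parse_getinfo_reply("250-status/circuit-established=1\r\n250 OK\r\n"))
# {'status/circuit-established': '1'}
```

Note that the values come back as strings ("0"/"1"), exactly as Tor sends them.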

If Tor hasn't been used for a while, this could be 0. You can also call GETINFO dormant, which would yield 250-dormant=1 in that case. Most likely, when you then try to use Tor, it will build a circuit, dormant will become 0, and circuit-established will become 1, barring any major network issues.

In either case, dormant=0 or circuit-established=1 should be enough to tell you that you can use Tor.
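That rule can be written as a small predicate over the GETINFO values. A sketch, assuming the values arrive as the strings Tor sends ("0"/"1") in a dict keyed by the GETINFO key names; the function name tor_looks_usable and its strict flag are hypothetical. The strict variant only accepts circuit-established=1, for when you'd rather not start crawling until a circuit has actually been built:

```python
def tor_looks_usable(info: dict, strict: bool = False) -> bool:
    """dormant=0 or circuit-established=1; with strict=True, require circuit-established=1."""
    established = info.get("status/circuit-established") == "1"
    awake = info.get("dormant") == "0"
    return established if strict else (established or awake)


print(tor_looks_usable({"status/circuit-established": "0", "dormant": "0"}))  # True
print(tor_looks_usable({"status/circuit-established": "0", "dormant": "0"}, strict=True))  # False
```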

It's a simple protocol, so you can just open a socket to the control port, authenticate, and issue commands, or use the Controller class from Stem.
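A minimal raw-socket sketch of that exchange, assuming ControlPort 9051 is enabled in torrc on 127.0.0.1 and that either no authentication or a HashedControlPassword is configured (cookie authentication is not covered, and neither are multi-line 250+ values). The helper names tor_getinfo and _read_reply are hypothetical; Stem's Controller takes care of all of this, including cookie auth, for you.

```python
import socket


def _read_reply(sock_file) -> list:
    """Read one control-port reply; its last line has a space after the status code."""
    lines = []
    while True:
        raw = sock_file.readline()
        if not raw:
            raise ConnectionError("control port closed the connection")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        lines.append(line)
        if len(line) >= 4 and line[3] == " ":
            return lines


def tor_getinfo(keys, host="127.0.0.1", port=9051, password=None, timeout=10.0) -> dict:
    """Authenticate to Tor's control port and return GETINFO values as strings."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rwb") as f:
        # Send the password as a quoted control-protocol string ("" if none is set)
        secret = (password or "").replace("\\", "\\\\").replace('"', '\\"')
        f.write(f'AUTHENTICATE "{secret}"\r\n'.encode())
        f.flush()
        reply = _read_reply(f)
        if not reply[-1].startswith("250"):
            raise PermissionError(reply[-1])  # e.g. 515 Authentication failed

        f.write(("GETINFO " + " ".join(keys) + "\r\n").encode())
        f.flush()
        values = {}
        for line in _read_reply(f):
            if not line.startswith("250"):
                raise RuntimeError(line)  # e.g. 552 Unrecognized key
            if line[3] == "-" and "=" in line:
                key, _, value = line[4:].partition("=")
                values[key] = value

        f.write(b"QUIT\r\n")
        f.flush()
        return values
```

Before starting the crawl you could then call info = tor_getinfo(["status/circuit-established", "dormant"]) and only launch Scrapy if the values satisfy the rule above.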

See the Tor control protocol specification (control-spec) for more info.

answered Sep 07 '25 21:09 by drew010