I'd like to check Tor before I start crawling with Python Scrapy. I am using Polipo/Tor/Scrapy on Linux.
With this setup, Scrapy correctly uses Tor for its crawls. The way I check whether Scrapy is using Tor correctly is to crawl this page in my spider:
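import logging
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    def start_requests(self):
        yield scrapy.Request('https://check.torproject.org/', callback=self.parse)
    def parse(self, response):
        # .get() returns the first matching text node, or None
        logging.info("Check tor page: %s", response.css('.content h1::text').get())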
However, I think there might be a better, cleaner way of doing this. I know I can check the Tor service status or check the IP address, but I want to actually check whether the Tor connection is correctly established.
A fairly definitive way to do this is to connect to Tor's control port and issue GETINFO status/circuit-established.
If Tor has an active circuit built, it will return:
250-status/circuit-established=1
250 OK
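Such a reply is line-oriented, so pulling the values out is straightforward. A minimal sketch, assuming every value fits on a single "250-keyword=value" line (multi-line "250+" replies are not handled); parse_getinfo_reply is a hypothetical helper name, not part of Tor or Stem:

```python
def parse_getinfo_reply(reply: str) -> dict:
    """Turn a simple GETINFO reply into a {keyword: value} dict.

    Hypothetical helper: assumes single-line "250-keyword=value" entries
    followed by a final "250 OK"; multi-line "250+" values are not handled.
    """
    lines = reply.replace("\r\n", "\n").strip().split("\n")
    if lines[-1] != "250 OK":
        # Errors come back as e.g. '552 Unrecognized key "foo"'
        raise ValueError("GETINFO failed: " + lines[-1])
    info = {}
    for line in lines[:-1]:
        if line.startswith("250-") and "=" in line:
            key, _, value = line[4:].partition("=")
            info[key] = value
    return info
```

For example, parse_getinfo_reply("250-status/circuit-established=1\r\n250 OK\r\n") gives {"status/circuit-established": "1"}. Asking for several keys at once (GETINFO status/circuit-established dormant) returns one 250- line per key, so the same function picks up both values.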
If Tor hasn't been used for a while, circuit-established could be 0. You can also issue GETINFO dormant, which would then yield 250-dormant=1. Most likely, as soon as you try to use Tor, it will build a circuit, so dormant will become 0 and circuit-established will become 1, barring any major network issues.
In either case, dormant=0 or circuit-established=1 should be enough to tell you that you can use Tor.
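The rule of thumb above can be written as a small predicate over a {keyword: value} dict of GETINFO results. The function name tor_seems_usable and the dict shape are illustrative assumptions, not something Tor or Stem provides:

```python
def tor_seems_usable(info: dict) -> bool:
    """Hypothetical check: True if circuit-established=1 or dormant=0.

    `info` maps GETINFO keywords to their string values, for example
    {"status/circuit-established": "1", "dormant": "0"}.
    """
    return (info.get("status/circuit-established") == "1"
            or info.get("dormant") == "0")
```

Values are compared as strings because that is how they arrive over the control port; a missing key simply counts as "not established".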
It's a simple protocol, so you can just open a socket to the control port, authenticate, and issue commands, or use the Controller class from Stem.
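A minimal sketch of the raw-socket route using only the Python standard library. Assumptions not taken from the answer: ControlPort is enabled in torrc on 127.0.0.1:9051, and authentication is either a HashedControlPassword or disabled; cookie authentication is not handled. getinfo is a hypothetical helper name, and its reply loop assumes single-line replies, which holds for the two keys discussed here.

```python
import socket


def getinfo(keys, host="127.0.0.1", port=9051, password="", timeout=10.0):
    """Authenticate on Tor's control port, send GETINFO, return the raw reply.

    Assumes ControlPort is enabled and that auth is either a hashed control
    password (pass `password`) or disabled (leave it empty). Cookie auth and
    multi-line "250+" data replies are not handled in this sketch.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as f:

            def command(line):
                # Control-protocol commands are CRLF-terminated text lines
                f.write(line.encode("utf-8") + b"\r\n")
                f.flush()
                reply = []
                while True:
                    resp = f.readline().decode("utf-8", errors="replace")
                    if not resp:
                        raise ConnectionError("control port closed the connection")
                    reply.append(resp)
                    # "250-..." continues the reply; "250 OK" (space) ends it
                    if resp[3:4] == " ":
                        return "".join(reply)

            if password:
                # Quote the password, escaping backslashes and double quotes
                escaped = password.replace("\\", "\\\\").replace('"', '\\"')
                auth = command('AUTHENTICATE "%s"' % escaped)
            else:
                auth = command("AUTHENTICATE")
            if not auth.startswith("250"):
                raise PermissionError("authentication failed: " + auth.strip())
            return command("GETINFO " + " ".join(keys))
```

For example, getinfo(["status/circuit-established", "dormant"], password="...") returns the raw reply text, which you can then check for status/circuit-established=1 before starting the crawl. With Stem, the same check is roughly Controller.from_port(port=9051), then controller.authenticate() and controller.get_info("status/circuit-established").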
See the Tor control protocol specification (control-spec) for more details.