How to detect inbound HTTP requests sent anonymously via Tor?

I'm developing a website and am sensitive to people screen scraping my data. I'm not worried about scraping one or two pages -- I'm more concerned about someone scraping thousands of pages as the aggregate of that data is much more valuable than a small percentage would be.

I can imagine strategies to block users based on heavy traffic from a single IP address, but the Tor network sets up many circuits that essentially mean a single user's traffic appears to come from different IP addresses over time.

I know that it is possible to detect Tor traffic: when I installed Vidalia with its Firefox extension, google.com presented me with a CAPTCHA.

So, how can I detect such requests?

(My website's in ASP.NET MVC 2, but I think any approach used here would be language-independent.)

asked Sep 10 '10 by Drew Noakes

People also ask

Can Tor traffic be detected?

Traffic from the Tor network can be detected by configuring a firewall or gateway to audit and log connections from Tor exit nodes. This can be achieved by using an up-to-date list of Tor exit nodes in a block list that has been configured in audit mode instead of enforcement mode.
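The audit-mode check described above could be sketched as follows (Python for illustration, since the technique is language-independent; the list format, one IP per line with `#` comments, is an assumption about whichever exit-node list you use):

```python
import logging

def load_exit_nodes(list_text: str) -> set[str]:
    """Parse a downloaded exit-node list (one IP per line) into a set."""
    return {line.strip() for line in list_text.splitlines()
            if line.strip() and not line.startswith("#")}

def audit_request(client_ip: str, exit_nodes: set[str]) -> bool:
    """Audit mode: log requests arriving via a Tor exit node, don't block."""
    if client_ip in exit_nodes:
        logging.warning("Request from Tor exit node: %s", client_ip)
        return True
    return False
```

You would refresh the cached list regularly, e.g. `exit_nodes = load_exit_nodes(open("tor_exits.txt").read())`, since exit nodes churn over time.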

How do Tor networks provide anonymity?

Tor provides this anonymity by routing communications through several intermediary proxies, other nodes operating in the network, before the traffic reaches an endpoint and is delivered to its final destination.

Can network administrators block Tor?

Tor is often blocked by administrators of certain networks. One way around this is to use bridges which shouldn't be detectable as Tor nodes. If the blockage is more sophisticated and uses deep packet inspection, you may need to use an additional tool, such as Pluggable Transports (see below).

How do I block traffic on Tor?

The most common way to block Tor traffic would be to locate a regularly updated list of Tor exit nodes and configure a firewall to block those nodes. A company policy against Tor use may also go a long way toward curbing its use.
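If you go the firewall route, one way is to generate a block rule per listed exit node. A minimal sketch (the `iptables` chain and rule shape here are illustrative assumptions; on a real system you'd prefer an ipset or equivalent for large lists):

```python
def iptables_drop_rules(exit_nodes: set[str]) -> list[str]:
    """Emit one iptables DROP rule per Tor exit node (enforcement mode)."""
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in sorted(exit_nodes)]
```

The generated commands would then be applied by a scheduled job each time the exit-node list is refreshed.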


3 Answers

I'm developing a website and am sensitive to people screen scraping my data

Forget about it. If it's on the web and someone wants it, it will be impossible to stop them from getting it. The more restrictions you put in place, the more you'll risk ruining user experience for legitimate users, who will hopefully be the majority of your audience. It also makes code harder to maintain.

I'll post countermeasures to any ideas future answers propose.

answered Sep 30 '22 by Aillyn


You can check their IP address against a list of Tor exit nodes. I know for a fact this won't even slow down someone who is interested in scraping your site. Tor is too slow; most scrapers won't even consider it. There are tens of thousands of open proxy servers that can easily be scanned for, or a list can be purchased. Proxy servers are nice because you can thread them, or rotate to a new one if your request cap gets hit.
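For the exit-node check this answer mentions, the Tor Project publishes a bulk exit list that can be fetched and parsed at runtime. A sketch (the URL below is the published bulk-list endpoint as I understand it; verify it before relying on it, and cache the result rather than fetching per request):

```python
import urllib.request

TOR_EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

def parse_exit_list(text: str) -> set[str]:
    """Turn the one-IP-per-line bulk list into a lookup set."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def fetch_exit_nodes(url: str = TOR_EXIT_LIST_URL) -> set[str]:
    """Download and parse the current exit-node list."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_exit_list(resp.read().decode())
```

A set gives O(1) membership tests, so checking each inbound request's IP stays cheap even with thousands of listed nodes.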

Google has been abused by Tor users, and most of the exit nodes are on Google's blacklist; that's why you were presented with a CAPTCHA.

Let me be perfectly clear: THERE IS NOTHING YOU CAN DO TO PREVENT SOMEONE FROM SCRAPING YOUR SITE.

answered Sep 30 '22 by rook


By design of the Tor network's components, it is not possible for the receiver to find out whether the requester is the original source or just a relay forwarding the request.

The behaviour you saw with Google was probably caused by a different security measure. Google detects when a logged-in user's IP changes and presents a CAPTCHA just in case, to guard against harmful interception while still letting an authenticated user whose IP really did change (by reconnecting to their ISP, etc.) continue the session.
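That session-IP-change measure could be sketched like this (a hypothetical in-memory version; Google's actual mechanism is not public, and a real deployment would use the web framework's session store):

```python
# Map each session id to the IP it was last seen from.
_last_seen_ip: dict[str, str] = {}

def needs_challenge(session_id: str, client_ip: str) -> bool:
    """Return True when an existing session reappears from a new IP,
    signalling that a CAPTCHA (or re-auth) should be presented."""
    previous = _last_seen_ip.get(session_id)
    _last_seen_ip[session_id] = client_ip
    return previous is not None and previous != client_ip
```

Note this flags any IP change, including benign ones like a DHCP renewal, which is exactly why it challenges with a CAPTCHA instead of terminating the session outright.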

answered Sep 30 '22 by Kosi2801