How to detect inbound HTTP requests sent anonymously via Tor?

I'm developing a website and am sensitive to people screen scraping my data. I'm not worried about scraping one or two pages -- I'm more concerned about someone scraping thousands of pages as the aggregate of that data is much more valuable than a small percentage would be.

I can imagine strategies to block users based on heavy traffic from a single IP address, but the Tor network sets up many circuits that essentially mean a single user's traffic appears to come from different IP addresses over time.

I know that it is possible to detect Tor traffic: when I installed Vidalia with its Firefox extension, google.com presented me with a CAPTCHA.

So, how can I detect such requests?

(My website's in ASP.NET MVC 2, but I think any approach used here would be language-independent.)

asked Sep 10 '10 by Drew Noakes

People also ask

Can Tor traffic be detected?

Traffic from the Tor network can be detected by configuring a firewall or gateway to audit and log connections from Tor exit nodes. This can be achieved by using an up-to-date list of Tor exit nodes in a block list that has been configured in audit mode instead of enforcement mode.
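The audit-mode check described above could be sketched as follows (Python for illustration, since the technique is language-independent; the list format, one IP per line with `#` comments, is an assumption about whichever exit-node list you use):

```python
import logging

def load_exit_nodes(list_text: str) -> set[str]:
    """Parse a downloaded exit-node list (one IP per line) into a set."""
    return {line.strip() for line in list_text.splitlines()
            if line.strip() and not line.startswith("#")}

def audit_request(client_ip: str, exit_nodes: set[str]) -> bool:
    """Audit mode: log requests arriving via a Tor exit node, don't block."""
    if client_ip in exit_nodes:
        logging.warning("Request from Tor exit node: %s", client_ip)
        return True
    return False
```

You would refresh the cached list regularly, e.g. `exit_nodes = load_exit_nodes(open("tor_exits.txt").read())`, since exit nodes churn over time.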

How do Tor networks provide anonymity?

Tor provides this anonymity by routing communications through several intermediary proxies, other nodes operating in the network, before the traffic reaches an endpoint and is delivered to its final destination.

Can network administrators block Tor?

Tor is often blocked by administrators of certain networks. One way around this is to use bridges which shouldn't be detectable as Tor nodes. If the blockage is more sophisticated and uses deep packet inspection, you may need to use an additional tool, such as Pluggable Transports (see below).

How do I block traffic on Tor?

The most common way to block Tor traffic would be to locate a regularly updated list of Tor exit nodes and configure a firewall to block those nodes. A company policy against Tor use may also go a long way toward curbing its use.
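If you go the firewall route, one way is to generate a block rule per listed exit node. A minimal sketch (the `iptables` chain and rule shape here are illustrative assumptions; on a real system you'd prefer an ipset or equivalent for large lists):

```python
def iptables_drop_rules(exit_nodes: set[str]) -> list[str]:
    """Emit one iptables DROP rule per Tor exit node (enforcement mode)."""
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in sorted(exit_nodes)]
```

The generated commands would then be applied by a scheduled job each time the exit-node list is refreshed.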


3 Answers

I'm developing a website and am sensitive to people screen scraping my data

Forget about it. If it's on the web and someone wants it, it will be impossible to stop them from getting it. The more restrictions you put in place, the more you'll risk ruining user experience for legitimate users, who will hopefully be the majority of your audience. It also makes code harder to maintain.

I'll post countermeasures to any ideas future answers propose.

answered Sep 30 '22 by Aillyn


You can check their IP address against a list of Tor exit nodes. I know for a fact this won't even slow down someone who is interested in scraping your site. Tor is too slow; most scrapers won't even consider it. There are tens of thousands of open proxy servers that can easily be scanned for, or a list can be purchased. Proxy servers are nice because you can thread them, or rotate to a new one if your request cap gets hit.
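For the exit-node check this answer mentions, the Tor Project publishes a bulk exit list that can be fetched and parsed at runtime. A sketch (the URL below is the published bulk-list endpoint as I understand it; verify it before relying on it, and cache the result rather than fetching per request):

```python
import urllib.request

TOR_EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

def parse_exit_list(text: str) -> set[str]:
    """Turn the one-IP-per-line bulk list into a lookup set."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def fetch_exit_nodes(url: str = TOR_EXIT_LIST_URL) -> set[str]:
    """Download and parse the current exit-node list."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_exit_list(resp.read().decode())
```

A set gives O(1) membership tests, so checking each inbound request's IP stays cheap even with thousands of listed nodes.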

Google has been abused by Tor users, and most of the exit nodes are on Google's blacklist; that's why you were presented with a CAPTCHA.

Let me be perfectly clear: THERE IS NOTHING YOU CAN DO TO PREVENT SOMEONE FROM SCRAPING YOUR SITE.

answered Sep 30 '22 by rook


By design of the Tor network's components, it is not possible for the receiver to find out whether the requester is the original source or just a relay forwarding the request.

The behaviour you saw with Google was probably caused by a different security measure. Google detects when a logged-in user's IP changes and presents a CAPTCHA just in case, to guard against harmful interception while still letting an authenticated user whose IP really did change (by reconnecting to their ISP, etc.) continue the session.
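That session-IP-change measure could be sketched like this (a hypothetical in-memory version; Google's actual mechanism is not public, and a real deployment would use the web framework's session store):

```python
# Map each session id to the IP it was last seen from.
_last_seen_ip: dict[str, str] = {}

def needs_challenge(session_id: str, client_ip: str) -> bool:
    """Return True when an existing session reappears from a new IP,
    signalling that a CAPTCHA (or re-auth) should be presented."""
    previous = _last_seen_ip.get(session_id)
    _last_seen_ip[session_id] = client_ip
    return previous is not None and previous != client_ip
```

Note this flags any IP change, including benign ones like a DHCP renewal, which is exactly why it challenges with a CAPTCHA instead of terminating the session outright.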

answered Sep 30 '22 by Kosi2801