How to get mechanize requests to look like they originate from a real browser

OK, here's the header info (just an example) that I captured with Live HTTP Headers while logging into an account:

http://example.com/login.html

POST /login.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://example.com
Cookie: blahblahblah; blah = blahblah
Content-Type: application/x-www-form-urlencoded
Content-Length: 39

username=shane&password=123456&do=login

HTTP/1.1 200 OK
Date: Sat, 18 Dec 2010 15:41:02 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.2.14
Set-Cookie: blah = blahblah_blah; expires=Sun, 18-Dec-2011 15:41:02 GMT; path=/; domain=.example.com; HttpOnly
Set-Cookie: blah = blahblah; expires=Sun, 18-Dec-2011 15:41:02 GMT; path=/; domain=.example.com; HttpOnly
Set-Cookie: blah = blahblah; expires=Sun, 18-Dec-2011 15:41:02 GMT; path=/; domain=.example.com; HttpOnly
Cache-Control: private, no-cache="set-cookie"
Expires: 0
Pragma: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 4135
Keep-Alive: timeout=10, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

Normally I would code like this:

import mechanize
import urllib2

MechBrowser = mechanize.Browser()
LoginUrl = "http://example.com/login.html"
LoginData = "username=shane&password=123456&do=login"
LoginHeader = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)", "Referer": "http://example.com"}

LoginRequest = urllib2.Request(LoginUrl, LoginData, LoginHeader)
LoginResponse = MechBrowser.open(LoginRequest)

The above code works fine. My question is: do I also need to add the following lines (and the other headers from the capture above) to LoginHeader so that the request really looks like it comes from Firefox rather than mechanize?

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

Which headers, and how many of them, need to be spoofed to make the request look "real"?

Shane asked Jan 07 '11 05:01


2 Answers

It depends on what you're trying to 'fool'. You can try some online services that do simple User Agent sniffing to gauge your success:

http://browserspy.dk/browser.php

http://www.browserscope.org (look for 'We think you're using...')

http://www.browserscope.org/ua

http://panopticlick.eff.org/ -> will help you to pick some 'too common to track' options

http://networking.ringofsaturn.com/Tools/browser.php
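Besides online checkers like these, you can look at what your own script sends by pointing it at a throwaway local server that echoes the request headers back. The following is a minimal sketch, assuming Python 3 and using only the standard library (`urllib.request` stands in for mechanize here); `EchoHeaders` and `headers_seen_by_server` are hypothetical names made up for this illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoHeaders(BaseHTTPRequestHandler):
    """Replies to any GET with the request headers it received, as JSON."""

    def do_GET(self):
        body = json.dumps(dict(self.headers.items())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def headers_seen_by_server(extra_headers):
    """Send one request with extra_headers; return the headers the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeaders)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        # An empty ProxyHandler ignores proxy settings from the environment.
        opener = build_opener(ProxyHandler({}))
        with opener.open(Request(url, headers=extra_headers), timeout=5) as resp:
            return json.loads(resp.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


seen = headers_seen_by_server({
    "User-Agent": "Mozilla/5.0 (example)",
    "Accept-Language": "en-us,en;q=0.5",
})
for name in sorted(seen):
    print("%s: %s" % (name, seen[name]))
```

The printed list includes headers the client library added by itself (for example `Host`), so comparing it line by line with a capture from your real browser shows which ones still differ.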

I believe a determined programmer could still detect that you are not a real browser, but many log parsers and tools wouldn't once you echo the headers your real browser sends.

One thing you should consider is that lack of JS might raise red flags, so capture sent headers with JS disabled too.
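To echo what a real browser sends, the captured dump can be turned into header pairs mechanically instead of retyping each line. This is a rough sketch, not a definitive recipe: `parse_captured_headers` and the `SKIP` set are hypothetical names, and the choice of which headers to leave to the HTTP library is an assumption. The sample capture is a trimmed copy of the one in the question.

```python
# Headers the HTTP library normally computes per request; copying stale
# values from a capture could be wrong (this skip list is an assumption).
SKIP = {"host", "content-length", "cookie", "connection", "keep-alive"}

CAPTURED = """\
POST /login.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://example.com
Content-Length: 39
"""


def parse_captured_headers(raw):
    """Turn a captured 'Name: value' dump into a list of (name, value) pairs."""
    pairs = []
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip the request line, blank lines, and anything that is not a header.
        if not sep or not name or " " in name:
            continue
        if name.lower() in SKIP:
            continue
        pairs.append((name, value.strip()))
    return pairs


headers = parse_captured_headers(CAPTURED)
for pair in headers:
    print("%s: %s" % pair)

# Note: if Accept-Encoding is kept, the server may reply with a gzip-compressed
# body (as in the capture in the question), which the client must then decode.
```

The result is a list of `(name, value)` tuples, the same shape as the `br.addheaders` list used in the other answer.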

TryPyPy answered Oct 22 '22 19:10


Here's how you set the user agent for all requests made by mechanize.Browser:

import mechanize

br = mechanize.Browser()
# addheaders is a list of (name, value) tuples sent with every request
br.addheaders = [('User-agent', 'your user agent string here')]

Mechanize can fill in forms as well:

br.open('http://yoursite.com/login')
br.select_form(nr=1) # select second form in page (0 indexed)
br['username'] = 'yourUserName' # inserts into form field with name 'username'
br['password'] = 'yourPassword'
response = br.submit()
if 'Welcome yourUserName' in response.get_data():
    # login was successful
    print('login was successful')
else:
    # something went wrong, so dump the page for inspection
    print(response.get_data())

See the mechanize examples for more info.

cerberos answered Oct 22 '22 17:10