
Scrape Facebook in Python

I'm interested in getting the number of friends each of my friends on Facebook has. Apparently the official Facebook API does not allow getting the friends of friends, so I need to get around this (somewhat sensible) limitation somehow. I tried the following:

import sys
import urllib, urllib2, cookielib

username = '[email protected]'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'email' : username, 'pass' : password})
request = urllib2.Request('https://login.facebook.com/login.php')
request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.6.12-1.fc14 Firefox/3.6.12')
opener.open(request, login_data)
resp = opener.open('http://facebook.com')
print resp.read()

but I only end up with a captcha page. Any idea how FB is detecting that the request is not from a "normal" browser? I could add an extra step and solve the captcha, but that would add unnecessary complexity to the program, so I would rather avoid it. When I use a web browser with the same User-Agent string, I don't get a captcha.

Alternatively, does anyone have any saner ideas on how to accomplish my goal, i.e. get a list of friends of friends?

pafcu asked Nov 28 '10 15:11




1 Answer

Have you tried tracing and comparing HTTP transactions with Fiddler2 or Wireshark? Fiddler can even trace https, as long as your client code can be made to work with bogus certs.
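Once you have captured both transactions, comparing the request headers side by side is one way to spot what the script leaves out. The following is only a rough sketch in Python 3: `diff_headers` is a hypothetical helper, and the two header dictionaries are made-up sample values standing in for whatever you copy out of Fiddler or Wireshark, not real captures from Facebook.

```python
def diff_headers(browser_headers, script_headers):
    """Compare two header dicts, treating header names case-insensitively.

    Returns a tuple (missing, extra, changed):
      missing - names the browser sent but the script did not
      extra   - names the script sent but the browser did not
      changed - names sent by both with different values,
                mapped to (browser_value, script_value)
    """
    browser = {name.lower(): value for name, value in browser_headers.items()}
    script = {name.lower(): value for name, value in script_headers.items()}

    missing = sorted(set(browser) - set(script))
    extra = sorted(set(script) - set(browser))
    changed = {
        name: (browser[name], script[name])
        for name in sorted(set(browser) & set(script))
        if browser[name] != script[name]
    }
    return missing, extra, changed


# Hypothetical sample data; replace with headers copied from your own traces.
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Accept-Language": "en-US",
    "Cookie": "session=abc",
}
script_headers = {
    "user-agent": "Mozilla/5.0",
    "Accept": "*/*",
    "Accept-Encoding": "identity",
}

missing, extra, changed = diff_headers(browser_headers, script_headers)
print("Missing from script:", missing)
print("Only in script:", extra)
print("Different values:", changed)
```

Headers are only one part of a transaction. If they match, the same comparison can be repeated for the POST body fields and for any cookies set by earlier responses.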

Marcelo Cantos answered Oct 02 '22 13:10