I have been trying to scrape facebook comments using Beautiful Soup on the below website pages.
import BeautifulSoup
import urllib2
import re
url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'
fd = urllib2.urlopen(url)
soup = BeautifulSoup.BeautifulSoup(fd)
fb_comment = soup("div", {"class":"postText"}).find(text=True)
print fb_comment
The output is a null set. However, I can clearly see the facebook comment is within those above tags in the inspect element of the techcrunch site (I am little new to Python and was wondering if the approach is correct and where I am going wrong?)
Like Christopher and Thiefmaster: it is all because of javascript.
But, if you really need that information, you can still retrieve it thanks to Selenium on http://seleniumhq.org then use beautifulsoup on this output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With