wget for fetching Facebook profile/friend pages

Tags:

facebook

wget

web-crawler

user-profile

I am trying to fetch a Facebook user's profile page using "wget" but keep getting a non-profile page called "browser.php", which has nothing to do with that particular user. The profile page's URL, as I see it in the browser, has the following format:

http://www.facebook.com/user-name

and that's what I have been using as the argument to the wget command:

wget http://www.facebook.com/user-name

I am also interested in using wget to fetch a user's friends list, but that gives me the same unhelpful result ("browser.php"):

wget http://www.facebook.com/user-name?sk=friends&v=friends

Could someone kindly advise me what I'm doing wrong here? In other words, am I missing some key options for the wget command, or is wget not suited to such a scenario at all?

Any help will be greatly appreciated.

To add context to this query: I need to figure out how to fetch these pages from Facebook using wget, as that would help me write a script/program to look up friends' profile URLs in the HTML source and then look up some other keywords on them, etc. I am basically hoping that this would help me do some kind of selective crawling (with Facebook's permission, of course) of people I am not connected to.


asked Jul 25 '11 20:07

rogerchucker

4 Answers

First, Facebook has probably set up a condition under which certain user agents (e.g. wget) cannot crawl the pages, so those user agents are redirected to a different page that probably says something like "your browser is not supported". They do that to protect people from exactly what you are doing. However, you can tell wget to identify itself as a different agent using the -U argument (read the wget man page), e.g. wget -U Mozilla http://....
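The -U idea can be wrapped in a small shell function. This is only a sketch: the User-Agent string and the function name fetch_page are assumptions chosen for illustration, and whether Facebook serves the real page to such a request is not guaranteed.

```shell
# Hypothetical browser-like User-Agent string; any current browser's value would do.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0'

# fetch_page URL OUTFILE
# Download URL to OUTFILE while presenting the User-Agent above instead of wget's default.
fetch_page() {
    wget --user-agent="$UA" --output-document="$2" "$1"
}
```

A call might look like fetch_page 'http://www.facebook.com/user-name?sk=friends&v=friends' friends.html. The single quotes around the URL matter: left unquoted, the shell treats & as "run in the background", so wget would only ever receive the part of the URL before it.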

Second, Facebook's privacy settings rarely allow you to read much information, if any, unless you are logged in as a user, and probably only as a user who is a friend of the profile you are trying to scrape.

Thirdly, there is a Facebook API which you need to use to crawl and extract information from Facebook -- you are likely in violation of the Acceptable Use policy if you try to obtain information in any other way.
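As a rough illustration of the API route, the sketch below builds a Graph API request URL and fetches it with wget. It assumes you already hold a valid access token, stored here in a hypothetical FB_ACCESS_TOKEN variable; the function names are also made up for this example, and which nodes and fields you are actually allowed to read depends on the permissions Facebook has granted.

```shell
# graph_url NODE FIELDS
# Print a Graph API request URL for NODE (e.g. "me" or "me/friends").
# FB_ACCESS_TOKEN is a hypothetical variable holding a token obtained from Facebook.
graph_url() {
    printf 'https://graph.facebook.com/%s?fields=%s&access_token=%s\n' \
        "$1" "$2" "$FB_ACCESS_TOKEN"
}

# fetch_graph NODE FIELDS OUTFILE
# Save the JSON response for NODE to OUTFILE.
fetch_graph() {
    wget --quiet --output-document="$3" "$(graph_url "$1" "$2")"
}
```

For example, fetch_graph me/friends name friends.json would request the friends edge of the token's owner and write whatever JSON the API returns to friends.json.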


answered Sep 23 '22 14:09

Soren

I don't know why you want to use wget; Facebook offers an excellent API.

wget --user-agent=Firefox http://www.facebook.com/markzuckerberg

will save the publicly available content to a file.
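Once a page has been saved, the link lookup described in the question could be sketched as below. This assumes profile links appear as plain href="http(s)://www.facebook.com/..." attributes in the saved HTML, which may well not hold for Facebook's real markup; the function name extract_profile_urls is likewise hypothetical.

```shell
# extract_profile_urls FILE
# Print the unique www.facebook.com links found in href attributes of FILE,
# with any query string or fragment cut off.
extract_profile_urls() {
    grep -o 'href="https\{0,1\}://www\.facebook\.com/[^"?#]*' "$1" |
        sed 's/^href="//' |
        LC_ALL=C sort -u
}
```

Running extract_profile_urls markzuckerberg on the file saved above would print one URL per line, ready to feed into a loop. Pattern matching on HTML like this is fragile, so treat it as a starting point rather than a parser.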

You should consider using their API.

Facebook Developers

answered Sep 23 '22 14:09

Vamsi Krishna B

If you want to save the logged-in page, you can log in with Firefox with "Keep me logged in" selected, then copy those cookies to a file and pass that file to wget with its --load-cookies option. You will still have quite a bit of dynamic, script-loaded content that wget isn't going to save.
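A minimal sketch of the cookie approach follows. It assumes the browser cookies have already been exported to a file in the Netscape cookies.txt format that wget's --load-cookies expects (Firefox does not store them that way itself, so some export step is needed); the User-Agent string and the function name fetch_logged_in are illustrative assumptions.

```shell
# Hypothetical browser-like User-Agent string, ideally matching the browser you logged in with.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0'

# fetch_logged_in URL COOKIE_FILE OUTFILE
# Download URL to OUTFILE, sending the cookies stored in COOKIE_FILE.
fetch_logged_in() {
    wget --load-cookies="$2" --user-agent="$UA" --output-document="$3" "$1"
}
```

For example: fetch_logged_in 'http://www.facebook.com/user-name' cookies.txt profile.html. The cookie file grants access to your account, so it should be kept private.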

There are many ways to skin this cat. If you need to extract a specific item, check out the API. If you simply want to archive a snapshot of the page as it would appear in a web browser, try CutyCapt. It's much like wget, except that it renders the entire document as a web browser would and stores an image of the page.

answered Sep 26 '22 14:09

David

Check the following open-source projects:

facebook-cli, a command-line utility for interacting with the Facebook API.
facebook-friends, which can generate an HTML page of all of your Facebook friends.

answered Sep 22 '22 14:09

kenorb
