
Scraping from a website that requires a login?

Tags:

php

Can this be done, and if so, how? I want to scrape data from xbox.com, but the pages I need only appear after a successful login.

AndrewFerrara asked Dec 13 '22 15:12


2 Answers

Most login forms set a cookie, so you should use an HTTP class such as Zend_Http that can store cookies and send them with later requests. It's presumably as simple as:

$client = new Zend_Http_Client();
$client->setCookieJar();   // this is the crucial part for "logging in"

// make login request
$client->setUri("http://xbox.com/login");
$client->setParameterPost("login", "hackz0r");
$result = $client->request('POST');

// go scraping
...
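
To show what the cookie jar does for you, here is a rough sketch in plain PHP, not Zend's actual implementation. The helper names `collectCookies` and `cookieHeader` are made up for illustration, and the sketch ignores cookie attributes such as Domain, Path, and Expires that a real jar has to honor.

```php
<?php
// Simplified illustration of a cookie jar. Hypothetical helpers, not part of Zend_Http.
// Ignores Domain, Path, Expires, and Secure, which a real jar must respect.

/**
 * Merge cookies found in raw response header lines into $jar (name => value).
 */
function collectCookies(array $headerLines, array $jar = []): array
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the leading "name=value" pair; attributes follow the first ";".
        $pair  = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2 && $parts[0] !== '') {
            $jar[$parts[0]] = $parts[1];
        }
    }
    return $jar;
}

/**
 * Build the Cookie request header that sends the stored cookies back.
 */
function cookieHeader(array $jar): string
{
    $pairs = [];
    foreach ($jar as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}
```

The login response's Set-Cookie headers go into the jar, and every later request carries the resulting Cookie header. That is the bookkeeping `setCookieJar()` takes off your hands.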
mario answered Dec 15 '22 05:12


You will have to go through the login transaction yourself by sending the form's POST data with your cURL requests. That said, scraping data from behind a login is a bad idea: the site chose not to make that information public, and scraping it might constitute copyright infringement.
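
A minimal sketch of that login transaction with PHP's cURL extension might look like the following. The function names, URLs, and form field names are placeholders, not the real ones for xbox.com. A real login form may also need extra fields (hidden tokens, for example), which this sketch does not handle.

```php
<?php
// Sketch only: the URLs and form field names passed in are placeholders.

/**
 * Build the cURL options for the login POST (hypothetical helper).
 */
function loginRequestOptions(string $loginUrl, array $formFields): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($formFields),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // login forms often redirect on success
        CURLOPT_COOKIEFILE     => '',   // empty string enables the in-memory cookie engine
    ];
}

/**
 * Log in, then fetch a page with the same handle so the session cookie is reused.
 */
function loginAndFetch(string $loginUrl, array $formFields, string $targetUrl): string
{
    $ch = curl_init();

    curl_setopt_array($ch, loginRequestOptions($loginUrl, $formFields));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Switch the same handle back to GET; cookies set during login are still held.
    curl_setopt_array($ch, [CURLOPT_URL => $targetUrl, CURLOPT_HTTPGET => true]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }

    return $html;
}
```

You would call it as `loginAndFetch($loginUrl, ['login' => $user, 'password' => $pass], $pageUrl)`. A completed request does not prove the login worked, so check the returned HTML for something only a logged-in user would see.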

Chris Baker answered Dec 15 '22 04:12