 

file_get_contents from a URL that is only accessible after logging in to a website

I would like to write a PHP script that can capture a page from a website. Think file_get_contents($url).

However, this website requires that you fill in a username/password login form before you can access any page. I imagine that once you are logged in, the website sends your browser an authentication cookie, and with every subsequent browser request the session info is passed back to the website to authenticate access.

I want to know how I can simulate this browser behavior with a PHP script in order to gain access and capture a page from this website.

More specifically, my questions are:

  1. How do I send a request that contains my login details so that the website replies with the session information/cookie?
  2. How do I read the session information/cookie?
  3. How do I pass this session information back to the website with every subsequent request (file_get_contents, curl)?
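For comparison with the cURL-based answer, here is a minimal sketch of questions 1-3 using plain file_get_contents() with an HTTP stream context. The helper names (httpOptions, sessionCookie, fetchProtectedPage) are hypothetical, the URLs and field names are placeholders, and it assumes the login response itself carries the Set-Cookie header (redirects on the login POST are not followed) and that the site accepts an ordinary URL-encoded form post.

```php
<?php
// Sketch: questions 1-3 with file_get_contents() and HTTP stream contexts (no cURL).
// Hypothetical helpers; assumes the login response itself sets the session cookie.

/** Build HTTP stream context options; $cookie is a Cookie header value ('' for none). */
function httpOptions(string $method, array $fields = [], string $cookie = ''): array
{
    $http = ['method' => $method];
    $headers = [];
    if ($method === 'POST') {
        $http['content'] = http_build_query($fields); // URL-encoded, like a browser form post
        $headers[] = 'Content-Type: application/x-www-form-urlencoded';
        $http['follow_location'] = 0;                 // keep the login response and its Set-Cookie headers
    }
    if ($cookie !== '') {
        $headers[] = 'Cookie: ' . $cookie;            // question 3: send the session cookie back
    }
    $http['header'] = implode("\r\n", $headers);
    return ['http' => $http];
}

/** Question 2: collect "name=value" pairs from the Set-Cookie response headers. */
function sessionCookie(array $responseHeaders): string
{
    $pairs = [];
    foreach ($responseHeaders as $line) {
        if (preg_match('/^Set-Cookie:\s*([^;]+)/i', $line, $m)) {
            $pairs[] = trim($m[1]);                   // drop attributes such as path or HttpOnly
        }
    }
    return implode('; ', $pairs);
}

/** Question 1, then 3: log in, then fetch the protected page with the cookie attached. */
function fetchProtectedPage(string $loginUrl, array $fields, string $pageUrl)
{
    file_get_contents($loginUrl, false, stream_context_create(httpOptions('POST', $fields)));
    // PHP fills $http_response_header (in this scope) with the login response's headers.
    $cookie = sessionCookie($http_response_header ?? []);
    return file_get_contents($pageUrl, false, stream_context_create(httpOptions('GET', [], $cookie)));
}
```

Usage would look like `fetchProtectedPage('http://example.com/login', ['username' => 'user', 'password' => 'pass'], 'http://example.com/remotepage.html')`. Unlike cURL's cookie engine, this sketch does not track cookie expiry, paths or domains; it simply echoes back whatever the login response set.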

Thanks.

sincospi asked Jul 04 '09 14:07




1 Answer

cURL is well suited to this. You don't need to do anything special other than set the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options: CURLOPT_COOKIEJAR is the file cURL writes cookies to when the handle is closed, and CURLOPT_COOKIEFILE is the file it reads cookies from before sending a request. Once you've logged in by posting the login form's fields, the session cookie is saved, and cURL automatically sends that same cookie with subsequent requests, as the example below illustrates.
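To find out which field names belong in the login fields, one option is to fetch the login page and read its form. Below is a minimal sketch using PHP's DOMDocument; formFields() is a hypothetical helper, and it assumes the first form on the page is the login form. Hidden inputs are listed too, since some login forms include extra hidden fields that the site expects to be posted back unchanged.

```php
<?php
// Sketch: list a page's first <form> so you know which fields to post when logging in.
// formFields() is a hypothetical helper; it assumes the login form is the first form.

function formFields(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return ['action' => null, 'fields' => []];
    }
    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            // Pre-filled value ('' if none), e.g. the content of a hidden field.
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return ['action' => $form->getAttribute('action'), 'fields' => $fields];
}
```

For example, `print_r(formFields(getUrl($loginUrl)));` would show the form's action and field names; fetching the page through getUrl() also stores any cookie the login page itself sets.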

Note that getUrl() below saves the cookies to cookies/cookies.txt, so make sure the cookies directory exists and is writable by the PHP process.
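To answer question 2 directly, the session cookie can be read back out of that file. A minimal sketch, assuming the Netscape cookie-file format libcurl writes (seven tab-separated fields per line: domain, include-subdomains flag, path, secure flag, expiry, name, value). readCookieJar() is a hypothetical helper.

```php
<?php
// Sketch: parse a libcurl cookie jar (Netscape format) into name => value pairs.
// readCookieJar() is a hypothetical helper, not part of PHP or cURL.

function readCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        // libcurl prefixes HttpOnly cookies with "#HttpOnly_"; other "#" lines are comments.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue;
        }
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6]; // name => value
        }
    }
    return $cookies;
}

// Usage with the jar written by getUrl():
// print_r(readCookieJar(file_get_contents('cookies/cookies.txt')));
```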

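<?php
$loginUrl = 'http://example.com/login'; // the "action" URL of the login form
$loginFields = array('username' => 'user', 'password' => 'pass'); // login form field names and values
$remotePageUrl = 'http://example.com/remotepage.html'; // URL of the page you want to save

$login = getUrl($loginUrl, 'post', $loginFields); // log in; the session cookie is written to the jar

$remotePage = getUrl($remotePageUrl); // fetch the protected page; the saved cookie is sent along

function getUrl($url, $method = '', $vars = array()) {
    $cookieFile = __DIR__ . '/cookies/cookies.txt'; // relative to this script, not the working directory
    $ch = curl_init();
    if ($method == 'post') {
        curl_setopt($ch, CURLOPT_POST, 1);
        // URL-encode the fields like a browser submitting a normal form;
        // passing the array as-is would send multipart/form-data instead
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($vars));
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);       // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);       // follow the redirect sites often send after login
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // write cookies here when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // read cookies from here and send them
    $buffer = curl_exec($ch);
    if ($buffer === false) {
        trigger_error('cURL error: ' . curl_error($ch), E_USER_WARNING);
    }
    curl_close($ch);
    return $buffer;
}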
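If all requests happen within one script run, a variant that avoids the cookie file entirely is to reuse a single cURL handle. According to the libcurl documentation, setting CURLOPT_COOKIEFILE to an empty string enables the cookie engine without reading any file, so cookies are kept in memory for as long as the handle lives. This is an alternative to the answer's approach, not part of it; the helper names are hypothetical and the URLs are the same placeholders as above.

```php
<?php
// Sketch: one cURL handle for the whole session, cookies kept in memory only.
// makeSession(), sessionPost() and sessionGet() are hypothetical helpers.

function makeSession()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE => '', // empty string: enable the cookie engine, read no file
    ]);
    return $ch;
}

function sessionPost($ch, string $url, array $fields)
{
    curl_setopt_array($ch, [
        CURLOPT_URL => $url,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query($fields),
    ]);
    return curl_exec($ch);
}

function sessionGet($ch, string $url)
{
    // CURLOPT_HTTPGET switches the reused handle back to GET after a POST.
    curl_setopt_array($ch, [CURLOPT_URL => $url, CURLOPT_HTTPGET => true]);
    return curl_exec($ch);
}

// Usage:
// $ch = makeSession();
// sessionPost($ch, 'http://example.com/login', ['username' => 'user', 'password' => 'pass']);
// $page = sessionGet($ch, 'http://example.com/remotepage.html');
```

The trade-off is that the cookies are lost when the script ends, whereas the cookie-jar file lets a later run reuse the session.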
cOle2 answered Sep 23 '22 21:09