
cURL and redirects - returning multiple headers?

I'm writing a specialized PHP proxy and got stumped by a feature of cURL.

If the following values are set:

curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $ch, CURLOPT_HEADER, true );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );

cURL correctly handles redirects, but returns the headers of every response in the redirect chain, not just those of the final (non-redirect) page, e.g.

HTTP/1.1 302 Found
Location: http://otherpage
Set-Cookie: someCookie=foo
Content-Length: 198

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 3241

<!DOCTYPE HTML>
...rest of content

Note that CURLOPT_HEADER is set because I need to read and copy parts of the original header into my proxy header.

I appreciate why it's returning all these headers (for example, my proxy code must detect any cookies set in the 302 header and pass them along). However, it also makes it impossible to detect where the headers end and the content begins. Normally, with a single header block, we could just do a simple split:

$split = preg_split('/\r\n\r\n/', $fullPage, 2);

But that obviously won't work here. Hm. We could try something that only splits if it looks like the next line is part of a header:

$split = preg_split('/\r\n\r\nHTTP\/(1\.0|1\.1) \\d+ \\w+/', $fullPage);
// matches patterns such as "\r\n\r\nHTTP/1.1 302 Found"

Which will work almost all the time, but chokes if someone has the following in their page:

...and for all you readers out there, here is an example HTTP header:
<PRE>

HTTP/1.1 200 OK

BALLS!

We really want the split to stop matching as soon as it encounters any \r\n\r\n that isn't immediately followed by HTTP/1.x. Is there a way to do this with PHP regular expressions? Even this solution can choke in the (quite rare) situation where someone puts an HTTP header right at the beginning of their content. Is there a way in cURL to get all of the returned pages as an array?

Ender asked Oct 25 '10 19:10


2 Answers

You can get the total header size from curl_getinfo() and split the string up like this:

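$buffer = curl_exec($ch);
// Read the transfer info before closing the handle.
$curl_info = curl_getinfo($ch);
curl_close($ch);
// header_size is the combined size of all header blocks received,
// including those of the redirect responses.
$header_size = $curl_info["header_size"];
$header = substr($buffer, 0, $header_size);
$body = substr($buffer, $header_size);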

Information taken from the helpful post by "grandpa".
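Since the question also asks for the responses as an array, here is a minimal sketch that builds on the header_size approach above. The function name split_curl_response() is made up for illustration, and it assumes the raw string came from a handle with CURLOPT_HEADER and CURLOPT_RETURNTRANSFER enabled. Only the header part is split on blank lines, so an HTTP status line inside the page content is left alone.

```php
<?php
// Hypothetical helper: split a raw cURL response (CURLOPT_HEADER enabled)
// into one header block per response plus the body.
// $headerSize is the "header_size" value reported by curl_getinfo().
function split_curl_response(string $raw, int $headerSize): array
{
    $headerPart = substr($raw, 0, $headerSize);
    $body = (string) substr($raw, $headerSize);

    // Every header block ends with a blank line, so splitting the header
    // part on CRLF CRLF gives one entry per response in the redirect chain.
    $blocks = preg_split('/\r\n\r\n/', $headerPart, -1, PREG_SPLIT_NO_EMPTY);

    return ['headers' => $blocks, 'body' => $body];
}
```

For the example in the question, headers[0] would hold the 302 block (with its Set-Cookie line) and headers[1] the 200 block.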

Conspicuous Compiler answered Oct 15 '22 05:10


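Use curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); to turn off redirect following, so each request returns a single header block and you handle the redirects yourself. The documentation describes the option as: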

TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).
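With redirect following turned off, the proxy has to walk the redirect chain itself. The following is a rough sketch of one way to do that, not a tested proxy implementation. The names curl_fetch_once() and fetch_with_redirects() and the default of 5 hops are assumptions for illustration, and the sketch assumes the Location header holds an absolute URL.

```php
<?php
// Hypothetical helper: perform one request without following redirects.
// Requires the cURL extension.
function curl_fetch_once(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $raw = curl_exec($ch);
    if ($raw === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error: $error");
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'status'  => $status,
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Follow redirects by hand and keep every hop's response separately.
// $fetch is any callable returning the same array shape as curl_fetch_once().
function fetch_with_redirects(string $url, callable $fetch, int $maxHops = 5): array
{
    $hops = [];
    for ($i = 0; $i <= $maxHops; $i++) {
        $response = $fetch($url);
        $hops[] = $response;

        $isRedirect = $response['status'] >= 300 && $response['status'] < 400;
        // Stop unless this is a redirect that carries a Location header.
        if (!$isRedirect
            || !preg_match('/^Location:[ \t]*(\S+)/mi', $response['headers'], $m)) {
            break;
        }
        $url = $m[1]; // assumes an absolute URL; relative ones need resolving
    }
    return $hops;
}
```

Calling fetch_with_redirects($url, 'curl_fetch_once') returns one entry per response, so cookies set on a 302 can be read from that hop's headers while the last entry holds the final page. Passing the fetcher as a callable keeps the loop testable without network access.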

Rijesh Np answered Oct 15 '22 05:10