Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I emulate a get request exactly like a web browser?

Tags:

php

curl

There are websites that when I open specific ajax request on browser, I get the resulted page. But when I try to load them with curl, I receive an error from the server.

How can I properly emulate a get request to the server that will simulate a browser?

That is what I am doing:

$url="https://new.aol.com/productsweb/subflows/ScreenNameFlow/AjaxSNAction.do?s=username&f=firstname&l=lastname"; ini_set('user_agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)'); $ch = curl_init();  curl_setopt($ch, CURLOPT_URL,$url); $result=curl_exec($ch); print $result; 
like image 933
ufk Avatar asked Mar 14 '10 00:03

ufk


People also ask

How do I show cURL request in browser?

Open Chrome Developer Tools, go to Network tab, make your request (you may need to check "Preserve Log" if the page refreshes). Find the request on the left, right-click, "Copy as cURL". But cookie in "Copy as cURL" expires within few minutes. At least in case of most of the sites.


2 Answers

Are you sure the curl module honors ini_set('user_agent',...)? There is an option CURLOPT_USERAGENT described at http://docs.php.net/function.curl-setopt.
Could there also be a cookie tested by the server? That you can handle by using CURLOPT_COOKIE, CURLOPT_COOKIEFILE and/or CURLOPT_COOKIEJAR.

edit: Since the request uses https there might also be error in verifying the certificate, see CURLOPT_SSL_VERIFYPEER.

$url="https://new.aol.com/productsweb/subflows/ScreenNameFlow/AjaxSNAction.do?s=username&f=firstname&l=lastname"; $agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';  $ch = curl_init(); curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); curl_setopt($ch, CURLOPT_VERBOSE, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_USERAGENT, $agent); curl_setopt($ch, CURLOPT_URL,$url); $result=curl_exec($ch); var_dump($result); 
like image 58
VolkerK Avatar answered Sep 27 '22 19:09

VolkerK


i'll make an example, first decide what browser you want to emulate, in this case i chose Firefox 60.6.1esr (64-bit), and check what GET request it issues, this can be obtained with a simple netcat server (MacOS bundles netcat, most linux distributions bunles netcat, and Windows users can get netcat from.. Cygwin.org , among other places),

setting up the netcat server to listen on port 9999: nc -l 9999

now hitting http://127.0.0.1:9999 in firefox, i get:

$ nc -l 9999 GET / HTTP/1.1 Host: 127.0.0.1:9999 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-US,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Upgrade-Insecure-Requests: 1 

now let us compare that with this simple script:

<?php $ch=curl_init("http://127.0.0.1:9999"); curl_exec($ch); 

i get:

$ nc -l 9999 GET / HTTP/1.1 Host: 127.0.0.1:9999 Accept: */* 

there are several missing headers here, they can all be added with the CURLOPT_HTTPHEADER option of curl_setopt, but the User-Agent specifically should be set with CURLOPT_USERAGENT instead (it will be persistent across multiple calls to curl_exec() and if you use CURLOPT_FOLLOWLOCATION then it will persist across http redirections as well), and the Accept-Encoding header should be set with CURLOPT_ENCODING instead (if they're set with CURLOPT_ENCODING then curl will automatically decompress the response if the server choose to compress it, but if you set it via CURLOPT_HTTPHEADER then you must manually detect and decompress the content yourself, which is a pain in the ass and completely unnecessary, generally speaking) so adding those we get:

<?php $ch=curl_init("http://127.0.0.1:9999"); curl_setopt_array($ch,array(         CURLOPT_USERAGENT=>'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',         CURLOPT_ENCODING=>'gzip, deflate',         CURLOPT_HTTPHEADER=>array(                 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',                 'Accept-Language: en-US,en;q=0.5',                 'Connection: keep-alive',                 'Upgrade-Insecure-Requests: 1',         ), )); curl_exec($ch); 

now running that code, our netcat server gets:

$ nc -l 9999 GET / HTTP/1.1 Host: 127.0.0.1:9999 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0 Accept-Encoding: gzip, deflate Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-US,en;q=0.5 Connection: keep-alive Upgrade-Insecure-Requests: 1 

and voila! our php-emulated browser GET request should now be indistinguishable from the real firefox GET request :)

this next part is just nitpicking, but if you look very closely, you'll see that the headers are stacked in the wrong order, firefox put the Accept-Encoding header in line 6, and our emulated GET request puts it in line 3.. to fix this, we can manually put the Accept-Encoding header in the right line,

<?php $ch=curl_init("http://127.0.0.1:9999"); curl_setopt_array($ch,array(         CURLOPT_USERAGENT=>'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',         CURLOPT_ENCODING=>'gzip, deflate',         CURLOPT_HTTPHEADER=>array(                 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',                 'Accept-Language: en-US,en;q=0.5',                 'Accept-Encoding: gzip, deflate',                 'Connection: keep-alive',                 'Upgrade-Insecure-Requests: 1',         ), )); curl_exec($ch); 

running that, our netcat server gets:

$ nc -l 9999 GET / HTTP/1.1 Host: 127.0.0.1:9999 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-US,en;q=0.5 Accept-Encoding: gzip, deflate Connection: keep-alive Upgrade-Insecure-Requests: 1 

problem solved, now the headers is even in the correct order, and the request seems to be COMPLETELY INDISTINGUISHABLE from the real firefox request :) (i don't actually recommend this last step, it's a maintenance burden to keep CURLOPT_ENCODING in sync with the custom Accept-Encoding header, and i've never experienced a situation where the order of the headers are significant)

like image 21
hanshenrik Avatar answered Sep 27 '22 18:09

hanshenrik