How can I send cookies using PHP curl in addition to CURLOPT_COOKIEFILE?

I am scraping content from a website after a form submission. The problem is that the script fails intermittently, roughly 2 runs out of 5. I am using PHP curl with CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR to handle the cookies. However, when I compared the headers sent by my browser (captured with Live HTTP Headers while visiting the target website) with the headers sent by PHP, I saw many differences.

My browser sends many more cookies than PHP curl does. I think this difference might be because JavaScript is responsible for setting most of the cookies, but I'm not sure about this.

I am using the below code to do the scraping and I am showing the sent headers of my browser and of php curl:

$ckfile = tempnam ("/tmp", 'cookiename');

$url = 'https://www.domain.com/firststep';
$poststring = 'variable1=4&variable2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
$output = curl_exec ($ch);
curl_close($ch);

$url = 'https://www.domain.com/nextstep';
$poststring = 'variableB1=4&variableB2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
$output = curl_exec ($ch);
$headers = curl_getinfo($ch, CURLINFO_HEADER_OUT);
curl_close($ch);

print_r($headers);

// Gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
User-Agent: Mozilla
Host: domain.subdomain.nl
Accept: */*
Cookie: JSESSIONID=7BC2A5277A4EB07D9A7237A707BE1366; www-20480=MIFBNLFDFAAA
Content-Length: 187
Content-Type: application/x-www-form-urlencoded

// Where live http headers gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
Host: domain.subdomain.nl
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: nl,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://domain.subdomain.nl/dd/doffers.html?returnUrl=https%3A%2F%2Fttcc.subdomain.nl%2Fdd%2Fpreferences.html%3FValueChanged%3Dfalse&BEGBA=&departureDate=13-06-2013&extChangeTime=&pax2=0&bp=&pax1=1&pax4=0&bk=&pax3=0&shopId=&xtpage=&partner=NSINT&bc=&xt_pc=&ov=&departureTime=&comfortClass=2&destination=DEBHF&thalysTicketless=&beneUser=&debugDOffer=&logonId=&valueChanged=&iDomesticOrigin=&rp=&returnTime=&locale=nl_NL&vu=&thePassWeekend=false&returnDate=&xtsite=&pax=A&lc2=&lc1=&lc4=&lc3=&lc6=&lc5=&BECRA=&passType2=&custId=&lc9=&iDomesticDestination=&passType1=A&lc7=&lc8=&origin=NLASC&toporef=&pid=&passType4=&returnTimeType=1&passType3=&departureTimeType=1&socusId=&idr3=&xtn2=&loyaltyCard=&idr2=&idr1=&thePassBusiness=false&cid=14812
Content-Length: 219
Cookie: subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

AJAXREQUEST=_viewRoot&doffersForm=doffersForm&doffersForm%3AvalueChanged=&doffersForm%3ArequestValid=true&javax.faces.ViewState=j_id3&doffersForm%3Aj_id937=doffersForm%3Aj_id937&valueChanged=false&AJAX%3AEVENTS_COUNT=1&

I would like to use:

$headers = array();
$headers[] = 'Cookie: ' . $cookie;

and:

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); 

where:

$cookie = 'subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133'; 

I might be able to scrape some of the cookie values above from the content of the website, but not all of them. Some I might be able to read from $ckfile, but I don't know how to do that. In particular, I cannot get __utma, __utmb, __utmc and __utmz (with its utmcsr, utmccn and utmcmd parts) from anywhere; I think these are generated by JavaScript.

Question 1: Am I doing something wrong with the cookie handling in the current code, given that PHP curl sends very few cookies and the browser sends many more? Further: can the other differences between the headers sent by the browser and by PHP curl prevent the right content from being returned?

Question 2: Are the cookies missing because JavaScript sets them?

Question 3: What is the best way to handle the cookies to make sure that all required cookies are being sent to the remote server?

Your help is very welcome!

BastiaanWW asked Jun 01 '13 11:06



2 Answers

If the cookie is generated by a script, you can send it manually along with the cookies from the file (using the cookie-file option). For example:

# sending manually set cookie
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: test=cookie"));

# sending cookies from file
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);

In this case curl will send your defined cookie along with the cookies from the file.

If the cookie is generated through JavaScript, you have to trace how it is generated, and then you can send it using the above method (through the HTTP header).
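To combine the cookies curl has already stored in the cookie file with ones you traced by hand, you could read the file yourself. The following is a minimal sketch, assuming the file is in the Netscape cookie format that libcurl writes for CURLOPT_COOKIEJAR (seven tab-separated fields per line). The function names `readCookieJar` and `buildCookieString`, and the sample cookie values, are hypothetical and only for illustration. Note that curl writes the jar when the handle is closed, so read it after `curl_close()`.

```php
<?php
declare(strict_types=1);

/**
 * Read name => value pairs from a Netscape-format cookie jar.
 * Hypothetical helper, not part of the curl extension.
 */
function readCookieJar(string $path): array
{
    $cookies = [];
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $cookies;
    }
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue;
        }
        // HttpOnly cookies are stored with a "#HttpOnly_" prefix on the domain.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue; // ordinary comment line
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

/**
 * Join name => value pairs into the "a=1; b=2" form used by the Cookie header.
 */
function buildCookieString(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Demo with a made-up jar file (assumed sample data, not from the real site).
$jar = tempnam(sys_get_temp_dir(), 'jar');
file_put_contents(
    $jar,
    "# Netscape HTTP Cookie File\n"
    . "#HttpOnly_example.com\tFALSE\t/\tTRUE\t0\tJSESSIONID\tabc123\n"
    . "example.com\tFALSE\t/\tFALSE\t0\twww-20480\tMIFB\n"
);

// Cookies traced by hand are merged over the ones from the jar.
$all = array_replace(readCookieJar($jar), ['campaignPos' => '5']);
echo buildCookieString($all), "\n";
unlink($jar);
```

The resulting string could then be passed to `curl_setopt($ch, CURLOPT_COOKIE, ...)` or placed in a `Cookie:` header, depending on which of the two approaches you use.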

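The __utma, __utmc and __utmz cookies are Google Analytics cookies that JavaScript sets in the browser. They are used for visitor tracking, so the target server most likely does not need them and you should not have to worry about them.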

Finally, the way you are doing it is all right. Just make sure you use an absolute path for the cookie file (e.g. /var/dir/cookie.txt) instead of a relative one.

Always enable verbose mode when working with curl. It helps a lot in tracing the requests and will save you a lot of time.

curl_setopt($ch, CURLOPT_VERBOSE, true); 
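
By default the verbose output goes to STDERR, which is easy to miss when the script runs under a web server. A minimal sketch of capturing it into a string instead, using CURLOPT_STDERR with a `php://temp` stream, might look like this. The function name `fetchWithVerboseLog` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Fetch a URL and return both the response body and curl's verbose log.
 * Hypothetical helper for illustration only.
 */
function fetchWithVerboseLog(string $url): array
{
    // In-memory stream that receives the verbose output.
    $log = fopen('php://temp', 'w+');

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_STDERR, $log);

    $body = curl_exec($ch);
    curl_close($ch);

    // Read back everything curl wrote to the stream.
    rewind($log);
    $verbose = stream_get_contents($log);
    fclose($log);

    return ['body' => $body, 'verbose' => $verbose];
}
```

Calling `fetchWithVerboseLog('https://www.domain.com/firststep')` and printing the `verbose` entry should show the request and response headers, including the `Cookie:` and `Set-Cookie:` lines.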
Sabuj Hassan answered Sep 24 '22 09:09



Here is a list of examples for sending cookies: https://github.com/andriichuk/php-curl-cookbook#cookies

$curlHandler = curl_init();

curl_setopt_array($curlHandler, [
    CURLOPT_URL => 'https://httpbin.org/cookies',
    CURLOPT_RETURNTRANSFER => true,

    CURLOPT_COOKIEFILE => $cookieFile,
    CURLOPT_COOKIE => 'foo=bar;baz=foo',

    /**
     * Or set header
     * CURLOPT_HTTPHEADER => [
           'Cookie: foo=bar;baz=foo',
       ]
     */
]);

$response = curl_exec($curlHandler);
curl_close($curlHandler);

echo $response;
Serhii Andriichuk answered Sep 22 '22 09:09
