Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

cURL returns 404 while the page is found in browser

Tags:

php

curl

there is already similar questions on stackoverflow, but none of their solutions have been working for me. I'm trying to grab a page on LoveIt.com with cURL, but it returns me a 404 error, while the url works fine in the browser :

        $url = 'http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV';

        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt ($curl, CURLOPT_HEADER, false);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl, CURLOPT_REFERER,'http://loveit.com/');

Here's the header I receive :

Array ( [url] => http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV [content_type] => text/html; charset=utf-8 [http_code] => 404 [header_size] => 667 [request_size] => 172 [filetime] => -1 [ssl_verify_result] => 0 [redirect_count] => 0 [total_time] => 0.320466 [namelookup_time] => 0.000326 [connect_time] => 0.119046 [pretransfer_time] => 0.119089 [size_upload] => 0 [size_download] => 499 [speed_download] => 1557 [speed_upload] => 0 [download_content_length] => 499 [upload_content_length] => 0 [starttransfer_time] => 0.320438 [redirect_time] => 0 [certinfo] => Array ( ) [primary_ip] => --- [primary_port] => 80 [local_ip] => --- [local_port] => 53837 [redirect_url] => )

I read that some website had protections against this kind of scripts; and I did test some solutions proposed, but none worked for me (CURLOPT_USERAGENT,CURLOPT_REFERER...)

Any ideas of what's happening here ?

I would like to backup my LoveIt account, that's why i'm making this (no exports functions and no replies from LoveIt.com about the health of the website)

like image 937
gordie Avatar asked Jul 04 '13 19:07

gordie


2 Answers

I just had a similar issue with a site. In my case they were expecting a USER_AGENT to be set so anyone with this issue in the future should also check that.

like image 26
Dss Avatar answered Oct 02 '22 00:10

Dss


I quickly checked the said page with LiveHeaders enabled and I noticed bunch of cookies set. I suspect that, since it's not "normal" url, you need to hand those cookies while being redirected otherwise you end being kicked out with 404. Use CURLOPT_COOKIEJAR with your cURL instance at start. See: http://php.net/manual/pl/function.curl-setopt.php

like image 170
Marcin Orlowski Avatar answered Oct 02 '22 00:10

Marcin Orlowski