PHP: How to get website with cURL and act like a real browser?

There's a specific website I want to get the source code from with PHP cURL.

Visiting this website with a browser from my computer works without any problems.

But when I want to access this website with my PHP script, the website recognizes that this is an automated request and shows an error message.

This is my PHP script:

<?php
$url = "https://www.example.com";
$user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
echo $data;
?>

The user agent is the same one my browser sends. I'm using a local server with MAMP PRO, which means the browser and the PHP script both access the site from the same IP address.

I already tried my PHP script with many different headers and options but nothing worked.

There must be something that makes a request from a PHP script look different from a browser request to the web server I want to access. But what? Do you have an idea?

EDIT

I found out that it works with this cURL command:

curl 'https://www.example.com/' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'accept-language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7'

If I run this in the Terminal, for example, it shows the correct source code.

I converted it to a PHP script as follows:

<?php
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'https://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

$headers = array();
$headers[] = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3';
$headers[] = 'Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);
curl_close($ch);
echo $result;
?>

Unfortunately, this version still shows the error message.

This means there must be something that makes a request from a PHP script look different from the same request sent from the command line to the web server I want to access. But what is it?

David asked Oct 25 '25 19:10


1 Answer

At the HTTP level, a cURL request differs from a browser request only in the headers it sends. Beyond that, a browser also runs JavaScript on the client.

A server identifies an HTTP client mainly by its request headers, typically the user agent string. Since you have set the user agent to exactly the same value as the browser, there must be other checks in place.

By default, cURL sends only a generic Accept: */* header, whereas browsers send a detailed Accept header (along with others such as Accept-Language) to advertise their capabilities. I expect the web server is checking something like this.
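To see what the script really sends, you can ask cURL to record the outgoing request and compare it with the browser's. The following is a minimal sketch, not a definitive implementation: the function name fetchWithSentHeaders and the 10-second timeout are assumptions made for illustration, while CURLINFO_HEADER_OUT is the standard PHP cURL option for capturing the request headers.

```php
<?php
// Sketch: return the response body together with the request headers
// that PHP cURL actually sent. fetchWithSentHeaders is a hypothetical helper.
function fetchWithSentHeaders(string $url, array $headers = []): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // record the outgoing request
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // assumed timeout, adjust as needed
    if ($headers !== []) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    }

    $body  = curl_exec($ch);                           // false on failure
    $sent  = curl_getinfo($ch, CURLINFO_HEADER_OUT);   // raw request header block
    $error = curl_error($ch);

    return ['body' => $body, 'sent' => $sent, 'error' => $error];
}

$result = fetchWithSentHeaders('https://www.example.com/');
echo $result['sent']; // compare this with the request shown in the browser
```

Comparing this output line by line with the browser's request shows which headers are missing or different.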

[Screenshot: the "Copy as cURL" option in Chrome Developer Tools]

Take a look at the screenshot above of Chrome Developer Tools. In the Network tab, right-click the request and choose Copy > Copy as cURL. This copies the whole request as a cURL command, including all the headers that Chrome sent, so you can test it in the terminal.

Try to match all the headers exactly from within your PHP script, and the web server should have a much harder time identifying you as a script.
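One way to avoid copying headers by hand is to pull them out of the copied command. The sketch below assumes simple quoting (no escaped quotes inside a header value); the helper name headersFromCurlCommand and the shortened command with its ExampleBrowser user agent are hypothetical placeholders for the command you copy from the browser.

```php
<?php
// Sketch only: extract every -H '...' or -H "..." argument from a
// "Copy as cURL" command. headersFromCurlCommand is a hypothetical helper
// and does not handle escaped quotes inside a header value.
function headersFromCurlCommand(string $command): array
{
    preg_match_all('/-H\s+(["\'])(.*?)\1/', $command, $matches);
    return $matches[2];
}

// Shortened, hypothetical command; paste the one copied from the browser.
$command = "curl 'https://www.example.com/' "
    . "-H 'user-agent: ExampleBrowser/1.0' "
    . "-H 'accept-language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7' --compressed";

$headers = headersFromCurlCommand($command);

$ch = curl_init('https://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
// An empty string makes cURL send an Accept-Encoding header with all
// encodings it supports and decode the response, similar to --compressed.
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 10); // assumed timeout
$result = curl_exec($ch);

echo $result === false ? 'cURL error: ' . curl_error($ch) : $result;
```

If the response still differs from what the browser gets, the remaining difference probably lies outside this header list, for example in cookies set by an earlier request.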

Greg answered Oct 28 '25 08:10


