I'm sure this is fairly simple. I'm using the function below to retrieve sites raw html in order to parse it. during my testing, I decided to run my code on stackoverflow.com
Instead of getting the html response the Chrome is printing out the actual site rather then assigning the html to its veritable. What am I missing?
function get_site_html($site_url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_URL, $site_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
global $base_url;
$base_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
$http_response_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close ($ch);
return $response;
}
The site raw html should be assigned to $response, and then return it.
Your code works. Try echo htmlentities($response);
You'll get the raw html for the site you're curling.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With