I have a script that fetches pages every day, and I want to fetch a page only if its content has changed, so that the script runs faster and uses less traffic.
My idea is to fetch the headers first and compare Content-Length, so that if it differs we fetch the whole document. But this is not very precise, because the website could have dynamic parts that make the Content-Length different every time.
Is there another way, like using some sort of DNS record, or anything else?
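For illustration, the header-first check I have in mind would look roughly like this with curl (a sketch only; $url is the page being polled and $lastLength is the Content-Length cached from the previous run):

// Sketch: HEAD request, then compare Content-Length with the cached value
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);           // headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);   // -1 if the server did not send it
curl_close($ch);
if ($length != $lastLength) {
    // Length differs, so fetch the whole document
}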
I looked for an answer for more than two days, and nobody could give me a universal answer.
So I implemented the ETag and If-Modified-Since headers (as Matt Raines and sowa posted here), and to lower traffic further I also used gzip compression.
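A minimal sketch of the conditional request I ended up with ($url, $etag and $lastFetched are placeholders for whatever you cache between runs):

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');        // ask for a gzip-compressed body
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'If-None-Match: ' . $etag,                     // ETag from the previous fetch
    'If-Modified-Since: ' . gmdate('D, d M Y H:i:s', $lastFetched) . ' GMT',
));
$body   = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($status === 304) {
    // Not Modified: nothing was transferred, keep the old copy
} elseif ($status === 200) {
    // Content changed: process $body and store the new ETag / date for next time
}

With a 304 the server sends no body at all, which is where the traffic saving comes from.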
There is also the Range request header, so I could grab only part of the page as someone suggested, but I think it is used only for files, not web pages.
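For completeness, a Range request with curl would look something like the sketch below; whether it is honoured depends on the server (a 206 Partial Content response means it was, a plain 200 means the server sent everything anyway):

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_RANGE, '0-1023');   // ask for the first 1024 bytes only
$partial = curl_exec($ch);
$status  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);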
Thank you all for your time.
A cut-and-paste answer for those who want to check whether a remote file is more up to date than a local one, and update the local file if so:
// $remotePath = 'http://blahblah.com/file.ext';
// $localPath  = '/usr/whatever/app/file.ext';

// Read the remote response headers as an associative array
$headers = get_headers( $remotePath, 1 );

// Compare the remote Last-Modified date with the local file's modification time
$remote_mod_date = isset( $headers['Last-Modified'] ) ? strtotime( $headers['Last-Modified'] ) : false;
$local_mod_date  = filemtime( $localPath );

if ( $remote_mod_date !== false && $local_mod_date >= $remote_mod_date ) {
    // Local version up to date
} else {
    // Remote file is newer (or no Last-Modified header was sent), so download it
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_URL, $remotePath );
    // other options here, eg: curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_2);
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
    $result = curl_exec( $ch );
    if ( curl_errno( $ch ) ) {
        // handle error : curl_error($ch)
    }
    curl_close( $ch );
    if ( $result ) {
        // Update local file with remote file contents
        file_put_contents( $localPath, $result );
    }
}
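One caveat: get_headers() issues a GET request by default, so the header check above still downloads the body once. If saving traffic matters, a HEAD request can be forced through a stream context (third argument of get_headers() since PHP 7.1), for example:

// Force a HEAD request so only the headers travel over the wire
$context = stream_context_create(array('http' => array('method' => 'HEAD')));
$headers = get_headers($remotePath, 1, $context);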
With thanks to the OP's question here, and also this answer.
Created to solve automatic OIDC CA cert renewal (this, and this).