cURL - How to fetch page only if it has changed since last fetch?

I have a script that fetches pages every day, and I want to fetch a page only if its content has changed, so that the script runs faster and uses less traffic.

My idea is to fetch the headers first and compare the Content-Length, fetching the whole document only if it differs. But that is not very precise, because a website could have dynamic parts that make the Content-Length different every time.

Is there another way, like using some sort of DNS or anything else?

Kref asked Apr 30 '16 07:04


2 Answers

I looked for an answer for more than two days, and nobody could give me a universal one.

So I implemented ETag and If-Modified-Since handling (the stored ETag is sent back in an If-None-Match request header), as Matt Raines and sowa posted here. To lower traffic further, I also used compression such as gzip.
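A minimal sketch of that approach with PHP's cURL extension, assuming the validators (ETag and Last-Modified) from the previous fetch are stored somewhere between runs. The helper names `parse_validators`, `conditional_headers` and `fetch_if_changed` are hypothetical. When nothing has changed, a server that supports conditional requests answers `304 Not Modified` with no body; setting `CURLOPT_ENCODING` to an empty string makes cURL advertise every compression it supports (such as gzip) and decompress the response transparently.

```php
<?php
// Hypothetical helpers sketching a conditional GET; not a library API.

// Pull the ETag and Last-Modified validators out of a raw header block.
// With redirects the block holds several responses; only the last one counts.
function parse_validators(string $rawHeaders): array
{
    $validators = ['etag' => null, 'last_modified' => null];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if (stripos($line, 'HTTP/') === 0) {
            // A new status line starts a new response: forget earlier values.
            $validators = ['etag' => null, 'last_modified' => null];
            continue;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // blank line
        }
        $name = strtolower(trim($parts[0]));
        if ($name === 'etag') {
            $validators['etag'] = trim($parts[1]);
        } elseif ($name === 'last-modified') {
            $validators['last_modified'] = trim($parts[1]);
        }
    }
    return $validators;
}

// Turn stored validators into conditional request headers.
function conditional_headers(array $validators): array
{
    $headers = [];
    if (!empty($validators['etag'])) {
        // The ETag from the last response is sent back in If-None-Match.
        $headers[] = 'If-None-Match: ' . $validators['etag'];
    }
    if (!empty($validators['last_modified'])) {
        $headers[] = 'If-Modified-Since: ' . $validators['last_modified'];
    }
    return $headers;
}

// Returns null if the page is unchanged (HTTP 304); otherwise the new body
// plus the validators to store for the next run.
function fetch_if_changed(string $url, array $validators): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER => true,          // include headers so new validators can be read
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING => '',          // advertise gzip/deflate etc. and decode transparently
        CURLOPT_HTTPHEADER => conditional_headers($validators),
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException(curl_error($ch));
    }
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);

    if ($status === 304) {
        return null; // not modified since the last fetch
    }
    if ($status !== 200) {
        throw new RuntimeException("Unexpected HTTP status $status");
    }
    return [
        'body' => substr($response, $headerSize),
        'validators' => parse_validators(substr($response, 0, $headerSize)),
    ];
}
```

On each run, pass in the validators saved from the previous run, and save the returned `validators` whenever a new body comes back. If a dynamically generated page sends neither ETag nor Last-Modified, no conditional headers are sent and every run is a full (but still compressed) fetch.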

There is also the Range request header, which someone told me I could use to grab only part of the page, but I think in practice it is supported mainly for static files, not dynamically generated web pages.
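For reference, a Range request with PHP's cURL extension is made with `CURLOPT_RANGE`. The sketch below assumes the hypothetical helper names `parse_content_range` and `fetch_range`. A server that honours the range replies `206 Partial Content` with a `Content-Range` header; one that ignores it (as many dynamic pages do) replies `200` with the full body, so the status code has to be checked before trusting the result.

```php
<?php
// Hypothetical helpers illustrating an HTTP Range request with cURL.

// Parse a Content-Range value such as "bytes 0-1023/146515" into
// [start, end, total]; total is null when the server sends "*".
function parse_content_range(string $value): ?array
{
    if (!preg_match('/^bytes (\d+)-(\d+)\/(\d+|\*)$/', trim($value), $m)) {
        return null;
    }
    return [(int) $m[1], (int) $m[2], $m[3] === '*' ? null : (int) $m[3]];
}

// Ask for the first $bytes bytes of $url. Returns [status, body, contentRange]:
// status 206 means the range was honoured, 200 means it was ignored.
function fetch_range(string $url, int $bytes): array
{
    $contentRange = null;
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_RANGE => '0-' . ($bytes - 1),  // libcurl sends "Range: bytes=0-<bytes-1>"
        CURLOPT_HEADERFUNCTION => function ($ch, string $line) use (&$contentRange) {
            if (stripos($line, 'Content-Range:') === 0) {
                $contentRange = parse_content_range(substr($line, strlen('Content-Range:')));
            }
            return strlen($line); // cURL expects the number of bytes handled
        },
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    return [$status, $body === false ? '' : $body, $contentRange];
}
```

Note that a partial body on its own cannot tell you whether the rest of the page changed, so Range complements conditional requests rather than replacing them.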

Thank you all for your time.

Kref answered Sep 25 '22 22:09


Update local file with remote, iff remote is newer

A cut-and-paste answer for those who want to check whether a remote file is newer than a local one, and update the local file if so:

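    // $remotePath = 'http://blahblah.com/file.ext';
    // $localPath = '/usr/whatever/app/file.ext';

    // get_headers() issues a GET by default; use HEAD so the body is not downloaded twice.
    $context = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = get_headers($remotePath, true, $context);

    // Header names are case-insensitive, so normalise them before lookup.
    $headers = is_array($headers) ? array_change_key_case($headers, CASE_LOWER) : [];
    $lastModified = $headers['last-modified'] ?? null;
    if (is_array($lastModified)) {
        // After redirects there is one value per response; the last is the final one.
        $lastModified = end($lastModified);
    }

    $remote_mod_date = $lastModified ? strtotime($lastModified) : false;
    $local_mod_date = is_file($localPath) ? filemtime($localPath) : false;

    if ($remote_mod_date !== false && $local_mod_date !== false
        && $local_mod_date >= $remote_mod_date) {
        // Local version up to date
    } else {
        // Remote file is newer (or its age is unknown), so download it
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $remotePath);
        // other options here, eg: curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_2);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

        $result = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

        if (curl_errno($ch)) {
            // handle error : curl_error($ch)
        }

        curl_close($ch);

        // Compare with false (not truthiness) so an empty file is still written,
        // and only overwrite the local copy on a successful response.
        if ($result !== false && $status === 200) {
            // Update local file with remote file contents
            file_put_contents($localPath, $result);
        }
    }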

With thanks to the OP's question here, and also to this answer.
Created to solve automatic OIDC CA certificate renewal (this, and this).
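The snippet above makes two requests, one for the headers and one for the body. A possible single-request variant, sketched under the assumption that the server honours If-Modified-Since: send the local file's modification time as an HTTP date and let the server answer `304 Not Modified` when the remote copy is not newer. The function names `http_date` and `update_if_newer` are hypothetical.

```php
<?php
// Hypothetical helpers: one conditional GET instead of get_headers() + GET.

// Format a Unix timestamp as an HTTP date (IMF-fixdate, always GMT).
function http_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Returns true if the local file was updated, false if it was already current.
function update_if_newer(string $remotePath, string $localPath): bool
{
    $headers = [];
    if (is_file($localPath)) {
        // Ask the server to send the body only if it changed after our copy.
        $headers[] = 'If-Modified-Since: ' . http_date(filemtime($localPath));
    }

    $ch = curl_init($remotePath);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HTTPHEADER => $headers,
    ]);
    $result = curl_exec($ch);
    if ($result === false) {
        throw new RuntimeException(curl_error($ch));
    }
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

    if ($status === 304) {
        return false; // local version up to date
    }
    if ($status !== 200) {
        throw new RuntimeException("Unexpected HTTP status $status");
    }
    file_put_contents($localPath, $result);
    return true;
}
```

If the server ignores If-Modified-Since, it simply returns 200 and the file is downloaded every time, which is no worse than an unconditional fetch.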

kris answered Sep 22 '22 22:09