
file_get_contents getting wrong results

Update

I solved the problem and posted an answer. However, my solution isn't 100% ideal. I would much rather only remove the symlink from the cache with clearstatcache(true, $target) or clearstatcache(true, $link) but that doesn't work.

I would also much rather prevent the caching of symlinks in the first place, or remove the symlink from the cache immediately after generating it. Unfortunately, I had no luck with that. For some reason, calling clearstatcache(true) after creating a symlink does not work; the symlink still gets cached.

I will happily award the bounty to anyone that can improve my answer and solve those issues.

Edit

I've attempted to optimize my code by generating a file every time clearstatcache is run, so that I only need to clear the cache once for each symlink. For some reason, this does not work. clearstatcache needs to be called every time a symlink is included in the path, but why? There must be a way to optimize the solution I have.


I am using PHP 7.3.5 with nginx/1.16.0. Sometimes file_get_contents returns the wrong value when using a symlink. The problem is after deleting and recreating a symlink, its old value remains in the cache. Sometimes the correct value is returned, sometimes the old value. It appears random.

I've tried to clear the cache or prevent caching with:

function symlink1($target, $link)
{
    realpath_cache_size(0);
    symlink($target, $link);
    //clearstatcache(true);
}

I don't really want to disable caching but I still need 100% accuracy with file_get_contents.

Edit

I am unable to post my source code, as it is way too long and complex, so I have created a minimal, reproducible example (index.php) that recreates the problem:

<h1>Symlink Problem</h1>
<?php
    $dir = getcwd();
    if (isset($_POST['clear-all']))
    {
        $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
        foreach ($nos as $no)
        {
            unlink($dir.'/nos/'.$no.'/id.txt');
            rmdir($dir.'/nos/'.$no);
        }
        foreach (array_values(array_diff(scandir($dir.'/ids'), array('..', '.'))) as $id)
            unlink($dir.'/ids/'.$id);
    }
    if (!is_dir($dir.'/nos'))
        mkdir($dir.'/nos');
    if (!is_dir($dir.'/ids'))
        mkdir($dir.'/ids');
    if (isset($_POST['submit']) && !empty($_POST['id']) && ctype_digit($_POST['insert-after']) && ctype_alnum($_POST['id']))
    {
        $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
        $total = count($nos);
        if ($total <= 100)
        {
            for ($i = $total; $i >= $_POST['insert-after']; $i--)
            {
                $id = file_get_contents($dir.'/nos/'.$i.'/id.txt');
                unlink($dir.'/ids/'.$id);
                symlink($dir.'/nos/'.($i + 1), $dir.'/ids/'.$id);
                rename($dir.'/nos/'.$i, $dir.'/nos/'.($i + 1));
            }
            echo '<br>';
            mkdir($dir.'/nos/'.$_POST['insert-after']);
            file_put_contents($dir.'/nos/'.$_POST['insert-after'].'/id.txt', $_POST['id']);
            symlink($dir.'/nos/'.$_POST['insert-after'], $dir.'/ids/'.$_POST['id']);
        }
    }
    $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
    $total = count($nos) + 1;
    echo '<h2>Ids from nos directory</h2>';
    foreach ($nos as $no)
    {
        echo ($no + 1).':'.file_get_contents("$dir/nos/$no/id.txt").'<br>';
    }
    echo '<h2>Ids from using symlinks</h2>';
    $ids = array_values(array_diff(scandir($dir.'/ids'), array('..', '.')));
    if (count($ids) > 0)
    {
        $success = true;
        foreach ($ids as $id)
        {
            $id1 = file_get_contents("$dir/ids/$id/id.txt");
            echo $id.':'.$id1.'<br>';
            if ($id !== $id1)
                $success = false;
        }
        if ($success)
            echo '<b><font color="blue">Success!</font></b><br>';
        else
            echo '<b><font color="red">Failure!</font></b><br>';
    }
?>
<br>
<h2>Insert ID after</h2>
<form method="post" action="/">
    <select name="insert-after">
        <?php
            for ($i = 0; $i < $total; $i++)
                echo '<option value="'.$i.'">'.$i.'</option>';
        ?>
    </select>
    <input type="text" placeholder="ID" name="id"><br>
    <input type="submit" name="submit" value="Insert"><br>
</form>
<h2>Clear all</h2>
<form method="post" action="/">
    <input type="submit" name="clear-all" value="Clear All"><br>
</form>
<script>
    if (window.history.replaceState)
    {
        window.history.replaceState( null, null, window.location.href );
    }
</script>

It seemed very likely to be a problem with Nginx configuration. Not having these lines can cause the problem:

fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;

Here is my Nginx configuration (you can see I have included the above lines):

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name www.websemantica.co.uk;
    root "/path/to/site/root";
    index index.php;

    location / {
        try_files $uri $uri/ $uri.php$is_args$query_string;
    }

    location ~* \.php$ {
        try_files $uri =404;
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_param   QUERY_STRING            $query_string;
        fastcgi_param   REQUEST_METHOD          $request_method;
        fastcgi_param   CONTENT_TYPE            $content_type;
        fastcgi_param   CONTENT_LENGTH          $content_length;

        fastcgi_param   SCRIPT_FILENAME         $realpath_root$fastcgi_script_name;
        fastcgi_param   SCRIPT_NAME             $fastcgi_script_name;
        fastcgi_param   PATH_INFO               $fastcgi_path_info;
        fastcgi_param   PATH_TRANSLATED         $realpath_root$fastcgi_path_info;
        fastcgi_param   REQUEST_URI             $request_uri;
        fastcgi_param   DOCUMENT_URI            $document_uri;
        fastcgi_param   DOCUMENT_ROOT           $realpath_root;
        fastcgi_param   SERVER_PROTOCOL         $server_protocol;

        fastcgi_param   GATEWAY_INTERFACE       CGI/1.1;
        fastcgi_param   SERVER_SOFTWARE         nginx/$nginx_version;

        fastcgi_param   REMOTE_ADDR             $remote_addr;
        fastcgi_param   REMOTE_PORT             $remote_port;
        fastcgi_param   SERVER_ADDR             $server_addr;
        fastcgi_param   SERVER_PORT             $server_port;
        fastcgi_param   SERVER_NAME             $server_name;

        fastcgi_param   HTTPS                   $https;

        # PHP only, required if PHP was built with --enable-force-cgi-redirect
        fastcgi_param   REDIRECT_STATUS         200;

        fastcgi_index index.php;
        fastcgi_read_timeout 3000;
    }

    if ($request_uri ~ (?i)^/([^?]*)\.php($|\?)) {
        return 301 /$1$is_args$args;
    }
    rewrite ^/index$ / permanent;
    rewrite ^/(.*)/$ /$1 permanent;
}

Currently I have the above example live at https://www.websemantica.co.uk.

Try adding a few values in the form. It should display Success! in blue every time. Sometimes it shows Failure! in red. It may take quite a few page refreshes to change from Success! to Failure! or vice versa. Eventually, it will show Success! every time, so there must be some sort of caching problem.

Dan Bray asked Nov 05 '19 13:11


4 Answers

This is expected PHP behavior: PHP keeps resolved file paths in its realpath cache so that it can reduce disk operations.

To avoid this, you can try clearing the realpath cache before calling file_get_contents.

You can try something like this:


clearstatcache(true); // true also clears the realpath cache
$data = file_get_contents("Your File");

You can read more about clearstatcache in the PHP documentation.
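As a minimal sketch of the suggestion above, the two calls can be wrapped in one helper so the cache is cleared before every read. The name read_uncached is hypothetical and not part of PHP or the original post. Note that the asker reports needing this before every read that goes through a symlink, so this is a workaround rather than an explanation of the root cause.

```php
<?php
// Hypothetical helper (not from the original post): clear PHP's stat and
// realpath caches, then read the file.
function read_uncached(string $path)
{
    // Passing true also clears the realpath cache, which is where resolved
    // symlink paths are remembered.
    clearstatcache(true);

    // Returns the file contents, or false on failure.
    return file_get_contents($path);
}
```

Usage would look like $id = read_uncached("$dir/ids/$id/id.txt"); in place of the plain file_get_contents call.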

Touqeer Shafi answered Oct 23 '22 08:10


This depends too much on the OS. How about thinking outside the box: resolve the real location of the file with readlink, and read from that path instead?

// trim() removes the trailing newline that readlink prints
$realPath = trim(shell_exec("readlink " . escapeshellarg($yourSymlink)));
$fileContent = file_get_contents($realPath);
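The same idea can be sketched without a shell call by using PHP's built-in readlink() function. The helper name read_via_link is hypothetical, and the sketch assumes POSIX-style paths and a symlink that points at a directory, as in the question's ids/ layout. Whether this avoids the stale results on the asker's setup has not been verified here.

```php
<?php
// Hypothetical helper: read a file through a symlinked directory by
// resolving the link with readlink() instead of letting PHP resolve the
// whole path.
function read_via_link(string $link, string $relative)
{
    $target = readlink($link);
    if ($target === false) {
        return false; // not a symlink, or it could not be read
    }

    // A relative link target is relative to the link's own directory.
    if ($target[0] !== '/') {
        $target = dirname($link) . '/' . $target;
    }

    return file_get_contents($target . '/' . $relative);
}
```

For the question's example, the call would be something like read_via_link("$dir/ids/$id", 'id.txt').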
Vo Kim Nguyen answered Oct 23 '22 07:10


There are two caches.

First the OS cache and then the PHP cache.

In most of the cases clearstatcache(true) before file_get_contents(...) does the job.

But sometimes you also need to clear the OS cache. On Linux, I can think of two places to clear: the page cache (1) and dentries/inodes (2).

This clears both:

shell_exec('echo 3 > /proc/sys/vm/drop_caches'); // needs root privileges

Note: This is good for troubleshooting but not for frequent calls in production as it clears the whole OS cache and costs the system a few moments of cache re-population.

Bahram Ardalan answered Oct 23 '22 09:10


"The problem is after deleting and recreating a symlink"

How do you delete the symlink? Deleting a file (or a symlink) should automatically clear the cache.

Otherwise, you could see what happens if you do:

// This has "race condition" written all around it
unlink($link);
touch($link);
unlink($link); // Remove the empty file
symlink($target, $link);

If this does not solve the problem, could it perhaps be a problem with nginx as in this issue?

Try logging all operations to a log file, to see what actually happens.
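One possible way to do that logging is a small function that appends a timestamped line for each filesystem operation and its result. The name log_op and the line format are assumptions made for illustration only.

```php
<?php
// Hypothetical logger: append one line per filesystem operation.
function log_op(string $logFile, string $op, array $args, $result): void
{
    $line = sprintf(
        "%s %s(%s) => %s\n",
        date('c'),                  // ISO 8601 timestamp
        $op,
        implode(', ', $args),
        var_export($result, true)   // prints true/false literally
    );

    // LOCK_EX keeps lines from concurrent requests from interleaving.
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}
```

It would be called right after each operation, for example $ok = symlink($target, $link); log_op($log, 'symlink', [$target, $link], $ok); so the log shows the order in which links were removed and recreated.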

or maybe...

...could you do without symlinks altogether? For example, store in a database, memcache, SQLite file, or even a JSON file the mapping between "filename" and "actual symlink target". Using e.g. redis or other keystores, you could associate the "filename" with the real symlink target and bypass the OS resolution completely.

Depending on the use case, this might even turn out to be faster than using symlinks.
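As a rough sketch of the JSON-file variant, the mapping from an ID to its directory could live in one file that is rewritten on every change. The function names map_load, map_set and map_get are hypothetical, and the sketch does not guard against two requests writing at the same time.

```php
<?php
// Hypothetical mapping store: id => target directory, kept in one JSON file.
function map_load(string $file): array
{
    if (!is_file($file)) {
        return [];
    }
    $data = json_decode(file_get_contents($file), true);
    return is_array($data) ? $data : [];
}

function map_set(string $file, string $id, string $target): void
{
    $map = map_load($file);
    $map[$id] = $target;

    // Write to a temporary file and rename it over the old one, so readers
    // never see a half-written mapping.
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode($map));
    rename($tmp, $file);
}

function map_get(string $file, string $id): ?string
{
    return map_load($file)[$id] ?? null;
}
```

With this in place, the example's symlink() call would become a map_set() call, and the read would use map_get() to find the directory before calling file_get_contents on id.txt inside it.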

LSerni answered Oct 23 '22 09:10