Update
I solved the problem and posted an answer. However, my solution isn't 100% ideal. I would much rather remove only the symlink from the cache with clearstatcache(true, $target) or clearstatcache(true, $link), but that doesn't work.
I would also much rather prevent the caching of symlinks in the first place, or remove the symlink from the cache immediately after generating it. Unfortunately, I had no luck with that. For some reason, calling clearstatcache(true) after creating a symlink does not work; it still gets cached.
I will happily award the bounty to anyone that can improve my answer and solve those issues.
Edit
I've attempted to optimize my code by generating a file every time clearstatcache is run, so that I only need to clear the cache once for each symlink. For some reason, this does not work. clearstatcache needs to be called every time a symlink is included in the path, but why? There must be a way to optimize the solution I have.
I am using PHP 7.3.5 with nginx/1.16.0. Sometimes file_get_contents returns the wrong value when using a symlink. The problem is that after deleting and recreating a symlink, its old value remains in the cache. Sometimes the correct value is returned, sometimes the old value. It appears random.
I've tried to clear the cache or prevent caching with:
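function symlink1($target, $link)
{
    // Attempt to disable the realpath cache. Note: realpath_cache_size()
    // takes no arguments and only reports usage, so this call changes nothing.
    realpath_cache_size(0);
    symlink($target, $link);
    //clearstatcache(true);
}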
I don't really want to disable caching but I still need 100% accuracy with file_get_contents.
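To check whether the realpath cache is involved at all, it can help to look at its settings and contents from inside the affected script. The following is a minimal diagnostic sketch, not a fix; describeRealpathCache is a hypothetical helper name. The realpath_cache_size and realpath_cache_ttl settings can only be changed in php.ini, not at runtime, and each PHP-FPM worker process keeps its own realpath cache, which may be why the results appear random.

```php
<?php
// Hypothetical diagnostic helper: report the realpath cache configuration
// and what the current PHP process has cached so far.
function describeRealpathCache(): array
{
    return [
        'configured_size' => ini_get('realpath_cache_size'),    // php.ini value
        'ttl_seconds'     => (int) ini_get('realpath_cache_ttl'),
        'bytes_in_use'    => realpath_cache_size(),              // takes no arguments
        'cached_paths'    => array_keys(realpath_cache_get()),
    ];
}

print_r(describeRealpathCache());
```

Calling this right after a wrong read would show whether the symlink path is still listed under cached_paths in that particular worker.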
Edit
I am unable to post my source code, as it is way too long and complex, so I have created a minimal, reproducible example (index.php) that recreates the problem:
<h1>Symlink Problem</h1>
<?php
$dir = getcwd();
if (isset($_POST['clear-all']))
{
    $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
    foreach ($nos as $no)
    {
        unlink($dir.'/nos/'.$no.'/id.txt');
        rmdir($dir.'/nos/'.$no);
    }
    foreach (array_values(array_diff(scandir($dir.'/ids'), array('..', '.'))) as $id)
        unlink($dir.'/ids/'.$id);
}
if (!is_dir($dir.'/nos'))
    mkdir($dir.'/nos');
if (!is_dir($dir.'/ids'))
    mkdir($dir.'/ids');
if (isset($_POST['submit']) && !empty($_POST['id']) && ctype_digit($_POST['insert-after']) && ctype_alnum($_POST['id']))
{
    $nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
    $total = count($nos);
    if ($total <= 100)
    {
        for ($i = $total; $i >= $_POST['insert-after']; $i--)
        {
            $id = file_get_contents($dir.'/nos/'.$i.'/id.txt');
            unlink($dir.'/ids/'.$id);
            symlink($dir.'/nos/'.($i + 1), $dir.'/ids/'.$id);
            rename($dir.'/nos/'.$i, $dir.'/nos/'.($i + 1));
        }
        echo '<br>';
        mkdir($dir.'/nos/'.$_POST['insert-after']);
        file_put_contents($dir.'/nos/'.$_POST['insert-after'].'/id.txt', $_POST['id']);
        symlink($dir.'/nos/'.$_POST['insert-after'], $dir.'/ids/'.$_POST['id']);
    }
}
$nos = array_values(array_diff(scandir($dir.'/nos'), array('..', '.')));
$total = count($nos) + 1;
echo '<h2>Ids from nos directory</h2>';
foreach ($nos as $no)
{
    echo ($no + 1).':'.file_get_contents("$dir/nos/$no/id.txt").'<br>';
}
echo '<h2>Ids from using symlinks</h2>';
$ids = array_values(array_diff(scandir($dir.'/ids'), array('..', '.')));
if (count($ids) > 0)
{
    $success = true;
    foreach ($ids as $id)
    {
        $id1 = file_get_contents("$dir/ids/$id/id.txt");
        echo $id.':'.$id1.'<br>';
        if ($id !== $id1)
            $success = false;
    }
    if ($success)
        echo '<b><font color="blue">Success!</font></b><br>';
    else
        echo '<b><font color="red">Failure!</font></b><br>';
}
?>
<br>
<h2>Insert ID after</h2>
<form method="post" action="/">
    <select name="insert-after">
        <?php
        for ($i = 0; $i < $total; $i++)
            echo '<option value="'.$i.'">'.$i.'</option>';
        ?>
    </select>
    <input type="text" placeholder="ID" name="id"><br>
    <input type="submit" name="submit" value="Insert"><br>
</form>
<h2>Clear all</h2>
<form method="post" action="/">
    <input type="submit" name="clear-all" value="Clear All"><br>
</form>
<script>
    if (window.history.replaceState)
    {
        window.history.replaceState( null, null, window.location.href );
    }
</script>
It seemed very likely to be a problem with the Nginx configuration. Not having these lines can cause the problem:
fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
fastcgi_param DOCUMENT_ROOT $realpath_root;
Here is my Nginx configuration (you can see I have included the above lines):
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name www.websemantica.co.uk;
    root "/path/to/site/root";
    index index.php;

    location / {
        try_files $uri $uri/ $uri.php$is_args$query_string;
    }

    location ~* \.php$ {
        try_files $uri =404;
        fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
        fastcgi_param QUERY_STRING $query_string;
        fastcgi_param REQUEST_METHOD $request_method;
        fastcgi_param CONTENT_TYPE $content_type;
        fastcgi_param CONTENT_LENGTH $content_length;
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        fastcgi_param SCRIPT_NAME $fastcgi_script_name;
        fastcgi_param PATH_INFO $fastcgi_path_info;
        fastcgi_param PATH_TRANSLATED $realpath_root$fastcgi_path_info;
        fastcgi_param REQUEST_URI $request_uri;
        fastcgi_param DOCUMENT_URI $document_uri;
        fastcgi_param DOCUMENT_ROOT $realpath_root;
        fastcgi_param SERVER_PROTOCOL $server_protocol;
        fastcgi_param GATEWAY_INTERFACE CGI/1.1;
        fastcgi_param SERVER_SOFTWARE nginx/$nginx_version;
        fastcgi_param REMOTE_ADDR $remote_addr;
        fastcgi_param REMOTE_PORT $remote_port;
        fastcgi_param SERVER_ADDR $server_addr;
        fastcgi_param SERVER_PORT $server_port;
        fastcgi_param SERVER_NAME $server_name;
        fastcgi_param HTTPS $https;
        # PHP only, required if PHP was built with --enable-force-cgi-redirect
        fastcgi_param REDIRECT_STATUS 200;
        fastcgi_index index.php;
        fastcgi_read_timeout 3000;
    }

    if ($request_uri ~ (?i)^/([^?]*)\.php($|\?)) {
        return 301 /$1$is_args$args;
    }

    rewrite ^/index$ / permanent;
    rewrite ^/(.*)/$ /$1 permanent;
}
Currently I have the above example live at https://www.websemantica.co.uk.
Try adding a few values in the form. It should display Success! in blue every time. Sometimes it shows Failure! in red. It may take quite a few page refreshes to change from Success! to Failure! or vice versa. Eventually, it will show Success! every time, so there must be some sort of caching problem.
This is expected PHP behavior: PHP uses a realpath cache (realpath_cache) to store resolved file paths, a performance optimization that reduces disk operations.
To avoid this behavior, you can try clearing the realpath cache before calling file_get_contents.
You can try something like this:
clearstatcache(true); // passing true also clears the realpath cache
$data = file_get_contents("Your File");
You can read more about clearstatcache in the PHP documentation.
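The suggestion above can be wrapped in a small helper so the cache is always cleared right before the read. This is a minimal sketch assuming a Linux filesystem; readThroughSymlink and the temporary directory layout are made up for illustration. It only clears the cache of the PHP process that runs it, so under PHP-FPM every worker that reads through the symlink would need to do the same.

```php
<?php
// Hypothetical helper: drop PHP's stat cache and realpath cache, then read.
function readThroughSymlink(string $path)
{
    clearstatcache(true);   // true also clears the realpath cache
    return file_get_contents($path);
}

// Demo in a throwaway directory (names are illustrative only).
$base = sys_get_temp_dir() . '/symlink_demo_' . uniqid();
mkdir("$base/a", 0777, true);
mkdir("$base/b");
file_put_contents("$base/a/id.txt", 'A');
file_put_contents("$base/b/id.txt", 'B');

symlink("$base/a", "$base/link");
$first = readThroughSymlink("$base/link/id.txt");

// Delete and recreate the symlink so that it points somewhere else.
unlink("$base/link");
symlink("$base/b", "$base/link");
$second = readThroughSymlink("$base/link/id.txt");

echo $first, ' ', $second, PHP_EOL;
```

Clearing the whole cache on every read gives up the performance benefit of the realpath cache, which is the trade-off the question is trying to avoid.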
This depends heavily on the OS. So how about thinking outside the box: read the real location of the file with readlink, and use that path?
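// trim() strips the trailing newline from the command output;
// escapeshellarg() protects against special characters in the path.
$realPath = trim(shell_exec('readlink ' . escapeshellarg($yourSymlink)));
$fileContent = file_get_contents($realPath);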
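The same idea can be tried without spawning a shell, using PHP's built-in readlink() function. This is a minimal sketch assuming POSIX-style paths; resolveLink is a hypothetical helper name, and whether this sidesteps the stale value in the setup from the question is an assumption that would need testing.

```php
<?php
// Hypothetical helper: resolve a symlink with PHP's built-in readlink(),
// turning a relative target into a path based on the link's directory.
function resolveLink(string $link): string
{
    $target = @readlink($link);
    if ($target === false) {
        throw new RuntimeException("Not a readable symlink: $link");
    }
    if ($target[0] !== '/') {
        // Relative targets are relative to the directory containing the link.
        $target = dirname($link) . '/' . $target;
    }
    return $target;
}

// Demo in a throwaway directory (names are illustrative only).
$base = sys_get_temp_dir() . '/readlink_demo_' . uniqid();
mkdir("$base/nos/0", 0777, true);
mkdir("$base/ids");
file_put_contents("$base/nos/0/id.txt", 'abc');
symlink("$base/nos/0", "$base/ids/abc");

$content = file_get_contents(resolveLink("$base/ids/abc") . '/id.txt');
echo $content, PHP_EOL;
```

Because the resolved directory is passed to file_get_contents directly, the path that PHP opens no longer contains the symlink.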
There are two caches: first the OS cache, and then the PHP cache.
In most cases, clearstatcache(true) before file_get_contents(...) does the job.
But sometimes you also need to clear the OS cache. On Linux, I can think of two places to clear: the page cache (1) and dentries/inodes (2).
This clears both:
shell_exec('echo 3 > /proc/sys/vm/drop_caches'); // requires root privileges
Note: This is good for troubleshooting but not for frequent calls in production as it clears the whole OS cache and costs the system a few moments of cache re-population.
"The problem is after deleting and recreating a symlink"
How do you delete the symlink? Deleting a file (or a symlink) should automatically clear the cache.
Otherwise, you could see what happens if you do:
// This has "race condition" written all around it
unlink($link);
touch($link);
unlink($link); // Remove the empty file
symlink($target, $link);
If this does not solve the problem, could it perhaps be a problem with nginx as in this issue?
Try logging all operations to a log file, to see what actually happens.
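One way to avoid the window in which the link does not exist is to create the new symlink under a temporary name and rename it over the old one. This is a different technique from the unlink/touch sequence above, sketched here under the assumption of a POSIX filesystem where rename() replaces the destination in one step; replaceSymlink is a hypothetical helper name. It does not by itself clear the realpath cache of other PHP-FPM workers.

```php
<?php
// Hypothetical helper: repoint $link at $target without ever removing $link.
function replaceSymlink(string $target, string $link): void
{
    $tmp = $link . '.tmp.' . uniqid('', true);
    if (!symlink($target, $tmp)) {
        throw new RuntimeException("Could not create temporary link $tmp");
    }
    if (!rename($tmp, $link)) {     // replaces the old link itself, not its target
        unlink($tmp);
        throw new RuntimeException("Could not replace $link");
    }
    clearstatcache(true);           // only affects the current PHP process
}

// Demo in a throwaway directory (names are illustrative only).
$base = sys_get_temp_dir() . '/swap_demo_' . uniqid();
mkdir("$base/a", 0777, true);
mkdir("$base/b");
file_put_contents("$base/a/id.txt", 'A');
file_put_contents("$base/b/id.txt", 'B');
symlink("$base/a", "$base/link");

replaceSymlink("$base/b", "$base/link");
$after = file_get_contents("$base/link/id.txt");
echo $after, PHP_EOL;
```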
...could you do without symlinks altogether? For example, store in a database, memcache, SQLite file, or even a JSON file the mapping between "filename" and "actual symlink target". Using e.g. redis or other keystores, you could associate the "filename" with the real symlink target and bypass the OS resolution completely.
Depending on the use case, this might even turn out to be faster than using symlinks.
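As a rough illustration of the JSON-file variant, the mapping from ID to directory could be kept in a single file and looked up directly. This is only a sketch: loadMap, saveMapping, lookup, and the file location are all hypothetical, and a real implementation would need proper locking around the read-modify-write step.

```php
<?php
// Hypothetical helpers: keep the id => directory mapping in a JSON file
// instead of in symlinks.
function loadMap(string $file): array
{
    if (!is_file($file)) {
        return [];
    }
    $map = json_decode(file_get_contents($file), true);
    return is_array($map) ? $map : [];
}

function saveMapping(string $file, string $id, string $dir): void
{
    $map = loadMap($file);
    $map[$id] = $dir;
    // LOCK_EX prevents interleaved writes, but the read above is not covered by it.
    file_put_contents($file, json_encode($map), LOCK_EX);
}

function lookup(string $file, string $id): ?string
{
    return loadMap($file)[$id] ?? null;
}

// Usage with an illustrative file name and directory paths.
$mapFile = sys_get_temp_dir() . '/id_map_' . uniqid() . '.json';
saveMapping($mapFile, 'abc', '/nos/3');
saveMapping($mapFile, 'abc', '/nos/4');   // the "symlink" now points elsewhere
echo lookup($mapFile, 'abc'), PHP_EOL;
```

The lookup result is an ordinary path, so file_get_contents can read from it without any symlink in between.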