I have a CIFS share from Windows Server 2012 R2 mounted on Ubuntu 14.04.2 LTS (kernel 3.13.0-61-generic) with this line in /etc/fstab:
//10.1.2.3/Share /Share cifs credentials=/root/.smbcredentials/share_user,user=share_user,dirmode=0770,filemode=0660,uid=4000,gid=5000,forceuid,forcegid,noserverino,cache=none 0 0
The gid=5000 corresponds to the group www-data, which runs a PHP process.
The files are mounted correctly when I check via the console logged in as the www-data user: they are readable and removable (the operations that are used by the PHP script).
The PHP script processes about 50,000-70,000 files per day. The files are created on the host Windows machine, and some time later the PHP script running on the Linux machine is notified about a new file, checks whether the file exists (file_exists), reads it and deletes it. Usually all works fine, but sometimes (from a few hundred up to 1,000-2,000 times per day) the PHP script raises an error that the file does not exist. That should never be the case, since it is notified only of files that actually exist.
When I manually check those files reported as not existing, they are correctly accessible on the Ubuntu machine and have a creation date from before the PHP script checked their existence.
Then I trigger the PHP script manually to pick up that file and it is picked up without problems.
What I already tried
There are multiple similar questions, but I seem to have exhausted all the advice:
- clearstatcache() before checking file_exists($f)
- the path passed to file_exists($f) is an absolute path with no special characters - the file paths are always of format /Share/11/222/333.zip (with various digits)
- noserverino share mount parameter
- cache=none share mount parameter
/proc/fs/cifs/Stats/ displays as below, but I don't know if there is anything suspicious here. The share in question is 2) \\10.1.2.3\Share
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
6 session 2 share reconnects
Total vfs operations: 133925492 maximum at one time: 11
1) \\10.1.2.3\Share_Archive
SMBs: 53824700 Oplocks breaks: 12
Reads: 699 Bytes: 42507881
Writes: 49175075 Bytes: 801182924574
Flushes: 0
Locks: 12 HardLinks: 0 Symlinks: 0
Opens: 539845 Closes: 539844 Deletes: 156848
Posix Opens: 0 Posix Mkdirs: 0
Mkdirs: 133 Rmdirs: 0
Renames: 0 T2 Renames 0
FindFirst: 21 FNext 28 FClose 0
2) \\10.1.2.3\Share
SMBs: 50466376 Oplocks breaks: 1082284
Reads: 39430299 Bytes: 2255596161939
Writes: 2602 Bytes: 42507782
Flushes: 0
Locks: 1082284 HardLinks: 0 Symlinks: 0
Opens: 2705841 Closes: 2705841 Deletes: 539832
Posix Opens: 0 Posix Mkdirs: 0
Mkdirs: 0 Rmdirs: 0
Renames: 0 T2 Renames 0
FindFirst: 227401 FNext 1422 FClose 0
One pattern I think I see is that the error is raised only if the file in question has been already processed (read and deleted) earlier by the PHP script. There are many files that have been correctly processed and then processed again later, but I have never seen that error for a file that is processed for the first time. The time between re-processing varies from 1 to about 20 days. For re-processing, the file is simply recreated under the same path on the Windows host with updated content.
What can be the problem? How can I investigate better? How can I determine if the problem lies on the PHP or OS side?
Update
I have moved the software that produces the files to an Ubuntu VM that mounts the same shares the same way. This component is written in Java. I am not seeing any issues when reading or writing the files from it.
Update - PHP details
The exact PHP code is:
$strFile = zipPath($intApplicationNumber);
clearstatcache();
if(!file_exists($strFile)){
    return responseInternalError('ZIP file does not exist', $strFile);
}
The intApplicationNumber is a request parameter (e.g. 12345678) which is simply transformed to a path by the zipPath() function (e.g. /Share/12/345/678.zip - always a full path).
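The real zipPath() is not shown here, so the following is only a guess at what such a mapping could look like. It assumes an 8-digit application number split as 2 + 3 + 3 digits, and forward slashes as in the /Share/11/222/333.zip paths mentioned earlier. The name zipPathSketch and the $strRoot parameter are hypothetical, and the code avoids scalar type hints so it would also run on the older PHP shipped with Ubuntu 14.04.

```php
<?php
/**
 * Hypothetical sketch of the number-to-path mapping; NOT the actual zipPath().
 * Assumption: the application number always has exactly 8 digits (2 + 3 + 3).
 */
function zipPathSketch($intApplicationNumber, $strRoot = '/Share')
{
    $strDigits = (string) $intApplicationNumber;
    if (!preg_match('/^\d{8}$/', $strDigits)) {
        throw new InvalidArgumentException('Expected an 8-digit application number');
    }
    // 12345678 -> 12 / 345 / 678.zip below the share root
    return sprintf(
        '%s/%s/%s/%s.zip',
        $strRoot,
        substr($strDigits, 0, 2),
        substr($strDigits, 2, 3),
        substr($strDigits, 5, 3)
    );
}
```

Validating the input before building the path keeps a malformed request parameter from turning into an unexpected location on the share.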
The script may be invoked concurrently with different application numbers, but will not be invoked concurrently with the same application number.
If the script fails (returns the 'ZIP file does not exist' error), it will be called again a minute later. If that fails too, the request is permanently marked as failed. Then, usually more than an hour later, I can call the script manually with the same invocation (GET request) that is used in production and it works fine: the file is found and sent in the response:
public static function ResponseRaw($strFile){
    // Discard any buffered output so only the file contents are sent.
    ob_end_clean();
    self::ReadFileChunked($strFile, false);
    exit;
}
protected static function ReadFileChunked($strFile, $blnReturnBytes=true) {
    $intChunkSize = 1048576; // 1M
    $strBuffer = '';
    $intCount = 0;
    $fh = fopen($strFile, 'rb');
    if($fh === false){
        return false;
    }
    // Stream the file to the client one chunk at a time.
    while(!feof($fh)){
        $strBuffer = fread($fh, $intChunkSize);
        echo $strBuffer;
        if($blnReturnBytes){
            $intCount += strlen($strBuffer);
        }
    }
    $blnStatus = fclose($fh);
    // Return the byte count on request, otherwise the fclose() status.
    if($blnReturnBytes && $blnStatus){
        return $intCount;
    }
    return $blnStatus;
}
After the client receives the file, it notifies the PHP server that the file can be moved to an archive location (by means of copy() and unlink()). That part works fine.
STRACE result
After several days of no errors, the error reappeared. I ran strace and it reports
access("/Share/11/222/333.zip", F_OK) = -1 ENOENT (No such file or directory)
for some files that do exist when I run ls /Share/11/222/333.zip from the command line. Therefore the problem is at the OS level; PHP is not to blame.
The errors started appearing when the load on the disk on the host increased (due to other processes), so @risyasin's suggestion below seems most likely - it's a matter of busy resources/timeouts.
I'll try @miguel-svq's advice of skipping the existence test and just going for fopen() right away and handling the error then. I'll see if it changes anything.
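A minimal sketch of that idea, assuming the only goal is to open the file and record why the open failed. The function name openWithoutExistenceCheck is made up for illustration. A temporary error handler is used to capture the warning text, so the sketch does not depend on functions that only exist in newer PHP versions.

```php
<?php
/**
 * Sketch: open the file directly instead of calling file_exists() first.
 * Returns array($fh, null) on success or array(false, $message) on failure.
 * The function name is hypothetical, not part of the original script.
 */
function openWithoutExistenceCheck($strFile)
{
    $strError = null;
    // Capture the warning that fopen() raises on failure instead of printing it.
    set_error_handler(function ($intErrno, $strMessage) use (&$strError) {
        $strError = $strMessage;
        return true; // skip PHP's default handler for this call
    });
    $fh = fopen($strFile, 'rb');
    restore_error_handler();
    return array($fh, $strError);
}

list($fh, $strError) = openWithoutExistenceCheck('/Share/11/222/333.zip');
if ($fh === false) {
    // The captured message carries the OS-level reason for the failure.
    error_log('fopen failed: ' . $strError);
} else {
    fclose($fh);
}
```

This removes one round trip to the share per file, but it may not help if the open itself also fails with ENOENT for the same underlying reason.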
You can try to use the directio option to avoid doing inode data caching on files opened on this mount:
//10.1.2.3/Share /Share cifs credentials=/root/.smbcredentials/share_user,user=share_user,dirmode=0770,filemode=0660,uid=4000,gid=5000,forceuid,forcegid,noserverino,cache=none,directio 0 0
This is hardly a definitive answer to my problem, rather a summary of what I found out and what I settled with.
The root of the problem is that it is the OS that reports that the file does not exist. Running strace occasionally shows
access("/Share/11/222/333.zip", F_OK) = -1 ENOENT (No such file or directory)
for files that do exist (and show up when listed with ls).
The Windows share host was sometimes under heavy disk load. What I did was move one of the shares to a different host, so that the load is now spread between the two. Also, the general load on the system has been a bit lighter lately. Whenever I get the error about a file not existing, I retry the request some time later and the error no longer occurs.
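The retry described here happens at the request level. For illustration only, the same idea could also be applied inside the script, roughly as sketched below. The function name openWithRetry, the attempt count and the delay are all assumptions, and I have not verified that short in-process retries are enough to get past the ENOENT returned by the CIFS mount.

```php
<?php
/**
 * Sketch: retry fopen() a few times before giving up.
 * $intAttempts and $intDelayMs are illustrative defaults, not measured values.
 */
function openWithRetry($strFile, $intAttempts = 3, $intDelayMs = 500)
{
    for ($i = 1; $i <= $intAttempts; $i++) {
        clearstatcache(true, $strFile); // drop PHP's cached stat data for this path
        $fh = @fopen($strFile, 'rb');   // @ silences the warning on a failed attempt
        if ($fh !== false) {
            return $fh;
        }
        if ($i < $intAttempts) {
            usleep($intDelayMs * 1000); // wait before the next attempt
        }
    }
    return false; // every attempt failed
}
```

A caller would treat a false return exactly like the existing 'ZIP file does not exist' case, so the one-minute retry at the request level can stay in place as a second line of defence.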