I have a website where multiple users can be writing to the same file at the same time. An example of my code is below.
PHP 5.6.20
<?php
$root=realpath($_SERVER["DOCUMENT_ROOT"]);
$var=time()."|";
echo $var."<br/>";
$filename=$root."/Testing/Testing/test1.txt";
$myfile=fopen($filename,"a");
fwrite($myfile,$var);
fclose($myfile);
$myfile=fopen($filename,"r");
$contents = fread($myfile, filesize($filename));
echo $contents;
fclose($myfile);
?>
I read that in PHP, if multiple users try to write to the same file at the same time, there is a chance of data corruption. I of course don't want code that may corrupt data over the long run.
I tried running the above code at almost the same time in several browser tabs to simulate multiple users writing to the same file, and it produced no errors and no corruption in the file it writes to, but I'm still not sure it is safe from corruption in the future.
I read that I can use flock to make sure that 2 users can not write to the file at the same time, but since I tested the above code and it produced no data corruption, I'm not sure if I should update my code with flock or just leave it as it is.
My questions are:
1) Is there any chance of the above code corrupting the file that it's writing to?
2) If yes, will using flock solve this issue? If so, how should I implement flock in the above code?
Edit:
I know that this uncertainty can be solved by using a database, but for this case plain text is the better fit, so please don't suggest using a DB.
Thanks in advance.
Yes, there can be a problem if two scripts attempt to write to a file at the same time. Calling fopen() on a file does not stop that same file from being opened by another script, which means you might find one script reading from a file while another is writing to it or, worse, two scripts writing to the same file simultaneously. So it is a good idea to use flock(). You can get more help at http://www.hackingwithphp.com/8/11/0/locking-files-with-flock . For your code you could use flock() like this:
<?php
$root=realpath($_SERVER["DOCUMENT_ROOT"]);
$var=time()."|";
echo $var."<br/>";
$filename=$root."/Testing/Testing/test1.txt";
$myfile=fopen($filename,"a");
if (flock($myfile, LOCK_EX)) {        
    fwrite($myfile,$var);
    flock($myfile, LOCK_UN); // unlock the file
} else {
    // flock() returned false, no lock obtained
    print "Could not lock $filename!\n";
}
fclose($myfile);
$myfile=fopen($filename,"r");
if (flock($myfile, LOCK_SH)) { // a shared lock is enough for reading
    $contents = fread($myfile, filesize($filename));
    echo $contents;
    flock($myfile, LOCK_UN); // unlock the file
} else {
    // flock() returned false, no lock obtained
    print "Could not lock $filename!\n";
}
fclose($myfile);
?>
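As a more compact variant of the same idea, here is a minimal sketch that relies on the LOCK_EX flag of file_put_contents() for appending and on a shared lock for reading. The function names appendRecord and readAllShared are made up for illustration, and error handling is reduced to return values, so treat it as a starting point rather than a finished implementation.

```php
<?php
// Hypothetical helper: append one record while holding an exclusive lock.
function appendRecord($filename, $record)
{
    // FILE_APPEND writes at the end of the file;
    // LOCK_EX takes an exclusive lock for the duration of the write.
    $bytes = file_put_contents($filename, $record, FILE_APPEND | LOCK_EX);
    return $bytes !== false && $bytes === strlen($record);
}

// Hypothetical helper: read the whole file while holding a shared lock.
function readAllShared($filename)
{
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return false;
    }
    $contents = false;
    // LOCK_SH lets several readers in at once but waits for writers holding LOCK_EX.
    if (flock($handle, LOCK_SH)) {
        $contents = stream_get_contents($handle);
        flock($handle, LOCK_UN);
    }
    fclose($handle);
    return $contents;
}
```

In the question's script this would be used as appendRecord($filename, $var) followed by echo readAllShared($filename).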
This is a common theoretical problem for many web applications in many programming languages.
The answer is yes, it CAN cause trouble. However, the problem is largely theoretical if the content you append to the file is small and your traffic is light. Today's operating systems and file systems are so well optimized (caching, lazy writing, etc.) that it is very unlikely to happen as long as you close your file handles immediately after using them.
If you do run into an error, you could add something like a buffer (check access rights before writing in PHP, or catch exceptions in other languages) and try again after some delay, or write your buffer to a temp file and merge it with another process. You have several possibilities.
And yes, flock() is a function that could be good for these purposes, but I think that would already be over-engineering.
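A rough sketch of the "try again after some delay" idea, assuming contention is detected with a non-blocking flock() (the LOCK_NB flag); that detection method is my own choice for the example, not the only way to do it. The name appendWithRetry and the default attempt count and delay are placeholders.

```php
<?php
// Hypothetical helper: try to get a non-blocking exclusive lock a few times,
// sleeping between attempts, and append the data once the lock is held.
function appendWithRetry($filename, $data, $maxAttempts = 5, $delayMicroseconds = 50000)
{
    $handle = fopen($filename, 'a');
    if ($handle === false) {
        return false;
    }
    $written = false;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // LOCK_NB makes flock() return immediately instead of waiting for the lock.
        if (flock($handle, LOCK_EX | LOCK_NB)) {
            $written = fwrite($handle, $data) === strlen($data);
            fflush($handle); // push the data out before releasing the lock
            flock($handle, LOCK_UN);
            break;
        }
        usleep($delayMicroseconds); // wait a little before the next attempt
    }
    fclose($handle);
    return $written;
}
```

If the function returns false after all attempts, the caller still has $data in memory and can fall back to one of the other options, such as writing it to a temp file for a later merge.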
I'm not familiar with flock, but my first thought is a locking or queueing mechanism. If the data written does not have to be used by or shown back to the users, then a work queue would be the best choice. Write the data to a Redis- or memcached-based system, SQL or another type of queueing system, or just dump uniquely named, timestamped files with the interesting content into a directory that a worker can aggregate in ascending order into the master file.
For use cases where the written data triggers something the user needs to get a report or result from, respond to, or receive other feedback on, locking might be the way to go if you can't re-architect to an async stack with eventual consistency. It's also difficult to say without knowing the load, the number of concurrent users, the number of servers, etc.
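To make the timestamped-files variant a bit more concrete, here is a small sketch. Everything in it is an assumption for illustration: the function names enqueueRecord and aggregateQueue, the .rec extension, and the naming scheme (a zero-padded microsecond timestamp plus uniqid()). It also assumes the queue directory and the worker live on the same local filesystem.

```php
<?php
// Hypothetical helper: each request drops its data into its own uniquely named file.
function enqueueRecord($queueDir, $data)
{
    // The zero-padded timestamp sorts chronologically as a string;
    // uniqid() guards against two requests picking the same name.
    $name = sprintf('%017.6F', microtime(true)) . '-' . uniqid('', true) . '.rec';
    $tmp = $queueDir . '/' . $name . '.tmp';
    if (file_put_contents($tmp, $data) === false) {
        return false;
    }
    // Renaming after the write is complete keeps the worker
    // from picking up a half-written file.
    return rename($tmp, $queueDir . '/' . $name);
}

// Hypothetical worker: merge queued files into the master file in ascending name order.
function aggregateQueue($queueDir, $masterFile)
{
    $files = glob($queueDir . '/*.rec');
    if ($files === false) {
        return 0;
    }
    sort($files, SORT_STRING);
    $merged = 0;
    foreach ($files as $file) {
        $data = file_get_contents($file);
        if ($data === false) {
            continue; // leave the file in place for the next run
        }
        if (file_put_contents($masterFile, $data, FILE_APPEND | LOCK_EX) !== false) {
            unlink($file); // remove only after the data reached the master file
            $merged++;
        }
    }
    return $merged;
}
```

Only the worker ever writes to the master file, so the web requests never contend for it. The worker could be run from cron or a similar scheduler.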