 

Concurrent appends to the same file using Perl

I have a need to upgrade a Perl CGI script where the users must complete 3 steps. After they finish each step, the script is logging which step the user completed. Having a record of this is important so we can prove to the user that they only finished step one and didn't complete all three steps, for example.

Right now, the script creates one log file for each instance of the CGI script. So if UserA does step 1, then UserB does steps 1, 2, and 3, and then UserA finishes steps 2 and 3, the log files would be created in this order:

LogFile.UserA.Step1
LogFile.UserB.Step1
LogFile.UserB.Step2
LogFile.UserB.Step3
LogFile.UserA.Step2
LogFile.UserA.Step3

The log files are named with the current timestamp, a random number, and the process ID (PID).

This works fine to prevent the same file from being written to more than once, but the directory quickly accumulates thousands of files (each containing just a few bytes). There is a process to rotate and compress these logs, but it has fallen to me to make the script log to just one file per day to reduce the number of log files being created.

Basically, the log file will have the current date in the file name, and anytime the CGI script needs to write to the log, it will append to the one log file for that day, regardless of the user or what step they are on.

Nothing will need to be reading the log file - the only thing that will happen to it is an append by the CGI script. The log rotation will run on log files that are 7 days or older.
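For illustration, here is a rough sketch of what that kind of daily append could look like, assuming a hypothetical log directory and file-name pattern (locking is deliberately left out here, since that is the question):

#! /usr/bin/perl

use warnings;
use strict;

use POSIX qw/ strftime /;

# Hypothetical location and naming pattern for the daily log.
my $log_dir = "/var/log/myapp";
my $logfile = "$log_dir/steps." . strftime("%Y-%m-%d", localtime) . ".log";

# ">>" opens the file for appending, so the write goes at the end.
open my $fh, ">>", $logfile or die "$0 [$$]: open $logfile: $!";
print $fh "UserA: Step 1\n";
close $fh or warn "$0 [$$]: close $logfile: $!";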

My question is, what is the best way to handle the concurrent appends to this log file? Do I need to lock it before appending? I found this page on Perl Monks that seems to indicate that "when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten."

I've learned that just because something can be done doesn't mean it should be, but in this case, what is the safest, best-practice way to do this?

Summary:

  • Concurrent appends to the same file
  • Each append to the file is just one line, less than 50 characters
  • Order does not matter

Thanks!

asked Mar 02 '10 by BrianH


2 Answers

Yes, use flock.

An example program is below, beginning with typical front matter:

#! /usr/bin/perl

use warnings;
use strict;

use Fcntl qw/ :flock /;

Then we specify the path to the log and the number of clients that will run:

my $log = "/tmp/my.log";
my $clients = 10;

To log a message, open the file in append mode so all writes automatically go at the end of the file. Then call flock to wait our turn for exclusive access to the log. Once we have the lock, write the message and close the handle, which automatically releases the lock.

sub log_step {
  my($msg) = @_;

  open my $fh, ">>", $log or die  "$0 [$$]: open: $!";
  flock $fh, LOCK_EX      or die  "$0 [$$]: flock: $!";
  print $fh "$msg\n"      or die  "$0 [$$]: write: $!";
  close $fh               or warn "$0 [$$]: close: $!";
}

Now fork off $clients child processes that each go through all three steps, sleeping a random interval between steps:

my %kids;
my $id = "A";
for (1 .. $clients) {
  my $pid = fork;
  die "$0: fork: $!" unless defined $pid;

  if ($pid) {
    ++$kids{$pid};
    print "$0: forked $pid\n";
  }
  else {
    my $user = "User" . $id;
    log_step "$user: Step 1";
    sleep rand 3;
    log_step "$user: Step 2";
    sleep rand 3;
    log_step "$user: Step 3";
    exit 0;
  }

  ++$id;
}

Don't forget to wait for all the children to exit:

print "$0: reaping children...\n";
while (keys %kids) {
  my $pid = waitpid -1, 0;
  last if $pid == -1;

  warn "$0: unexpected kid $pid" unless $kids{$pid};
  delete $kids{$pid};
}

warn "$0: still running: ", join(", " => keys %kids), "\n"
  if keys %kids;

print "$0: done!\n", `cat $log`;

Sample output:

[...]
./prog.pl: reaping children...
./prog.pl: done!
UserA: Step 1
UserB: Step 1
UserC: Step 1
UserC: Step 2
UserC: Step 3
UserD: Step 1
UserE: Step 1
UserF: Step 1
UserG: Step 1
UserH: Step 1
UserI: Step 1
UserJ: Step 1
UserD: Step 2
UserD: Step 3
UserF: Step 2
UserG: Step 2
UserH: Step 2
UserI: Step 2
UserI: Step 3
UserB: Step 2
UserA: Step 2
UserA: Step 3
UserE: Step 2
UserF: Step 3
UserG: Step 3
UserJ: Step 2
UserJ: Step 3
UserE: Step 3
UserH: Step 3
UserB: Step 3

Keep in mind that the order will be different from run to run.

answered Nov 07 '22 by Greg Bacon


"when multiple processes are writing to the same file, and all of them have the file opened for appending, data shall not be overwritten" may be true, but that doesn't mean your data can't come out mangled (one entry inside another). It's not very likely to happen for small amounts of data, but it might.

flock is a reliable and reasonably simple solution to that problem. I would advise you to simply use that.
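For reference, the core of that pattern is only a few lines. This is essentially the log_step idiom from the answer above, shown here with a hypothetical date-stamped file name along the lines the question describes:

use Fcntl qw/ :flock /;
use POSIX qw/ strftime /;

# Hypothetical daily log file name.
my $logfile = "/var/log/myapp/steps." . strftime("%Y-%m-%d", localtime) . ".log";

open my $fh, ">>", $logfile or die  "$0 [$$]: open: $!";
flock $fh, LOCK_EX          or die  "$0 [$$]: flock: $!";  # wait for exclusive access
print $fh "UserB: Step 2\n" or die  "$0 [$$]: write: $!";
close $fh                   or warn "$0 [$$]: close: $!";  # closing releases the lock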

answered Nov 07 '22 by Leon Timmermans