
Writing a persistent perl script

I am trying to write a persistent/cached script. The code would look something like this:

...
use Memoize;
memoize('process_file');
print process_file($ARGV[0]);
...
sub process_file{
    my $filename = shift;
    my ($a, $b, $c) = extract_values_from_file($filename);
    if (exists $my_hash{$a}{$b}{$c}){
        return $my_hash{$a}{$b}{$c};
    }
    return $default;
}

This would be called from a shell script in a loop as follows:

value=`perl my_script.pl`;

Is there a way I could call this script so that it keeps its state from call to call? Let's assume that both initializing '%my_hash' and calling extract_values_from_file are expensive operations.

Thanks

Smartelf asked Apr 27 '12 14:04


3 Answers

This is kind of dark magic, but you can store state after your script's __DATA__ token and persist it.

use Data::Dumper; # or JSON, YAML, or any other data serializer
package MyPackage;
my $DATA_ptr;   # byte offset of the start of the __DATA__ section
our $state;     # the data structure that survives between runs
INIT {
    $DATA_ptr = tell DATA;            # remember where the saved state begins
    $state = eval join "", <DATA>;    # deserialize the saved state
}

...
# manipulate $MyPackage::state in this and other scripts
...

END {
    open DATA, '+<', $0 or die "Cannot reopen $0: $!";   # $0 is the name of this script
    seek DATA, $DATA_ptr, 0;
    print DATA Data::Dumper::Dumper($state);
    truncate DATA, tell DATA;  # in case new data is shorter than old data
    close DATA;
}
__DATA__
$VAR1 = {
    'foo' => 123,
    'bar' => 42,
    ...
}

In the INIT block, store the position of the beginning of your file's __DATA__ section and deserialize your state. In the END block, you reserialize the current state and overwrite the __DATA__ section of your script. Of course, the user running the script needs to have write permission on the script.

Edited to use INIT block instead of BEGIN block -- the DATA block is not set up during the compile phase.

mob answered Oct 20 '22 23:10


If %my_hash in your example has a moderate size in its final initialized state, you can simply use one of the serialization modules such as Storable, JSON::XS or Data::Dumper to keep your data in pre-assembled form between runs. Generate a new file when it is absent, and just reload the ready content from it when it is present.
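As a rough illustration of the "generate when absent, reload when present" idea, here is a minimal sketch using Storable. The cache file name, the `build_my_hash` and `load_my_hash` helpers, and the sample data are placeholders invented for this example; they are not part of the original script.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical cache location; any path the script can write to will do.
my $cache_file = "my_hash.storable";

# Stand-in for the expensive initialization from the question (assumption).
sub build_my_hash {
    return { 1 => { 2 => { 3 => 'some value' } } };
}

# Reload the pre-built hash if the cache file exists;
# otherwise build it once and save it for later runs.
sub load_my_hash {
    my ($file) = @_;
    return retrieve($file) if -e $file;
    my $hash = build_my_hash();
    nstore($hash, $file);
    return $hash;
}

my $my_hash = load_my_hash($cache_file);
print $my_hash->{1}{2}{3}, "\n";
```

Only the first run pays for `build_my_hash`; later runs just deserialize the file. Delete the cache file whenever the underlying data changes.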

Also, you've mentioned that you would call this script in a loop. A good strategy would be not to call the script inside the loop at all, but to build a queue of arguments instead and then pass all of them to the script after the loop, in a single execution. The script would set up its environment once and then loop over the arguments, doing its cheap per-file work without having to redo the setup steps for each of them.
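A minimal sketch of that batching approach might look like the following. The contents of `%my_hash`, the value of `$default`, and the body of `extract_values_from_file` are stand-ins made up for illustration (here the "file name" is simply split on dashes); only the loop over `@ARGV` is the point.

```perl
use strict;
use warnings;

# Assumption: placeholders for the expensive setup described in the question.
my %my_hash = (1 => { 2 => { 3 => 'found' } });
my $default = 'default';

# Hypothetical stand-in: derive three keys from the argument itself.
sub extract_values_from_file {
    my ($filename) = @_;
    return split /-/, $filename, 3;
}

sub process_file {
    my ($filename) = @_;
    my ($x, $y, $z) = extract_values_from_file($filename);
    return exists $my_hash{$x}{$y}{$z} ? $my_hash{$x}{$y}{$z} : $default;
}

# Setup happened once above; now handle every queued argument in one run,
# printing one result per line so the caller can read them back in order.
print process_file($_), "\n" for @ARGV;
```

The shell script would then collect the file names in its loop and invoke the script once with all of them, reading one value per output line.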

Oleg V. Volkov answered Oct 20 '22 23:10


You can't get the script to keep state. As soon as the process exits, any information not written to disk is gone.

There are a few ways you can accomplish this though:

  • Write a daemon which listens on a network or unix socket. The daemon can populate my_hash and answer questions sent from a very simple my_script.pl, which would only have to open a connection to the daemon, send the question and return the answer.

  • Create an efficient look-up file format. If you need the information often it'll probably stay in the VFS cache anyway.

  • Set up a shared memory region. The first time your script starts you save the information there, then re-use it later. That might be tricky from a Perl script though.
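The daemon option could be sketched roughly as below, using IO::Socket::UNIX from the core distribution. The socket path, the sample `%my_hash`, and the `lookup`, `run_daemon` and `ask_daemon` helpers are all hypothetical names chosen for this example, and the one-line-per-connection protocol is just one simple possibility.

```perl
use strict;
use warnings;
use IO::Socket::UNIX;

# Hypothetical socket path shared by the daemon and the client script.
my $socket_path = "/tmp/my_hash.sock";

# Assumption: stand-in for the expensively built hash from the question.
my %my_hash = (foo => 123, bar => 42);

sub lookup {
    my ($key) = @_;
    return exists $my_hash{$key} ? $my_hash{$key} : 'default';
}

# Daemon side: the hash is built once; each connection sends one
# question line and receives one answer line.
sub run_daemon {
    my ($path) = @_;
    unlink $path;    # remove a stale socket file left by a previous run
    my $server = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM(),
        Local  => $path,
        Listen => 5,
    ) or die "Cannot listen on $path: $!";
    while (my $client = $server->accept) {
        my $question = <$client>;
        if (defined $question) {
            chomp $question;
            print {$client} lookup($question), "\n";
        }
        close $client;
    }
}

# Client side: this is all that my_script.pl would need to do.
sub ask_daemon {
    my ($path, $question) = @_;
    my $sock = IO::Socket::UNIX->new(
        Type => SOCK_STREAM(),
        Peer => $path,
    ) or die "Cannot connect to $path: $!";
    print {$sock} "$question\n";
    my $answer = <$sock>;
    chomp $answer if defined $answer;
    return $answer;
}

# In the daemon script:  run_daemon($socket_path);
# In my_script.pl:       print ask_daemon($socket_path, $ARGV[0]), "\n";
```

The client still starts a new perl process on every call, but it skips the expensive initialization, which lives only in the long-running daemon.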

Kristof Provost answered Oct 20 '22 23:10