
How can I persist a large Perl object for re-use between runs?

I've got a large XML file, which takes over 40 seconds to parse with XML::Simple.

I'd like to be able to cache the resulting parsed object so that on the next run I can just retrieve the parsed object and not reparse the whole file.

I've looked at using Data::Dumper, but the documentation is a bit lacking on how to store and retrieve its output from disk files. Other classes I've looked at (e.g. Cache::Cache) appear designed for storage of many small objects, not a single large one.

Can anyone recommend a module designed for this?

EDIT. The XML file is ftp://ftp.rfc-editor.org/in-notes/rfc-index.xml, and I went with Storable for speeding up subsequent runs. Changing the XML parser would have required very significant code changes.

On my Mac Pro, the benchmark figures for reading the entire file with XML::Simple (test1) vs Storable (test2) are:

      s/iter  test1  test2
test1   47.8     --  -100%
test2  0.148 32185%     --
Alnitak asked Dec 06 '22 02:12



2 Answers

Data::Dumper is actually VERY simple. If your object is a hashref $HashRef:

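use Data::Dumper;

# Write: the file will contain Perl source of the form '$HashRef = {...};'
open(my $fh, '>', 'your_filename') or die "Can not open: $!";
print {$fh} Data::Dumper->Dump([$HashRef], ["HashRef"]);
close($fh) or die "Error closing file: $!";

# Read: "do" compiles and runs the file and returns its last value.
# The leading "./" matters: since Perl 5.26, "." is no longer in @INC.
# No eval wrapper here: a successful eval block resets $@ and would hide a parse error.
my $HashRef = do './your_filename';
die "Error parsing: $@" if $@;
die "Error reading: $!" unless defined $HashRef;
# Now $HashRef is what it was before writing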

Another good option is using Storable. From POD:

use Storable;
store \%table, 'file';
$hashref = retrieve('file');
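To tie this back to the question, here is a minimal sketch of a parse-once cache built on Storable. The helper name load_cached, the file names, and the stand-in parser are all hypothetical; in real code the parser callback would wrap whatever XML module you already use.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

# Hypothetical helper: reuse the cache when it is at least as new as the
# source file, otherwise run the (slow) parser and refresh the cache.
sub load_cached {
    my ($source, $cache, $parse) = @_;

    my $source_mtime = (stat $source)[9];
    die "Cannot stat $source: $!" unless defined $source_mtime;
    my $cache_mtime = (stat $cache)[9];    # undef if there is no cache yet

    return retrieve($cache)
        if defined $cache_mtime && $cache_mtime >= $source_mtime;

    my $data = $parse->($source);
    nstore($data, $cache);                 # network byte order
    return $data;
}

# Demo with a stand-in parser; real code would pass something like
# sub { XML::Simple::XMLin($_[0]) } instead.
my $dir    = tempdir(CLEANUP => 1);
my $source = "$dir/index.xml";
open(my $fh, '>', $source) or die "Cannot write $source: $!";
print {$fh} "<rfc-index/>\n";
close($fh) or die "Cannot close $source: $!";

my $parse_count = 0;
my $parser = sub { $parse_count++; return { entries => [ 1 .. 3 ] } };

my $first  = load_cached($source, "$dir/index.stor", $parser);  # parses
my $second = load_cached($source, "$dir/index.stor", $parser);  # cache hit
print "parser ran $parse_count time(s)\n";
```

The modification-time check means the cache is rebuilt when the XML file changes. Comparing with >= treats a cache written in the same second as the source as fresh; use > if you would rather reparse in that case.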

For a very good guide to the various options (as well as a better example of Data::Dumper usage), see Chapter 14, "Persistence", of brian d foy's book "Mastering Perl".

DVK answered Dec 21 '22 22:12



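Storable. That's the lazy answer. (Prefer nstore over store: it writes in network byte order, so the file stays readable on machines with a different architecture.)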

The opposite of data dumping is eval.
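A minimal sketch of that round trip, kept in memory for brevity. The data and the variable name copy are made up for illustration. Since eval runs whatever code the text contains, only use it on dumps you wrote yourself.

```perl
use strict;
use warnings;
use Data::Dumper;

my $original = { rfc => 2119, keywords => [ 'MUST', 'SHOULD' ] };

# Dump() with an explicit name emits Perl source: "$copy = { ... };"
my $text = Data::Dumper->Dump([$original], ['copy']);

# A string eval sees the enclosing lexical scope, so it assigns to this $copy.
my $copy;
eval $text;
die "eval failed: $@" if $@;

print "rfc $copy->{rfc}: @{ $copy->{keywords} }\n";
```

Declaring $copy before the eval keeps the snippet working under use strict.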

The good answer is: You really want to learn to use an XML module suitable for heavy processing such as XML::Twig or XML::LibXML to speed up parsing, so you do not need this caching monkey code.

daxim answered Dec 21 '22 23:12
