I've got a large XML file, which takes over 40 seconds to parse with XML::Simple.
I'd like to be able to cache the resulting parsed object so that on the next run I can just retrieve the parsed object and not reparse the whole file.
I've looked at using Data::Dumper, but the documentation is a bit lacking on how to store its output in a disk file and read it back. Other classes I've looked at (e.g. Cache::Cache) appear designed for storing many small objects, not a single large one.
Can anyone recommend a module designed for this?
EDIT: The XML file is ftp://ftp.rfc-editor.org/in-notes/rfc-index.xml, and I went with Storable to speed up subsequent runs. Changing the XML parser would have required very significant code changes.
On my Mac Pro, benchmark figures for reading the entire file with XML::Simple (test1) vs Storable (test2) are:
      s/iter  test1  test2
test1   47.8     --  -100%
test2  0.148 32185%     --
Data::Dumper is actually VERY simple. If your object is a hashref $HashRef:
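# Write (assumes $HashRef holds the parsed data)
use Data::Dumper;
open(my $fh, '>', 'your_filename') or die "Can not open: $!";
print {$fh} Data::Dumper->Dump([$HashRef], ['HashRef']);
close($fh) or die "Error closing file: $!";
# Read (e.g. in a later run)
# "do FILE" runs the file in its own scope, so "use strict" in the calling
# code does not apply to the dumped code. The "./" prefix stops do() from
# searching @INC instead of the current directory.
my $HashRef = do './your_filename';
# do() does not die: it reports parse errors in $@ and I/O errors in $!.
# (Wrapping it in eval {} would reset $@ and hide parse errors.)
die "Error parsing: $@" if $@;
die "Error reading: $!" unless defined $HashRef;
# Now $HashRef is what it was before writing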
Another good option is using Storable. From POD:
use Storable;
store \%table, 'file';
$hashref = retrieve('file');
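A self-contained version of the same round trip, as a sketch: the sample data below is made up to stand in for what XML::Simple returns, and a File::Temp temporary file replaces the fixed 'file' name so the script can run anywhere.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Made-up sample data standing in for the hash XML::Simple would return.
my %table = (
    'rfc-entry' => [
        { 'doc-id' => 'RFC0001', title => 'First entry' },
        { 'doc-id' => 'RFC0002', title => 'Second entry' },
    ],
);

# A temporary file keeps the example self-contained; a real cache would
# use a fixed path. UNLINK => 1 deletes it when the program exits.
my (undef, $file) = tempfile(UNLINK => 1);

store(\%table, $file) or die "Could not store data in $file";

# retrieve() returns a reference to a fresh copy of the stored structure.
my $hashref = retrieve($file);
print $hashref->{'rfc-entry'}[1]{title}, "\n";    # prints "Second entry"
```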
For a very good guide to the various options (as well as a better example of Data::Dumper usage), see Chapter 14, "Persistence", of brian d foy's book "Mastering Perl".
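Storable. That's the lazy answer. (Prefer nstore over store: nstore writes the data in network byte order, so the file can also be read on a machine with a different architecture.)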
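Applied to the original question, a cache wrapper might look like the sketch below. `load_with_cache` is a hypothetical helper, not part of Storable; it assumes the cache is valid whenever its modification time is at least that of the XML file (mtimes often have one-second resolution, so an XML edit made in the same second as the cache write could go unnoticed). The slow parser is passed in as a code ref so the caching logic itself does not depend on XML::Simple.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical helper (not part of Storable): return the parsed data for
# $xml_file, reusing the Storable cache in $cache_file when that cache is
# at least as new as the XML file. $parse is a code ref that does the slow
# parse and returns a reference (for example, a wrapper around XMLin).
sub load_with_cache {
    my ($xml_file, $cache_file, $parse) = @_;

    my $xml_mtime = (stat $xml_file)[9];
    die "Can't stat $xml_file: $!" unless defined $xml_mtime;
    my $cache_mtime = (stat $cache_file)[9];    # undef if no cache yet

    if (defined $cache_mtime && $cache_mtime >= $xml_mtime) {
        # Fast path: retrieve() can die on a corrupt file, so fall back
        # to reparsing if it fails.
        my $data = eval { retrieve($cache_file) };
        return $data if $data;
    }

    # Slow path: parse once, then write the cache in network byte order.
    my $data = $parse->($xml_file);
    nstore($data, $cache_file) or die "Can't write cache $cache_file";
    return $data;
}

# Usage (assumes XML::Simple is installed; file names are examples):
#   use XML::Simple qw(XMLin);
#   my $index = load_with_cache('rfc-index.xml', 'rfc-index.storable',
#                               sub { XMLin($_[0]) });
```

Only the first run (or a run after the XML file changes) pays for the full parse; later runs go through retrieve().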
The opposite of data dumping is eval: Data::Dumper's output is Perl code, and evaluating it rebuilds the data structure.
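A minimal in-memory illustration, assuming a made-up data structure: giving Dump a variable name makes its output an assignment, and a string eval (unlike do FILE) can see lexicals in the enclosing scope, so it fills in the declared variable. Because this runs the text as Perl code, it is only safe for dumps you produced yourself.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up data to round-trip.
my $original = { title => 'rfc-index', entries => [ 1, 2, 3 ] };

# Naming the variable 'restored' makes Dump emit Perl source of the form
# "$restored = { ... };" instead of the default "$VAR1 = ...".
my $text = Data::Dumper->Dump([$original], ['restored']);

# String eval sees lexicals in the enclosing scope (do FILE does not),
# so evaluating the text assigns to this $restored.
my $restored;
eval $text;
die "Could not rebuild data: $@" if $@;

print scalar @{ $restored->{entries} }, "\n";    # prints 3
```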
The good answer is: you really want to learn an XML module suited to heavy processing, such as XML::Twig or XML::LibXML, to speed up parsing, so you do not need this caching monkey code.