I have a number of scripts that read a lot of data from .CSV files. For efficiency, I use the Text::CSV_XS module to read them and then build a hash keyed on one of the columns. However, I have a lot of files, they are quite large, and each script has to read all the data in again.
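For reference, the load step described above probably looks something like this minimal sketch. It assumes the files have no header row and that the key sits in a zero-based column `$key_col`; the helper name `load_csv_hash` is hypothetical. It uses `getline`, which reads and splits one record per call, rather than the separate `parse` and `fields` calls that appear in the profile below.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Read a CSV file into a hash of rows keyed on one column.
# Assumptions (not from the original post): no header row, and
# $key_col is the zero-based index of the key column.
sub load_csv_hash {
    my ($file, $key_col) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my %by_key;
    # getline returns each record as an array reference, handling
    # quoted fields (including embedded commas) along the way.
    while (my $row = $csv->getline($fh)) {
        $by_key{ $row->[$key_col] } = $row;
    }
    close $fh;
    return \%by_key;
}
```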
The question is: how can I store these Perl hashes persistently so that all of them can be read back in with a minimum of CPU?
Combining the scripts is not an option. I wish...
I applied the second rule of optimization and profiled the code, which showed that the vast majority of the CPU time (about 90%) was spent in:
Text::CSV_XS::fields
Text::CSV_XS::Parse
Text::CSV_XS::parse
So I wrote a test script that read in all the .CSV files with Text::CSV_XS, dumped the hashes with the Storable module, and then read them back in with Storable. I profiled it to see the CPU times:
```
$ c:/perl/bin/dprofpp.bat
Total Elapsed Time = 1809.397 Seconds
  User+System Time = 950.5560 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 25.6   243.6 243.66    126   1.9338 1.9338  Storable::pretrieve
 20.5   194.9 194.92 893448   0.0002 0.0002  Text::CSV_XS::fields
 9.49   90.19 90.198 893448   0.0001 0.0001  Text::CSV_XS::Parse
 7.48   71.07 71.072    126   0.5641 0.5641  Storable::pstore
 4.45   42.32 132.52 893448   0.0000 0.0001  Text::CSV_XS::parse
```
(The remaining entries were each 0.07% or less and can be ignored.)
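So reading the data back in with Storable takes about 25.6% of the CPU time (243.6 s), compared with roughly 34% (about 327 s for fields, Parse and parse combined) for parsing with Text::CSV_XS. Not much of a saving...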
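The Storable round trip in that test script amounts to something like the following minimal sketch; `save_hash` and `load_hash` are hypothetical names. It uses `store`, which writes in native byte order; `nstore` writes a portable network-order file instead, which matters only if the files are shared between different machines.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Write a hash reference to $file. store() uses native byte order;
# nstore() would produce a portable (network-order) file instead.
sub save_hash {
    my ($href, $file) = @_;
    store($href, $file) or die "Cannot store to $file: $!";
    return;
}

# Read the whole structure back. retrieve() rebuilds every key and row
# in memory, so its cost grows with the size of the stored data.
sub load_hash {
    my ($file) = @_;
    my $href = retrieve($file) or die "Cannot retrieve $file: $!";
    return $href;
}
```

Because `retrieve` always rebuilds the entire hash, every script still pays for all the data even if it only needs a few keys, which is consistent with `Storable::pretrieve` costing about 1.9 s per call in the profile.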
Has anybody got a suggestion on how I can read in these data more efficiently?
Thanks for your help.
The easiest way to put a very large hash on disk, IMHO, is with Berkeley DB. It's fast, time-tested and rock-solid, and the BerkeleyDB module on CPAN provides a tied-hash API. That means you can keep using your hash as if it were an in-memory data structure, while reads and writes go through Berkeley DB to disk automatically.
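A minimal sketch of the tied-hash approach with the BerkeleyDB module, assuming each row is an array reference keyed on one column as in the question. The helpers `open_db_hash`, `put_row` and `get_row` are hypothetical names. Berkeley DB stores only flat strings, so each row is serialized with Storable's `nfreeze` on the way in and restored with `thaw` on the way out.

```perl
use strict;
use warnings;
use BerkeleyDB;
use Storable qw(nfreeze thaw);

# Open (or create) an on-disk hash backed by Berkeley DB. The tied hash
# reads and writes through to the file; nothing is loaded up front.
sub open_db_hash {
    my ($file) = @_;
    tie my %db, 'BerkeleyDB::Hash',
        -Filename => $file,
        -Flags    => DB_CREATE
        or die "Cannot open $file: $! $BerkeleyDB::Error";
    return \%db;
}

# Values must be plain strings, so serialize each row before storing it.
sub put_row {
    my ($db, $key, $row) = @_;
    $db->{$key} = nfreeze($row);
    return;
}

# Fetch and deserialize one row; returns undef for a missing key.
sub get_row {
    my ($db, $key) = @_;
    my $frozen = $db->{$key};
    return defined $frozen ? thaw($frozen) : undef;
}
```

A loader script would fill the database once from the CSV files; the other scripts then open the same file and fetch only the keys they need. The MLDBM module on CPAN packages the same serialize-on-store idea behind a plain tied hash.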
Parse the data once and load it into an SQLite database, then query it from each script using DBI (with the DBD::SQLite driver).
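A minimal sketch of the SQLite route, assuming DBI and DBD::SQLite are installed. The table name `records`, its columns (`id`, `name`, `amount`) and the helpers `build_db` and `lookup` are hypothetical; the real columns would follow the CSV layout, and `build_db` would be fed the rows produced by Text::CSV_XS.

```perl
use strict;
use warnings;
use DBI;

# Load rows (array refs of [id, name, amount]) into a hypothetical
# "records" table, creating it if needed.
sub build_db {
    my ($db_file, $rows) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS records '
           . '(id TEXT PRIMARY KEY, name TEXT, amount TEXT)');
    # One transaction for the whole load instead of one per INSERT.
    $dbh->begin_work;
    my $sth = $dbh->prepare(
        'INSERT OR REPLACE INTO records (id, name, amount) VALUES (?, ?, ?)');
    $sth->execute(@$_) for @$rows;
    $dbh->commit;
    $dbh->disconnect;
    return;
}

# Fetch one row by key as a hash ref; returns undef if there is no match.
sub lookup {
    my ($dbh, $id) = @_;
    return $dbh->selectrow_hashref(
        'SELECT id, name, amount FROM records WHERE id = ?', undef, $id);
}
```

With AutoCommit on, SQLite would commit every INSERT separately, which is much slower than a single transaction. Declaring `id` as the PRIMARY KEY gives SQLite an index on it, so each lookup reads only the matching row instead of the whole data set.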