I need to use a big file that contains String-to-String pairs, and because I want to ship it with a JAR, I opted to include a serialized and gzipped version in the application's resource folder. This is how I created the serialized file:
ObjectOutputStream out = new ObjectOutputStream(
new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(OUT_FILE_PATH, false))));
out.writeObject(map);
out.close();
I chose to use a HashMap<String,String>; the resulting file is 60 MB and the map contains about 4 million entries.
Now, when I need the map, I deserialize it using:
final InputStream in = FileUtils.getResource("map.ser.gz");
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(in)));
map = (Map<String, String>) ois.readObject();
ois.close();
This takes about 10 to 15 seconds. Is there a better way to store such a big map in a JAR? I ask because I also use the Stanford CoreNLP library, which uses big model files itself but seems to perform better in that regard. I tried to locate the code where the model files are read but gave up.
Your problem is that you zipped the data. Store it as plain text.
The performance hit is most probably in unzipping the stream. JARs are already zipped, so storing the file zipped gives no real space saving.
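To get such a plain-text file in the first place, a one-off conversion of the existing map could look roughly like this. It is a minimal sketch that assumes keys never contain '=' and that no key or value contains a line break; the class name MapToText and the output path map.txt are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class MapToText {
    // Writes each entry as one "key=value" line in UTF-8.
    // Assumes keys contain no '=' and no entry contains a line break.
    static void write(Map<String, String> map, Path out) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> e : map.entrySet()) {
                w.write(e.getKey());
                w.write('=');
                w.write(e.getValue());
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> map = new HashMap<>();
        map.put("hello", "world");
        map.put("foo", "bar");
        write(map, Paths.get("map.txt")); // hypothetical output path
    }
}
```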
Basically, use
Files.lines(Paths.get("myfilename.txt"))
to stream the lines. Something like this, assuming the data is in the form key=value (like a Properties file):
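import java.nio.file.*;
import java.util.*;
import java.util.stream.Stream;
Map<String, String> map = new HashMap<>();
try (Stream<String> lines = Files.lines(Paths.get("myfilename.txt"))) { // closes the file when done
    lines.map(s -> s.split("=", 2)) // split on the first '=' only, so values may contain '='
         .forEach(a -> map.put(a[0], a[1]));
}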
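One caveat: Paths.get resolves against the file system, so once the text file is packed inside the JAR it cannot be opened that way; it has to be read as a classpath resource instead. A minimal sketch, assuming the file sits at the root of the JAR as /map.txt (a hypothetical name) and is UTF-8 encoded. It splits on the first '=' only and presizes the HashMap for the roughly 4 million entries mentioned in the question; ResourceMapLoader, load and parse are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ResourceMapLoader {
    // Opens a classpath resource (e.g. a file packed inside the JAR) and parses it.
    static Map<String, String> load(String resource, int expectedSize) throws IOException {
        InputStream in = ResourceMapLoader.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found: " + resource);
        }
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return parse(r, expectedSize);
        }
    }

    // Reads "key=value" lines; everything after the first '=' is the value.
    static Map<String, String> parse(BufferedReader r, int expectedSize) throws IOException {
        // Presize so expectedSize entries fit under the default 0.75 load factor without rehashing.
        Map<String, String> map = new HashMap<>((int) (expectedSize / 0.75f) + 1);
        String line;
        while ((line = r.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip blank or malformed lines
            }
            map.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        // "/map.txt" is a hypothetical resource at the root of the JAR.
        Map<String, String> map = load("/map.txt", 4_000_000);
        System.out.println(map.size());
    }
}
```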
Disclaimer: Code may not compile or work as it was thumbed in on my phone (but there's a reasonable chance it will work)