Java: storing a big map in resources

Question

I need to use a big file that contains String-to-String pairs. Because I want to ship it with a JAR, I opted to include a serialized and gzipped version in the resource folder of the application. This is how I created the serialized file:

ObjectOutputStream out = new ObjectOutputStream(
            new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(OUT_FILE_PATH, false))));
out.writeObject(map);
out.close();

I chose to use a HashMap<String,String>. The resulting file is 60 MB and the map contains about 4 million entries.

Now, when I need the map, I deserialize it using:

final InputStream in = FileUtils.getResource("map.ser.gz");
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(in)));
map = (Map<String, String>) ois.readObject();
ois.close();

This takes about 10 to 15 seconds. Is there a better way to store such a big map in a JAR? I ask because I also use the Stanford CoreNLP library, which uses big model files itself but seems to load them faster. I tried to locate the code where the model files are read but gave up.

Bohemian · Accepted Answer

Your problem is that you zipped the data. Store it as plain text.

The performance hit is most probably in unzipping the stream. JAR files are already compressed, so storing the file gzipped inside one saves little or no space.
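One way to check where the time goes is to time the gunzip step and the object deserialization step separately. The following is a rough sketch, not a proper benchmark: the class name `GzipCostCheck`, the helper names, and the 200,000-entry test map are all made up for illustration, and a single un-warmed run only gives a coarse picture.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCostCheck {

    // Serializes the map through a GZIP stream, as in the question, but into memory.
    public static byte[] serializeGzipped(Map<String, String> map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(map);
        }
        return bytes.toByteArray();
    }

    // Decompresses only; no object deserialization happens here.
    public static byte[] gunzip(byte[] gzipped) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Deserializes from already-decompressed bytes.
    @SuppressWarnings("unchecked")
    public static Map<String, String> deserialize(byte[] raw)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Map<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical test data; the real map in the question has about 4 million entries.
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < 200_000; i++) {
            map.put("key" + i, "value" + i);
        }
        byte[] gzipped = serializeGzipped(map);

        long t0 = System.nanoTime();
        byte[] raw = gunzip(gzipped);
        long t1 = System.nanoTime();
        Map<String, String> copy = deserialize(raw);
        long t2 = System.nanoTime();

        System.out.printf("gunzip: %d ms, deserialize: %d ms, entries: %d%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, copy.size());
    }
}
```

Whichever step dominates on your data tells you whether dropping gzip, dropping Java serialization, or both is worth the effort.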

Basically:

Store the file as plain text
Use Files.lines(Paths.get("myfilename.txt")) to stream the lines
Consume each line with minimal code

Something like this, assuming data is in form key=value (like a Properties file):

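Map<String, String> map = new HashMap<>();
// Files.lines keeps the file open, so close the stream when done
// (Stream is java.util.stream.Stream)
try (Stream<String> lines = Files.lines(Paths.get("myfilename.txt"))) {
  lines.map(s -> s.split("=", 2))  // limit 2: only split on the first '='
       .forEach(a -> map.put(a[0], a[1]));
}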

Disclaimer: Code may not compile or work as it was thumbed in on my phone (but there's a reasonable chance it will work)
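Note that Files.lines(Paths.get(...)) reads from the default file system, so it will not find a file that is packed inside a JAR. A minimal sketch of the same idea reading from the classpath instead, assuming UTF-8 encoded key=value lines; the class name `ResourceMapLoader` and the resource name `map.txt` are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ResourceMapLoader {

    // Parses "key=value" lines. Only the first '=' separates key from value,
    // so values may themselves contain '='.
    public static Map<String, String> parse(Reader source) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int sep = line.indexOf('=');
                if (sep < 0) {
                    continue; // assumption: lines without '=' can be skipped
                }
                map.put(line.substring(0, sep), line.substring(sep + 1));
            }
        }
        return map;
    }

    // Opens a resource from the classpath (works inside a JAR) and parses it.
    public static Map<String, String> load(String resourceName) throws IOException {
        InputStream in = ResourceMapLoader.class.getClassLoader()
                .getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resourceName);
        }
        return parse(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // In-memory demo so the example runs without any resource file.
        Map<String, String> demo = parse(new StringReader("a=1\nb=x=y\n"));
        System.out.println("Parsed " + demo.size() + " entries");
        // With a real resource on the classpath: load("map.txt")
    }
}
```

For a map of several million entries it may also help to pass an initial capacity to the HashMap constructor to avoid repeated resizing, though whether that matters here would need measuring.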

Tags: java, dictionary, serialization

Eike Cochu
