
Java: storing a big map in resources

I need to use a big file that contains String,String pairs, and because I want to ship it with a JAR, I opted to include a serialized and gzipped version in the resource folder of the application. This is how I created the serialized file:

ObjectOutputStream out = new ObjectOutputStream(
            new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(OUT_FILE_PATH, false))));
out.writeObject(map);
out.close();

I chose to use a HashMap<String,String>. The resulting file is 60 MB and the map contains about 4 million entries.

Now when I need the map and I deserialize it using:

final InputStream in = FileUtils.getResource("map.ser.gz");
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(in)));
map = (Map<String, String>) ois.readObject();
ois.close();

this takes about 10~15 seconds. Is there a better way to store such a big map in a JAR? I ask because I also use the Stanford CoreNLP library which uses big model files itself but seems to perform better in that regard. I tried to locate the code where the model files are read but gave up.

Eike Cochu asked Nov 08 '22 13:11



1 Answer

Your problem is that you gzipped the data. Store it as plain text instead.

The performance hit is most probably in decompressing the stream. JAR entries are normally compressed already, so gzipping the file first saves little or no space.
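
Since "most probably" is a guess, it may be worth timing decompression separately from readObject before changing formats. The following is only a rough sketch: the class name LoadTiming and its two helper methods are hypothetical, and it assumes map.ser.gz sits at the root of the classpath and that Java 9+ is available for readAllBytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: measures gunzip time and deserialization time separately.
public class LoadTiming {

    // Decompresses the whole gzip stream into memory.
    public static byte[] gunzip(InputStream gzipped) throws IOException {
        try (InputStream in = new GZIPInputStream(new BufferedInputStream(gzipped))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    // Deserializes one object from bytes that are already decompressed.
    public static Object deserialize(byte[] raw) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the resource name and location match the question.
        try (InputStream in = LoadTiming.class.getResourceAsStream("/map.ser.gz")) {
            if (in == null) {
                throw new IllegalStateException("map.ser.gz not found on classpath");
            }
            long t0 = System.nanoTime();
            byte[] raw = gunzip(in);
            long t1 = System.nanoTime();
            deserialize(raw);
            long t2 = System.nanoTime();
            System.out.printf("gunzip: %d ms, readObject: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```

If the second number dominates, the cost is in Java serialization rather than in gzip, and a plain-text format helps mainly because it avoids readObject.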

Basically:

  • Store the file in plain text
  • Use Files.lines(Paths.get("myfilename.txt")) to stream the lines
  • Consume each line with minimal code

Something like this, assuming the data is in key=value form (like a Properties file):

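// needs java.nio.file.{Files,Paths}, java.util.{HashMap,Map}, java.util.stream.Stream
Map<String, String> map = new HashMap<>();
// Files.lines keeps the file open, so close the stream when done
try (Stream<String> lines = Files.lines(Paths.get("myfilename.txt"))) {
  lines.map(s -> s.split("=", 2))   // limit 2 keeps any '=' inside the value
    .filter(a -> a.length == 2)     // skip lines without a separator
    .forEach(a -> map.put(a[0], a[1]));
}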

Disclaimer: Code may not compile or work as it was thumbed in on my phone (but there's a reasonable chance it will work)
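
One caveat for the JAR case: Paths.get resolves against the default file system, so it will not find a file packed inside a JAR. Reading the resource as a stream avoids that. Here is a minimal sketch of the full round trip (write the text file once at build time, read it at startup); the class name TextMapStore, its methods, and the expectedEntries parameter are all made up for illustration, and it assumes keys contain no '=' and that neither keys nor values contain line breaks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for a key=value text resource.
public class TextMapStore {

    // Writes one key=value line per entry.
    // Assumption: keys contain no '=', and keys/values contain no line breaks.
    public static void write(Map<String, String> map, Path target) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> e : map.entrySet()) {
                w.write(e.getKey());
                w.write('=');
                w.write(e.getValue());
                w.write('\n');
            }
        }
    }

    // Reads key=value lines from any stream, for example a classpath resource.
    public static Map<String, String> read(InputStream in, int expectedEntries) throws IOException {
        // Pre-size the table so it does not rehash while filling (default load factor 0.75).
        Map<String, String> map = new HashMap<>((int) (expectedEntries / 0.75f) + 1);
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                int sep = line.indexOf('='); // split on the first '=' only
                if (sep < 0) {
                    continue; // skip blank or malformed lines
                }
                map.put(line.substring(0, sep), line.substring(sep + 1));
            }
        }
        return map;
    }
}
```

At startup this could be called as TextMapStore.read(MyApp.class.getResourceAsStream("/map.txt"), 4_000_000), where MyApp and map.txt are placeholder names. Pre-sizing for the roughly 4 million entries mentioned in the question is an optional tweak, not something the approach depends on.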

Bohemian answered Nov 14 '22 22:11
