I have a large HashMap containing millions of entries, and I want to persist it to disk so that, when it is read back, I don't pay the overhead of inserting every key-value pair into the map again.
I am trying to use the cereal library to do this, but it appears that the HashMap datatype needs to derive Generic. Is there a way to do this?
You might be able to use standalone deriving to generate your own Generic instance for HashMap. You'll probably get a warning about orphan instances, but you also probably don't care :) Anyway, I haven't tried this, but it's probably worth a shot...
I am not sure that Generics is the best route to high performance. My best bet would actually be to write your own Serialize instance (Serialize is the class cereal provides), like this:
instance (Eq k, Hashable k, Serialize k, Serialize v) => Serialize (HashMap k v) where
...
To avoid creating orphan instances, you can use the newtype trick:
newtype SerializableHashMap k v = SerializableHashMap { toHashMap :: HashMap k v }
instance (Eq k, Hashable k, Serialize k, Serialize v) => Serialize (SerializableHashMap k v) where
...
The question is how to define the ... part.
There is no definite answer before you actually try and implement and benchmark possible solutions.
One possible solution is to use the toList/fromList functions and store/read the size of the HashMap.
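A minimal sketch of that toList/fromList approach, assuming the cereal and unordered-containers packages. The helper names roundTrip, saveMap and loadMap are made up for illustration, and the 64-bit size prefix is just one possible encoding. I have not benchmarked this:

```haskell
module SerializableHashMap where

import           Control.Monad       (replicateM)
import qualified Data.ByteString     as BS
import           Data.Hashable       (Hashable)
import           Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HM
import           Data.Serialize      (Serialize (..), decode, encode,
                                      getWord64be, putWord64be)

-- Newtype wrapper, so the instance below is not an orphan.
newtype SerializableHashMap k v = SerializableHashMap
  { toHashMap :: HashMap k v }
  deriving (Eq, Show)

instance (Eq k, Hashable k, Serialize k, Serialize v)
      => Serialize (SerializableHashMap k v) where
  -- Write the number of entries first, then every (key, value) pair.
  put (SerializableHashMap m) = do
    putWord64be (fromIntegral (HM.size m))
    mapM_ put (HM.toList m)
  -- Read the size, then exactly that many pairs, and rebuild the map.
  get = do
    n     <- getWord64be
    pairs <- replicateM (fromIntegral n) get
    pure (SerializableHashMap (HM.fromList pairs))

-- Hypothetical helper: encode to bytes and decode again.
roundTrip :: (Eq k, Hashable k, Serialize k, Serialize v)
          => HashMap k v -> Either String (HashMap k v)
roundTrip = fmap toHashMap . decode . encode . SerializableHashMap

-- Hypothetical helpers for persisting to and loading from disk.
saveMap :: (Eq k, Hashable k, Serialize k, Serialize v)
        => FilePath -> HashMap k v -> IO ()
saveMap path = BS.writeFile path . encode . SerializableHashMap

loadMap :: (Eq k, Hashable k, Serialize k, Serialize v)
        => FilePath -> IO (Either String (HashMap k v))
loadMap path = fmap toHashMap . decode <$> BS.readFile path
```

Note that fromList still re-inserts every pair on load, so this does not remove the insertion overhead from the question. It only gives you a simple baseline to benchmark the other options against.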
The other (which would be similar to using Generics) would be to write direct serialization based on the internal HashMap structure. Given that the internals are not really exported, that would be a job for Generics only.