I'm getting a large amount of data from a database query and turning the rows into objects. I end up with a list of about 1M of these objects, and I want to serialize it to disk for later use. The problem is that it barely fits in memory now and won't fit in the future, so I need some way to serialize, say, the first 100k objects, then the next 100k, and so on; and also to read the data back in 100k increments.
I could write some obvious code that checks whether the list has grown too big and then writes it to file 'list1', then 'list2', etc., but maybe there's a better way to handle this?
Serialization is the process of converting a data object, a combination of code and data held in memory, into a series of bytes that capture the state of the object in an easily stored or transmitted form; deserialization is the reverse process, recreating an equivalent object when needed. In some cases, a secondary goal of serialization is to minimize the data's size, which reduces disk space or bandwidth requirements.
One caveat: sensitive data such as cryptographic keys, digital certificates, and classes that may hold references to sensitive data at the time of serialization should never be serialized, to prevent the unintentional persistence of that information.
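To make the round trip concrete, here is a minimal sketch using Java's built-in object serialization. The `Point` class is a hypothetical example type, not anything from the question:

```java
import java.io.*;

public class SerializationDemo {
    // Hypothetical example class; any class you serialize must implement Serializable.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws Exception {
        // Serialize: object state -> bytes
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(3, 4));
        }

        // Deserialize: bytes -> an equivalent object
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Point p = (Point) in.readObject();
            System.out.println(p.x + "," + p.y); // prints 3,4
        }
    }
}
```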
You could skip building the full list: go through the query results, create each object, and feed it immediately to an ObjectOutputStream, which writes it to the file as you go. That way only one object at a time needs to be in memory.
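A sketch of that approach, assuming a hypothetical `Row` class standing in for your object type and an `Iterator` standing in for the query results. One detail worth knowing: `ObjectOutputStream` keeps a back-reference table of every object it has written, which would pin all 1M objects in memory; calling `reset()` periodically clears it.

```java
import java.io.*;
import java.util.Iterator;

public class StreamingSerializer {
    // Hypothetical record type; substitute your own Serializable class.
    static class Row implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;
        Row(int id, String name) { this.id = id; this.name = name; }
    }

    // Write objects one at a time as they come from the query,
    // so the full list never has to exist in memory.
    static void writeAll(Iterator<Row> rows, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            int count = 0;
            while (rows.hasNext()) {
                out.writeObject(rows.next());
                // Clear the stream's back-reference table every 100k objects
                // so already-written objects can be garbage-collected.
                if (++count % 100_000 == 0) {
                    out.reset();
                }
            }
        }
    }

    // Read the objects back one at a time; EOFException marks the end of the file.
    static void readAll(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                Row row;
                try {
                    row = (Row) in.readObject();
                } catch (EOFException end) {
                    break;
                }
                // process row here, one object at a time
            }
        }
    }
}
```

With this pattern you don't need the 'list1', 'list2' file-splitting scheme at all: a single file holds the whole stream, and both writing and reading stay at constant memory.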