 

Serializing a very large list

I'm getting a large amount of data from a database query and turning each row into an object. I end up with a list of these objects (about 1M of them) that I want to serialize to disk for later use. The problem is that it barely fits in memory and won't fit in the future, so I need some way to serialize, say, the first 100k, then the next 100k, and so on, and also to read the data back in in 100k increments.

I could write some obvious code that checks whether the list has grown too big and then writes it out to file 'list1', then 'list2', etc., but maybe there's a better way to handle this?
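A minimal sketch of that chunked approach, assuming the objects implement Serializable; the chunk size, generic type, and the "list1.ser", "list2.ser" file-name pattern are placeholders, not anything prescribed by the question:

```java
import java.io.*;
import java.util.*;

public class ChunkedWriter {
    private static final int CHUNK_SIZE = 100_000;

    // Pull records from an iterator (e.g. wrapped around the query results)
    // and write each full chunk to its own file, so only one chunk is ever in memory.
    public static <T extends Serializable> void writeInChunks(Iterator<T> records, String baseName)
            throws IOException {
        List<T> chunk = new ArrayList<>(CHUNK_SIZE);
        int fileIndex = 1;
        while (records.hasNext()) {
            chunk.add(records.next());
            if (chunk.size() == CHUNK_SIZE) {
                writeChunk(chunk, baseName + fileIndex++ + ".ser");
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writeChunk(chunk, baseName + fileIndex + ".ser");
        }
    }

    private static <T extends Serializable> void writeChunk(List<T> chunk, String fileName)
            throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(fileName)))) {
            out.writeObject(chunk); // each file holds one self-contained List
        }
    }
}
```

Reading back is the mirror image: open each file in turn, cast the result of readObject() back to a List, process it, and discard it before opening the next file.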

asked Sep 23 '09 by kresjer

People also ask

What is meant by serializing?

Serialization is the process of converting a data object—a combination of code and data represented within a region of data storage—into a series of bytes that saves the state of the object in an easily transmittable form.

What should never be serialized?

Examples of sensitive data that should never be serialized include cryptographic keys, digital certificates, and classes that may hold references to sensitive data at the time of serialization. This rule is meant to prevent the unintentional serialization of sensitive information.

What does serializing a model mean?

Serialization refers to the process of converting a data object (e.g., Python objects, Tensorflow models) into a format that allows us to store or transmit the data and then recreate the object when needed using the reverse process of deserialization.

Does serialization reduce size?

In some cases, the secondary intention of data serialization is to minimize the data's size which then reduces disk space or bandwidth requirements.


1 Answer

You could go through the query results, create each object, and feed it immediately to an ObjectOutputStream, which writes it to the file as you go, so the full list never has to sit in memory.
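A minimal sketch of that streaming approach; the generic type, file name, and the per-10k reset interval are illustrative assumptions, not part of the answer:

```java
import java.io.*;
import java.util.Iterator;

public class StreamingSerializer {

    // Write each object the moment it is produced instead of collecting a list first.
    public static <T extends Serializable> void writeAll(Iterator<T> records, File file)
            throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            int count = 0;
            while (records.hasNext()) {
                out.writeObject(records.next());
                if (++count % 10_000 == 0) {
                    out.reset(); // clear the stream's back-reference table so it doesn't grow unbounded
                }
            }
        }
    }

    // Read the objects back one at a time; EOFException marks the end of the stream.
    public static void readAll(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                Object record;
                try {
                    record = in.readObject();
                } catch (EOFException end) {
                    break;
                }
                // process the record here instead of accumulating it in a list
            }
        }
    }
}
```

Because both writing and reading handle one object at a time, this also removes the need for separate 'list1', 'list2' files, although splitting into multiple files still works if you want independently loadable chunks.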

answered Sep 30 '22 by Zed