I have a question about memory allocation strategies in a specific situation.
I have a large number of structs that are coming from a JSON API via serde. There is a specific field in the JSON response which will, for a small minority of cases, exist and contain an array of exactly 11 elements. In the majority of cases this field will not exist at all in the response. All responses need to be stored.
The majority of responses will be of the format:
{
id: 1,
event: "None"
}
A minority of responses will be of the format:
{
id: 2,
event: "something",
event_details : [11 json objects here]
}
If I define my struct for parsing the JSON as:
#[derive(Deserialize, Debug)]
struct Event {
id: u32,
event: String,
// EventDetail is a struct with an u32 and string field
events: Option<[EventDetail; 11]>
}
I can see using deepsizeof crate that my Vec<Event> that gets created takes up ~500MB of memory. If I change the events field to be events: Option<Box<[EventDetail; 11]>> the memory usage drops to ~150MB.
My understanding here is that since Box is allocated at runtime instead of compile time, when a JSON response comes in that is lacking the events field, then the 11 item array does not need to be allocated. Am I correct here? And is this a valid use case for Box or is there a better way to do this?
I also tried events: Option<Vec<EventDetail>>, which resulted in slightly higher memory assignment than Box. I assume this is down to padding.
Yes, this is all correct. Option<[T; N]> always allocates space for the array, even if it's not used, whereas Option<Box<...>> allocates enough space for the box pointer, but not the thing it points to unless needed. Hiding the large data behind a Box indirection is 100% the correct call here.
Vec is a pointer (basically a Box) plus the size of the vector and its capacity, so Option<Vec<...>> will usually be three times the overhead, as it will allocate space for all of the pointer, the size and the capacity, of the (potentially nonexistent) vector. But at that point, we're arguing in units of bytes, so it's silly to debate size differences at that scale.
(Vec also stores the memory allocator, but the default allocator Global is a zero-sized type)
Since you know the size in advance, I'd go with Option<Box<[EventDetail; 11]>> rather than the vector, since the vector is just pointlessly pushing data to runtime that could be known at compile-time.
Just a partial answer for the "or is there a better way to do this?" part.
Depending on the context of usage of the struct Event, another option is to store the event_details arrays externally to the struct. Depending on your performance constraints and how you resolve identity for Events, you could store them in another Vec or a dictionary (HashMap?), indexing by Event.id.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With