
How does Serialized RDD occupy less space in memory?

The Spark programming guide mentions serializing an RDD as one of the techniques to decrease memory usage. As I understand it, serialization is the conversion of an object to bytes so that the object can easily be saved to storage. So how does the serialized form occupy less space?

user2017 asked Jan 03 '18 01:01



1 Answer

In Spark 2.x, as the memory tuning guide explains, Java objects carry overhead beyond the raw data: each object has a header that includes a pointer to its class, and collections of primitive types store their elements as boxed wrapper objects. This overhead is not stored when the objects are serialized.
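To get a feel for this overhead, here is a minimal sketch in plain Python rather than Spark or the JVM, so treat it only as an analogy. CPython integers are also heap objects with headers, and a list holds pointers to them, which is similar in spirit to boxed values in a Java collection. The names `boxed_size` and `packed_size` are made up for this illustration, and the exact byte counts depend on the interpreter.

```python
import sys
from array import array

# 100,000 integers, large enough that CPython creates a separate object for each
values = [1_000_000 + i for i in range(100_000)]

def boxed_size(items):
    """Approximate bytes used by the list plus the int objects it points to."""
    return sys.getsizeof(items) + sum(sys.getsizeof(v) for v in items)

def packed_size(items):
    """Bytes used when the same integers are packed as raw 8-byte values."""
    return len(array("q", items).tobytes())

object_bytes = boxed_size(values)
serialized_bytes = packed_size(values)

print("as objects:     ", object_bytes, "bytes")
print("as packed bytes:", serialized_bytes, "bytes")
```

The packed form keeps only the raw 8 bytes per value, while the object form also pays for a header per integer and a pointer per list slot. The same kind of gap is why a serialized partition can be smaller than the deserialized Java objects.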

The trade-off is that the data is stored in each partition as a serialized byte array, so it must be deserialized before it can be used, which can be time-consuming.
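The cost side of that trade-off can be sketched the same way. This is again plain Python and not Spark's actual code path: `pickle` stands in for Java or Kryo serialization, and `partition_bytes` and `sum_ids` are hypothetical names used only for this example.

```python
import pickle

# A small "partition" of records: (id, name) pairs
records = [(i, f"user-{i}") for i in range(1000)]

# Serialized storage: a single byte string stands in for the cached partition
partition_bytes = pickle.dumps(records)

def sum_ids(blob):
    """Deserialize the partition, then compute on the restored objects."""
    restored = pickle.loads(blob)  # this step is repeated on every access
    return sum(record_id for record_id, _ in restored)

total = sum_ids(partition_bytes)
print(total)
```

Every call to `sum_ids` has to rebuild the objects before it can touch a single field, which is the extra CPU work referred to above. In the Scala and Java APIs, this storage mode is selected with a serialized storage level such as `StorageLevel.MEMORY_ONLY_SER`.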

https://spark.apache.org/docs/latest/tuning.html

Vikram Patil answered Sep 28 '22 17:09
