How does Serialized RDD occupy less space in memory?

Question

In the Spark programming guide, serializing an RDD is mentioned as one of the techniques to decrease memory usage. As I understand it, serialization is the conversion of an object to bytes so that the object can be easily saved to storage. So how does it occupy less space?

Vikram Patil · Accepted Answer

As of Spark 2.x, the memory tuning document explains that Java objects carry overhead on top of the raw data: each object has a header that includes a pointer to its class, and collections of primitive types usually store their elements as wrapper ("boxed") objects. This overhead is not stored when the objects are serialized.
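To make the size difference concrete, here is a minimal sketch using only the Java standard library. It is not Spark's serializer (Spark uses Java serialization or Kryo); the class name `SerializedSizeSketch` and the helper `packInts` are hypothetical names chosen for illustration. It only shows that a packed byte array holds 4 bytes per `int`, whereas a `List<Integer>` keeps one reference plus one boxed object, with its own header, per element.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SerializedSizeSketch {

    // Hypothetical helper: packs boxed integers into a flat byte array,
    // 4 bytes per value, with no object headers or references.
    static byte[] packInts(List<Integer> values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // On the heap this list holds 1000 references and up to 1000 Integer objects.
        List<Integer> boxed = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            boxed.add(i);
        }
        byte[] packed = packInts(boxed);
        System.out.println("packed bytes: " + packed.length); // prints "packed bytes: 4000"
    }
}
```

The exact heap footprint of the boxed list depends on the JVM and its settings, so the sketch does not try to measure it; the point is only that the serialized form contains the raw values and nothing else.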

However, because the data is stored in the partition as a serialized byte array, it has to be deserialized before it can be used, which costs CPU time.
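The cost side of the trade-off can be sketched the same way. The example below is an illustration under the same assumptions (plain Java, hypothetical names `DeserializeOnReadSketch`, `pack`, and `unpack`, not Spark's own code path): once values are stored as bytes, every read has to decode them back into usable values first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DeserializeOnReadSketch {

    // Serialize: write each int as 4 raw bytes.
    static byte[] pack(int[] values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialize: the bytes must be decoded before any computation can use them.
    static int[] unpack(byte[] bytes) {
        int[] values = new int[bytes.length / Integer.BYTES];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] stored = pack(new int[] {10, 20, 30});
        long sum = 0;
        // The decode step runs every time the stored data is read.
        for (int v : unpack(stored)) {
            sum += v;
        }
        System.out.println("sum = " + sum); // prints "sum = 60"
    }
}
```

In Spark itself this trade-off is selected through a serialized storage level such as `MEMORY_ONLY_SER`, which the tuning guide linked below describes.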

https://spark.apache.org/docs/latest/tuning.html

Tags:

java

serialization

apache-spark

user2017
