
File size vs. in memory size in Java

Tags: java, c++, memory

If I take an XML file that is around 2 kB on disk, load its contents into memory as a String in Java, and then measure the object size, I get around 33 kB.

Why the huge increase in size?
If I do the same thing in C++ the resulting string object in memory is much closer to the 2kB.

To measure the memory in Java I'm using Instrumentation. For C++, I take the length of the serialized object (e.g. a string).

imrichardcole asked May 24 '13 06:05




2 Answers

Assuming that your XML file contains mainly ASCII characters and uses an encoding that represents them as single bytes, you can expect the in-memory size to be at least double, since Java uses UTF-16 internally (I've heard of some JVMs that try to optimize this, though). Added to that is the overhead of two objects (the String instance and its internal char array) with some fields, IIRC about 40 bytes overall.

So your "object size" of 33 kB is definitely not correct, unless you're using a weird JVM. There must be some problem with the method you use to measure it.
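As a rough back-of-the-envelope check of the reasoning above, here is a minimal sketch of the arithmetic. The constants are assumptions for a 64-bit HotSpot JVM with compressed references (24 bytes for the String object, 16 bytes for the array header, 8-byte alignment); other JVMs and settings will give somewhat different numbers. The class and method names are made up for illustration.

```java
class StringFootprintEstimate {
    // Assumed sizes for a 64-bit HotSpot JVM with compressed references;
    // these are illustrative and vary between JVMs and versions.
    static final long STRING_OBJECT_BYTES = 24; // header plus fields of the String itself
    static final long ARRAY_HEADER_BYTES = 16;  // header plus length field of the backing array

    // Objects are assumed to be padded to a multiple of 8 bytes.
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Rough deep size of a String whose characters are stored as UTF-16 (2 bytes each).
    static long estimateUtf16Footprint(int charCount) {
        return STRING_OBJECT_BYTES + alignTo8(ARRAY_HEADER_BYTES + 2L * charCount);
    }

    public static void main(String[] args) {
        int chars = 2048; // a ~2 kB file of single-byte characters
        // Prints 4136 under the assumptions above.
        System.out.println("estimated bytes in memory: " + estimateUtf16Footprint(chars));
    }
}
```

Under these assumptions a 2 kB ASCII file comes out at roughly 4 kB in memory, nowhere near 33 kB. On JDK 9 and later, compact strings can store Latin-1-only text at one byte per character, which would make the figure closer to 2 kB.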

Michael Borgwardt answered Oct 08 '22 16:10


I think there are multiple factors involved. First of all, as Bruce Martin said, objects in Java have an overhead of 16 bytes per object; C++ objects do not. Second, Strings in Java might use 2 bytes per character instead of 1. Third, it could be that Java reserves more memory for its Strings than the C++ std::string does.

Please note that these are just ideas where the big difference might come from.
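The second point (1 byte versus 2 bytes per character) is easy to see by encoding the same text both ways. This is only a sketch: it compares encoded byte counts, not actual heap usage, and the class name and sample string are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodedSizeComparison {
    // Bytes needed for the text in UTF-8 (1 byte per ASCII character).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed for the text in UTF-16 without a byte order mark
    // (2 bytes per character in the Basic Multilingual Plane).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String xml = "<note><to>Alice</to></note>"; // hypothetical sample content
        System.out.println("UTF-8 bytes:  " + utf8Bytes(xml));
        System.out.println("UTF-16 bytes: " + utf16Bytes(xml));
    }
}
```

For ASCII-only content the UTF-16 count is exactly twice the UTF-8 count, which accounts for a factor of two but not for a factor of sixteen.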

Marius answered Oct 08 '22 16:10