 

How does Java store primitive types in RAM? [duplicate]


This is NOT about whether primitives go to the stack or heap, it's about where they get saved in the actual physical RAM.


Take a simple example:

int a = 5;

I know 5 gets stored into a memory block.

My area of interest is where does the variable 'a' get stored?

Related Sub-questions: Where does it happen where 'a' gets associated to the memory block that contains the primitive value of 5? Is there another memory block created to hold 'a'? But that will seem as though a is a pointer to an object, but it's a primitive type involved here.

Question Everything asked Dec 30 '13 03:12




1 Answer

To expound on the question "Do Java primitives go on the Stack or the Heap?":

Let's say you have a function foo():

void foo() {
   int a = 5;
   System.out.println(a);
}

Then when the compiler compiles that function, it creates bytecode instructions that reserve room for a 4-byte int (conceptually, 4 bytes on the stack) whenever that function is called. The name 'a' is only useful to you. The compiler just creates a spot for the value, remembers where that spot is, and everywhere the code uses 'a' it instead emits a reference to the location it reserved.
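As a rough illustration of "the name is only useful to you", here is a minimal sketch. The class name LocalSlotDemo is made up for this example, and the bytecode in the comments is what javac typically emits for a static method with no parameters (you can check with `javap -c LocalSlotDemo`). Notice that the instructions refer to a numbered slot, not to the name 'a'. The name survives only as optional debug information.

```java
class LocalSlotDemo {
    // A static method with no parameters, so its first local variable gets slot 0.
    static int foo() {
        int a = 5;   // typically compiles to: iconst_5, istore_0
        return a;    // typically compiles to: iload_0, ireturn
    }

    public static void main(String[] args) {
        System.out.println(foo()); // prints 5
    }
}
```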

If you're not sure how the stack works, it works like this: every program has at least one thread, and every thread has exactly one stack. The stack is a contiguous block of memory (that can also grow if needed). Initially the stack is empty, until the first function in your program is called. Then, when your function is called, it allocates room on the stack for itself: for all of its local variables, for its return value, etc.

When your function main calls another function foo, here's one example of what could happen (there are a couple of simplifying white lies here):

  • main wants to pass parameters to foo. It pushes those values onto the top of the stack in such a way that foo will know exactly where they will be put (main and foo will pass parameters in a consistent way).
  • main pushes the address of where program execution should return to after foo is done. This increments the stack pointer.
  • main calls foo.
  • When foo starts, it sees that the stack is currently at address X.
  • foo wants to allocate 3 int variables on the stack, so it needs 12 bytes.
  • foo will use X + 0 for the first int, X + 4 for the second int, X + 8 for the third.
    • The compiler can compute this at compile time, and the compiler can rely on the value of the stack pointer register (ESP on an x86 system), and so the assembly code it writes out does stuff like "store 0 in the address ESP + 0", "store 1 into the address ESP + 4" etc.
  • The parameters that main pushed on the stack before calling foo can also be accessed by foo by computing some offset from the stack pointer.
    • foo knows how many parameters it takes (say 3) so it knows that, say, X - 8 is the first one, X - 12 is the second one, and X - 16 is the third one.
  • So now that foo has room on the stack to do its work, it does so and finishes.
  • Right before main called foo, main wrote its return address on the stack before incrementing the stack pointer.
  • foo looks up the address to return to - say that address is stored at ESP - 4 - foo looks at that spot on the stack, finds the return address there, and jumps to the return address.
  • Now the rest of the code in main continues to run and we've made a full round trip.
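The round trip above can be sketched as a toy model in Java. This is not how the JVM or an x86 CPU is actually implemented. It's just an int array standing in for stack memory, with one array slot per 4-byte word, so X - 8 bytes in the list becomes x - 2 slots here. The class ToyCallStack, its fields, and the fake return address 1234 are all made up for illustration.

```java
class ToyCallStack {
    // One array slot stands in for one 4-byte word of stack memory.
    final int[] memory = new int[32];
    int sp = 0;      // next free slot; this toy stack grows upward
    int lastSum;     // where "foo" leaves its result so we can inspect it

    void push(int value) { memory[sp++] = value; }
    int pop() { return memory[--sp]; }

    // Simulates main calling foo(p1, p2, p3); returns the address foo jumps back to.
    int callFoo(int p1, int p2, int p3, int returnAddress) {
        // main: push parameters so the first one ends up closest to the return address
        push(p3);
        push(p2);
        push(p1);
        // main: push the return address
        push(returnAddress);

        // foo: the stack is currently at X; reserve three int locals at X+0, X+1, X+2
        int x = sp;
        sp += 3;
        memory[x]     = memory[x - 2]; // first parameter  (X - 8 bytes in the text)
        memory[x + 1] = memory[x - 3]; // second parameter (X - 12)
        memory[x + 2] = memory[x - 4]; // third parameter  (X - 16)
        lastSum = memory[x] + memory[x + 1] + memory[x + 2];

        // foo: release its locals, then read the return address just below X
        sp = x;
        int target = pop();

        // main: discard the three parameters it pushed
        sp -= 3;
        return target;
    }

    public static void main(String[] args) {
        ToyCallStack stack = new ToyCallStack();
        int target = stack.callFoo(10, 20, 30, 1234);
        // prints "1234 60 0": return address, sum of the parameters, stack back to empty
        System.out.println(target + " " + stack.lastSum + " " + stack.sp);
    }
}
```

The point is only that every "variable" here is an offset from X. Nothing in the running program knows the names.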

Note that each time a function is called, it can do whatever it wants with the memory pointed to by the current stack pointer and everything after it. Each time a function makes room on the stack for itself, it increments the stack pointer before calling other functions to make sure that everybody knows where they can use the stack for themselves.

I know this explanation blurs the line between x86 and Java a little bit, but I hope it helps to illustrate how the hardware actually works.

Now, this only covers 'the stack'. The stack exists for each thread in the program and captures the state of the chain of function calls between each function running on that thread. However, a program can have several threads, and so each thread has its own independent stack.
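To see per-thread stacks from the Java side, here is a small sketch (the class and method names are made up for this example). Both threads run the same method, but each call gets its own copy of the local variable total, so they can't trample each other. The results array, by contrast, is shared between the threads.

```java
class PerThreadLocals {
    // 'total' and 'i' are locals, so every call, on every thread, gets its own copies.
    static int sumTo(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    static int[] runOnTwoThreads() {
        int[] results = new int[2]; // shared by both threads, unlike the locals in sumTo
        Thread t1 = new Thread(() -> results[0] = sumTo(100));
        Thread t2 = new Thread(() -> results[1] = sumTo(1000));
        t1.start();
        t2.start();
        try {
            // join() waits for each thread and makes its writes visible to us
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] results = runOnTwoThreads();
        System.out.println(results[0] + " " + results[1]); // prints "5050 500500"
    }
}
```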

What happens when two function calls want to deal with the same piece of memory, regardless of what thread they're on or where they are in the stack?

This is where the heap comes in. Typically (but not always) one program has exactly one heap. The heap is called a heap because, well, it's just a big ol' heap of memory.

To use memory in the heap, you have to call allocation routines - routines that find unused space and give it to you, and routines that let you return space you allocated but are no longer using. The memory allocator gets big pages of memory from the operating system, and then hands out individual little bits to whatever needs them. It keeps track of what the OS has given to it, and out of that, what it has given out to the rest of the program. When the program asks for heap memory, the allocator looks for the smallest available chunk that fits the need, marks that chunk as allocated, and hands it back to the rest of the program. If it doesn't have any more free chunks, it can ask the operating system for more pages of memory and allocate out of those (up until some limit).
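The "smallest chunk that fits" bookkeeping described above can be sketched like this. It's a toy, not how any real allocator or the JVM works: the heap is just a range of integer addresses, ToyBestFitAllocator and its methods are invented for this example, and real allocators also merge neighbouring free chunks and ask the OS for more pages.

```java
import java.util.ArrayList;
import java.util.List;

class ToyBestFitAllocator {
    // A free chunk of the pretend heap, covering addresses [start, start + size).
    private static final class Chunk {
        final int start;
        final int size;
        Chunk(int start, int size) { this.start = start; this.size = size; }
    }

    private final List<Chunk> free = new ArrayList<>();

    ToyBestFitAllocator(int heapSize) {
        free.add(new Chunk(0, heapSize)); // initially one big free chunk
    }

    // Returns the start address of the allocation, or -1 if no free chunk is big enough.
    int allocate(int size) {
        Chunk best = null;
        for (Chunk c : free) {
            if (c.size >= size && (best == null || c.size < best.size)) {
                best = c; // smallest chunk seen so far that still fits
            }
        }
        if (best == null) {
            return -1;
        }
        free.remove(best);
        if (best.size > size) {
            // keep the unused tail of the chunk on the free list
            free.add(new Chunk(best.start + size, best.size - size));
        }
        return best.start;
    }

    // Hands a block back. A real allocator would also merge adjacent free chunks.
    void release(int start, int size) {
        free.add(new Chunk(start, size));
    }

    public static void main(String[] args) {
        ToyBestFitAllocator heap = new ToyBestFitAllocator(100);
        int a = heap.allocate(10);  // 0
        int b = heap.allocate(50);  // 10
        heap.release(a, 10);        // free chunks are now [60,100) and [0,10)
        int c = heap.allocate(8);   // 0: the 10-slot chunk is the smallest that fits
        int d = heap.allocate(45);  // -1: the largest free chunk has only 40 slots
        System.out.println(a + " " + b + " " + c + " " + d); // prints "0 10 0 -1"
    }
}
```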

In languages like C, those memory allocation routines I mentioned are usually called malloc() to ask for memory and free() to return it.

Java, on the other hand, doesn't have explicit memory management like C does. Instead it has a garbage collector - you allocate whatever memory you want, and then when you're done, you just stop using it. The Java runtime environment keeps track of what memory you've allocated, periodically scans your program to find allocations that are no longer reachable, and automatically deallocates those chunks.
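Here is a hedged sketch of "you just stop using it". A WeakReference lets us peek at an object without keeping it alive. Keep in mind that System.gc() is only a request, so the second method below usually returns true on common JVMs but nothing guarantees it. The class and method names are made up for this example.

```java
import java.lang.ref.WeakReference;

class GcEligibilityDemo {
    // While a normal (strong) reference is still in use, the object cannot be collected.
    static boolean aliveWhileReferenced() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a hint to the runtime
        return weak.get() == strong; // 'strong' is still used here, so the object survives
    }

    // After we drop the only strong reference, the collector is free to reclaim the object.
    static boolean clearedAfterDrop() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        strong = null; // "stop using it"
        System.gc();   // a request; the JVM may or may not collect right now
        return weak.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("alive while referenced: " + aliveWhileReferenced());
        System.out.println("cleared after dropping the reference: " + clearedAfterDrop());
    }
}
```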

So now that we know that memory is allocated on the heap or the stack, what happens when I create a private variable in a class?

public class Test {
     private int balance;
     ...
}

Where does that memory come from? The answer is the heap. You have some code that creates a new Test object - Test myTest = new Test(). Calling the Java new operator causes a new instance of Test to be allocated on the heap. Your variable myTest stores the address of that allocation. balance is then just some offset from that address - a small one, just past whatever header the JVM keeps at the start of each object.
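A short sketch of the difference this makes in practice. The deposit and getBalance methods and the wrapper class HeapFieldDemo are additions for this example and are not part of the Test class above. Passing myTest to a method copies the reference, so both sides reach the same heap object. Passing an int copies the value itself.

```java
class HeapFieldDemo {
    static class Test {
        private int balance; // lives inside the Test object, on the heap

        void deposit(int amount) { balance += amount; }
        int getBalance() { return balance; }
    }

    // Receives a copy of the reference, so it reaches the same heap object as the caller.
    static void depositTen(Test account) {
        account.deposit(10);
    }

    // Receives a copy of the int itself; the caller's variable is untouched.
    static void addTen(int value) {
        value += 10;
    }

    public static void main(String[] args) {
        Test myTest = new Test();
        depositTen(myTest);

        int local = 5;
        addTen(local);

        System.out.println(myTest.getBalance() + " " + local); // prints "10 5"
    }
}
```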

The answer at the very bottom is all just... accounting.

...

The white lies I spoke about? Let's address a few of those.

  • Java is first a computer model - when you compile your program to bytecode, you're compiling to a completely made-up computer architecture that doesn't have registers or assembly instructions like any common CPU. Java, .NET, and a few others use a stack-based virtual machine instead of a register-based machine (like x86 processors). The reason is that stack-based machines are easier to reason about, and so it's easier to build tools that manipulate that code, which is especially important for tools that compile that code to machine code that will actually run on common processors.
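To make "stack-based machine" a bit more concrete, here is a tiny interpreter sketch. The five opcodes are invented for this example and only loosely resemble real JVM instructions such as iload, istore and iadd. Every instruction works by pushing to or popping from an operand stack, with no named registers anywhere.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyStackMachine {
    // Made-up opcodes; PUSH, STORE and LOAD are each followed by one operand.
    static final int PUSH = 0, STORE = 1, LOAD = 2, ADD = 3, RETURN = 4;

    static int run(int[] code) {
        Deque<Integer> operands = new ArrayDeque<>(); // the operand stack
        int[] locals = new int[4];                    // numbered local variable slots
        int pc = 0;                                   // index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:   operands.push(code[pc++]); break;
                case STORE:  locals[code[pc++]] = operands.pop(); break;
                case LOAD:   operands.push(locals[code[pc++]]); break;
                case ADD:    operands.push(operands.pop() + operands.pop()); break;
                case RETURN: return operands.pop();
                default: throw new IllegalArgumentException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Roughly: int a = 5; int b = 7; return a + b;
        int[] program = {
            PUSH, 5, STORE, 0,
            PUSH, 7, STORE, 1,
            LOAD, 0, LOAD, 1, ADD,
            RETURN
        };
        System.out.println(run(program)); // prints 12
    }
}
```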

  • The stack pointer for a given thread typically starts at some very high address and then grows down, instead of up, at least on most x86 computers. That said, since that's a machine detail, it's not actually Java's problem to worry about. Java has its own made-up machine model to worry about, and it's the Just In Time compiler's job to translate that to your actual CPU.

  • I mentioned briefly how parameters are passed between functions, saying stuff like "parameter A is stored at ESP - 8, parameter B is stored at ESP - 12" etc. This is generally called the "calling convention", and there are more than a few of them. On x86-32, registers are scarce, and so many calling conventions pass all parameters on the stack. This has some tradeoffs, particularly that accessing those parameters might mean a trip to RAM (though cache might mitigate that). x86-64 has a lot more named registers, which means that the most common calling conventions pass the first few parameters in registers, which presumably improves speed. Additionally, since the Java JIT is the only thing that generates machine code for the entire process (excepting native calls), it can choose to pass parameters using any convention it wants.

  • I mentioned how when you declare a variable in some function, the memory for that variable comes from the stack - that's not always true, and it's really up to the whims of the environment's runtime to decide where to get that memory from. In C#/.NET's case, the memory for that variable could come from the heap if the variable is used as part of a closure - this is called "heap promotion". Many languages deal with closures by creating hidden classes. So what often happens is that the method locals involved in closures are rewritten to be members of some hidden class. When that method is invoked, it instead allocates a new instance of that class on the heap and stores its address on the stack, and all references to that originally-local variable now go through that heap reference.
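Java itself only lets lambdas capture locals that are effectively final, so the compiler never has to promote a mutable local for you. You can still write the "hidden class" by hand to see the idea. In this sketch, CounterBox plays the role of the compiler-generated class. All names here are made up, and this is an illustration of the rewrite, not what the C# compiler literally emits.

```java
import java.util.function.IntSupplier;

class ClosureHoistingDemo {
    // Hand-written stand-in for the hidden class a compiler might generate.
    static final class CounterBox {
        int count; // the variable that would otherwise have been a method local
    }

    static IntSupplier makeCounter() {
        CounterBox box = new CounterBox(); // allocated on the heap
        // The lambda captures the reference to 'box', not a stack slot, so the
        // counter keeps working after makeCounter() has returned.
        return () -> ++box.count;
    }

    public static void main(String[] args) {
        IntSupplier next = makeCounter();
        // prints "1 2 3"
        System.out.println(next.getAsInt() + " " + next.getAsInt() + " " + next.getAsInt());
    }
}
```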

antiduh answered Oct 11 '22 10:10