When implementing precise garbage collection, there is always the issue of figuring out which words on the stack are pointers and which are other kinds of data such as integers or floating point numbers. Interpreted languages typically solve this problem by making everything a pointer; compilers for some languages such as Lisp typically solve it by using tag bits to distinguish between pointers and integers.
But what about JIT compilers for languages such as Java and C# that support full unboxed machine word integers and floating-point numbers? How do they tell which of the contents of the stack and CPU registers are pointers?
The bytecode for such languages always contains full type information. It is stored either in meta-data (e.g., for argument types) or implicit in the opcode (e.g., there may be different opcodes for adding an integer or a floating point number).
When optimising code the compiler can access this information and use it to improve optimisations. It also uses the information to generate meta data for the compiled code at specific GC safe points.
A GC safe point is a place in the code, where it is safe to interrupt a thread to schedule another thread or perform garbage collection. At GC safe points we have the necessary meta data available to find out which registers contain pointers and which don't. In the Hotspot JVM, for example, a loop always contains a read from a special location in memory. The result of that read is unused, but if the address that the instruction reads from is read-protected, a page fault occurs. This can be used to interrupt a thread at arbitrary points in time by simply setting that page to read-only. Once the thread is interrupted we look at the program counter and look up the meta data in, say, a hash table.
Other places that need to be GC safe points are allocation sites: an allocation may fail and cause GC to occur. You can reduce the number of safe points by allocating memory for multiple objects at once.
Edit: Note that using GC-safe points is only one of many options. Another option, as SK-logic mentioned, is to use separate stacks for pointers and non-pointers. It is then clear that all elements of one stack need to be traversed during GC but none of the others. You still have to be careful with pointers in registers, though. For example, whenever there is a live pointer in a register, the same pointer must also exist on the stack.
A third option is to use a shadow stack that contains a linked list of pointers to stack roots living on the real stack. For details see the paper "Accurate Garbage Collection in an Uncooperative Environment" by Fergus Henderson (PDF).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With