
Why is the remark phase needed in a concurrent GC?

A concurrent GC needs a remark phase. Its role is to mark objects that were modified during the concurrent mark phase. But I think that if we simply mark newly created objects during the concurrent mark phase, there is no need for a remark phase.

The remark phase is needed because of modified objects. Modifications come in two types: one is the creation of a new object, and the other is changing a pointer so that it refers to another object. The new-object problem can be solved easily by marking objects as they are created. And a modified pointer to another object is not a problem in fact. Because

A dead object cannot revive

A dead object is one that nothing points to. How could it revive? So a modified pointer can only point to an already marked object, which means there is no need to perform a remark.

Someone could say, "Marking a new object at creation is too expensive, so new objects cannot be marked during the concurrent mark phase, and that is why the remark phase is needed." That seems reasonable, but it raises another question: how could the remark phase work without traversing every object from the roots? If the remark phase has to traverse every object from the roots, the work done by the concurrent mark phase is useless. And if the remark phase traverses only modified objects, the information about which objects were modified has to be saved somewhere. I think that could be much more expensive than just marking.

Am I wrong? I must be, but I have no idea which step of my reasoning is wrong.

Joffrey asked Apr 24 '15 11:04



1 Answer

And a modified pointer to another object is not a problem in fact. Because

A dead object cannot revive

They really can't, but do you know which objects are dead? No! Why?

You don't know it after the initial mark phase as you look only at the thread stacks and don't follow references.

You don't know it after the concurrent mark phase either, as the following may happen:

  • A thread reads the field a.x and stores its value in its register (or on its stack or elsewhere).
  • Then this thread sets a.x = null (or something else).
  • The GC comes and sees null there.
  • Then the thread restores a.x to its previous value.

Now, the GC has missed the object a.x points to. While the above scenario is not very common, it may happen and there are more realistic (and more complicated) scenarios.
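
The four steps above can be replayed deterministically in a toy model. This is only a sketch: the Node class, the markFrom helper and the fixed interleaving are assumptions made for illustration, not how a real JVM collector is implemented. The mutator and the marker run in a single thread so that the problematic ordering is forced.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy model only: Node, markFrom and the interleaving are illustrative
// assumptions, not the implementation of any real collector.
final class MissedObjectDemo {

    static final class Node {
        final String name;
        Node x; // the single reference field, like a.x in the scenario
        Node(String name) { this.name = name; }
    }

    // Marks everything reachable from root by following the x fields.
    static Set<Node> markFrom(Node root) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (marked.add(n) && n.x != null) {
                work.push(n.x);
            }
        }
        return marked;
    }

    // Replays the four steps in a fixed order and reports whether b was marked.
    static boolean markerSeesB() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.x = b;

        Node register = a.x;            // 1. mutator reads a.x into a local ("register")
        a.x = null;                     // 2. mutator clears the field
        Set<Node> marked = markFrom(a); // 3. marker scans a and sees null
        a.x = register;                 // 4. mutator restores the field

        // b is reachable again through a.x, but the marker never visited it.
        return marked.contains(b);
    }

    public static void main(String[] args) {
        System.out.println("b marked: " + markerSeesB()); // prints "b marked: false"
    }
}
```

Here b is never dead: it is reachable before step 2 and after step 4, and in between it is held in a local variable. The marker only missed it because it looked at a during the window in which the reference lived outside the heap.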

So it's necessary to look at the modified memory again, and that is the remark phase. Fortunately, the whole heap need not be scanned again, because a card table keeps track of which regions of memory were written to.
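
To make the card table idea concrete, here is a minimal sketch under heavy assumptions: objects are array indices, each object has a single reference slot, and CARD_SIZE, writeRef, markFrom and remark are hypothetical names. A real collector works on raw memory and does considerably more, but the principle is the same: every reference store dirties a card, and the remark only rescans marked objects on dirty cards.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Deque;

// Toy model only: the heap layout, CARD_SIZE and all method names are
// assumptions made for illustration.
final class CardTableDemo {
    static final int CARD_SIZE = 4;       // objects per card (hypothetical)
    final int[] refs;                     // refs[i] = index object i points to, or -1
    final BitSet dirty = new BitSet();    // one bit per card
    final BitSet marked = new BitSet();   // one mark bit per object

    CardTableDemo(int objects) {
        refs = new int[objects];
        Arrays.fill(refs, -1);
    }

    // Write barrier: every reference store also dirties the card of the written object.
    void writeRef(int obj, int target) {
        refs[obj] = target;
        dirty.set(obj / CARD_SIZE);
    }

    // Marks everything reachable from start.
    void markFrom(int start) {
        Deque<Integer> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            int o = work.pop();
            if (o >= 0 && !marked.get(o)) {
                marked.set(o);
                work.push(refs[o]);
            }
        }
    }

    // Remark: rescan only marked objects that live on dirty cards.
    void remark() {
        for (int card = dirty.nextSetBit(0); card >= 0; card = dirty.nextSetBit(card + 1)) {
            int end = Math.min(refs.length, (card + 1) * CARD_SIZE);
            for (int o = card * CARD_SIZE; o < end; o++) {
                if (marked.get(o) && refs[o] >= 0) {
                    markFrom(refs[o]);
                }
            }
        }
        dirty.clear();
    }

    // Returns { object 9 marked before remark, object 9 marked after remark }.
    static boolean[] run() {
        CardTableDemo heap = new CardTableDemo(12);
        heap.refs[0] = 9;               // set up before marking starts, so no card is dirty yet

        int register = heap.refs[0];    // mutator reads the reference
        heap.writeRef(0, -1);           // mutator clears the field
        heap.markFrom(0);               // marker scans object 0 and sees nothing
        heap.writeRef(0, register);     // mutator restores the field
        boolean beforeRemark = heap.marked.get(9);

        heap.remark();                  // only card 0 is dirty, so only it is rescanned
        boolean afterRemark = heap.marked.get(9);
        return new boolean[] { beforeRemark, afterRemark };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        // prints "marked before remark: false, after remark: true"
        System.out.println("marked before remark: " + r[0] + ", after remark: " + r[1]);
    }
}
```

In this sketch the bookkeeping costs one extra bit store per reference write, and the remark touches one card out of three rather than the whole heap.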


I'm afraid this (otherwise nice) explanation is a bit misleading on this point:

The remark phase is a stop-the-world. CMS cannot correctly determine which objects are alive (mark them live), if the application is running concurrently and keeps changing what is live.

The threads do change what is live, but they also change what you can see as being live. And that's the problem.

This article states it rather clearly:

Part of the work in the remark phase involves rescanning objects that have been changed by an application thread (i.e., looking at the object A to see if A has been changed by the application thread so that A now references another object B and B was not previously marked as live).

I'd say: When you search one room after another, you may miss your glasses when children move them around.

A note concerning the scenario

I'm pretty sure the above scenario is possible; it's just not exactly what a program usually does. For a pretty realistic example, consider

void swap(Object[] a, int i, int j) {
    Object tmp = a[i];
    a[i] = a[j];
    // Now the original reference a[i] is in a register only.
    a[j] = tmp;
}
maaartinus answered Oct 15 '22 14:10