Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you efficiently debug reference count problems in shared memory?

Assume you have a reference counted object in shared memory. The reference count represents the number of processes using the object, and processes are responsible for incrementing and decrementing the count via atomic instructions, so the reference count itself is in shared memory as well (it could be a field of the object, or the object could contain a pointer to the count, I'm open to suggestions if they assist with solving this problem). Occasionally, a process will have a bug that prevents it from decrementing the count. How do you make it as easy as possible to figure out which process is not decrementing the count?

One solution I've thought of is giving each process a UID (maybe their PID). Then when processes decrement, they push their UID onto a linked list stored alongside the reference count (I chose a linked list because you can atomically append to head with CAS). When you want to debug, you have a special process that looks at the linked lists of the objects still alive in shared memory, and whichever apps' UIDs are not in the list are the ones that have yet to decrement the count.

The disadvantage to this solution is that it has O(N) memory usage where N is the number of processes. If the number of processes using the shared memory area is large, and you have a large number of objects, this quickly becomes very expensive. I suspect there might be a halfway solution where with partial fixed size information you could assist debugging by somehow being able to narrow down the list of possible processes even if you couldn't pinpoint a single one. Or if you could just detect which process hasn't decremented when only a single process hasn't (i.e. unable to handle detection of 2 or more processes failing to decrement the count) that would probably still be a big help.

(There are more 'human' solutions to this problem, like making sure all applications use the same library to access the shared memory region, but if the shared area is treated as a binary interface and not all processes are going to be applications written by you that's out of your control. Also, even if all apps use the same library, one app might have a bug outside the library corrupting memory in such a way that it's prevented from decrementing the count. Yes I'm using an unsafe language like C/C++ ;)

Edit: In single process situations, you will have control, so you can use RAII (in C++).

like image 829
Joseph Garvin Avatar asked Feb 08 '10 07:02

Joseph Garvin


People also ask

How do I debug shared memory?

Answering the question: use Debug + Windows + Memory + Memory 1 and put "pBuf" in the Address box. You'll see a raw hex dump of the shared memory content.

How can we maintain the reference count of a pointer?

In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others. In garbage collection algorithms, reference counts may be used to deallocate objects that are no longer needed.

Is Shared_ptr reference counting?

It is a reference counting ownership model i.e. it maintains the reference count of its contained pointer in cooperation with all copies of the std::shared_ptr. So, the counter is incremented each time a new pointer points to the resource and decremented when destructor of the object is called.

What is reference count in python?

Reference counting is a simple technique in which objects are deallocated when there is no reference to them in a program. Every variable in Python is a reference (a pointer) to an object and not the actual value itself. For example, the assignment statement just adds a new reference to the right-hand side.


2 Answers

You could do this using only a single extra integer per object.

Initialise the integer to zero. When a process increments the reference count for the object, it XORs its PID into the integer:

object.tracker ^= self.pid;

When a process decrements the reference count, it does the same.

If the reference count is ever left at 1, then the tracker integer will be equal to the PID of the process that incremented it but didn't decrement it.


This works because XOR is commutative ( (A ^ B) ^ C == A ^ (B ^ C) ), so if a process XORs the tracker with its own PID an even number of times, it's the same as XORing it with PID ^ PID - that's zero, which leaves the tracker value unaffected.

You could alternatively use an unsigned value (which is defined to wrap rather than overflow) - adding the PID when incrementing the usage count and subtracting it when decrementing it.

like image 148
caf Avatar answered Nov 15 '22 09:11

caf


Fundementally, shared memory shared state is not a robust solution and I don't know of a way of making it robust.

Ultimately, if a process exits all its non-shared resources are cleaned up by the operating system. This is incidentally the big win from using processes (fork()) instead of threads.

However, shared resources are not. File handles that others have open are obviously not closed, and ... shared memory. Shared resources are only closed after the last process sharing them exits.

Imagine you have a list of PIDs in the shared memory. A process could scan this list looking for zombies, but then PIDs can get reused, or the app might have hung rather than crashed, or...

My recommendation is that you use pipes or other message passing primitives between each process (sometimes there is a natural master-slave relationship, other times all need to talk to all). Then you take advantage of the operating system closing these connections when a process dies, and so your peers get signalled in that event. Additionally you can use ping/pong timeout messages to determine if a peer has hung.

If, after profiling, it is too inefficient to send the actual data in these messages, you could use shared memory for the payload as long as you keep the control channel over some kind of stream that the operating system clears up.

like image 41
Will Avatar answered Nov 15 '22 08:11

Will