
Sharing heap memory with fork()

I am working on implementing a database server in C that will handle requests from multiple clients. I am using fork() to handle connections for individual clients.

The server stores its data on the heap: a root pointer leads to hash tables of dynamically allocated records. The records are structs that hold pointers to various data types. I would like the processes to share this data so that, when one client changes it, the change is visible to the other clients.

I have learned that fork() uses COW (Copy On Write), and my understanding is that it copies the heap (and stack) memory of the parent process when the child tries to modify the data in memory.

I have found out that I can use the System V shared memory calls (shmget, shmat) to share memory.

Would the code below be a valid way to share heap memory (in shared_string)? If a child were to use similar code (i.e. starting from //start), would other children be able to read/write to it while the child is running and after it's dead?

key_t key;
int shmid;

key = ftok("/tmp",'R');
shmid = shmget(key, 1024, 0644 | IPC_CREAT);

//start
char * string;
string = malloc(sizeof(char) * 10);

strcpy(string, "a string");

char * shared_string;

shared_string = shmat(shmid, string, 0);

strcpy(shared_string, string);

Here are some of my thoughts/concerns regarding this:

  • I'm thinking about sharing the root pointer of the database. I'm not sure if that would work or if I have to mark all allocated memory as shared.

  • I'm not sure if the parent / other children are able to access memory allocated by a child.

  • I'm not sure if a child's allocated memory stays on the heap after it is killed, or if that memory is released.

phantombit asked Apr 01 '12 03:04


3 Answers

First of all, fork is completely inappropriate for what you're trying to achieve. Even if you can make it work, it's a horrible hack. In general, fork only works for very simplistic programs anyway, and I would go so far as to say that fork should never be used except when followed quickly by exec, but that's beside the point here. You really should be using threads.

With that said, the only way to have memory that's shared between the parent and child after fork, and where the same pointers are valid in both, is to mmap (or shmat, but that's a lot fuglier) a file or anonymous map with MAP_SHARED prior to the fork. You cannot create new shared memory like this after fork because there's no guarantee that it will get mapped at the same address range in both.

Just don't use fork. It's not the right tool for the job.

R.. GitHub STOP HELPING ICE answered Sep 25 '22 05:09


I think you are basically looking to do what is done by Redis (and probably others). They describe it in http://redis.io/topics/persistence (search for "copy-on-write").

  • threads defeat the purpose
  • classic shared memory (shm, mapped memory) also defeats the purpose

The primary benefit to using this method is avoidance of locking, which can be a pain to get right.

As far as I understand it, the idea of using COW is to:

  • fork when you want to write, not in advance
  • the child (re)writes the data to disk, then immediately exits
  • the parent keeps doing its work and detects (via SIGCHLD) when the child has exited. If the parent changes the hash in the meantime, the kernel copies the affected memory pages, so the child still sees the data as it was at the fork.
    A "dirty flag" is used to track whether a new fork is needed to perform a new write.

Things to watch out for:

  • Make sure there is only one outstanding child at a time
  • Transactional safety: write to a temp file first, then move it over so that you always have a complete copy, maybe keeping the previous one around if the move is not atomic.
  • Test whether you will have issues with other resources that get duplicated (file descriptors, global destructors in C++)

You may want to take a gander at the Redis code as well.

nhed answered Sep 23 '22 05:09


"I'm thinking about sharing the root pointer of the database. I'm not sure if that would work or if I have to mark all allocated memory as shared."

Each process will have its own private memory range. Copy-on-write is a kernel-space optimization that is transparent to user space.

As others have said, SHM or mmap'd files are the only way to share memory between separate processes.

Wil Cooley answered Sep 25 '22 05:09