
How do I get copy-on-write to work on shared memory on Linux?

I tried to write a small application to get familiar with the concept of copy-on-write in user space. I've read through the answer by MSalters and figured that it would only work if I started with an mmap'ed file to store my data in. As I don't need file-based persistence, I tried to do the same thing with shared memory. First I mmap'ed and initialized a shm fd, then I mapped a second copy with MAP_PRIVATE and read from it again. However, just reading from it seems to cause the kernel to copy the whole thing, taking considerably more time and eating up twice the memory. Why does it not do COW?

Here's the program I came up with to illustrate the behavior:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <assert.h>

static const size_t ARRAYSIZE = 1UL<<30;

void init(int* A)
{
    for (size_t i = 0; i < ARRAYSIZE; ++i)
        A[i] = i;
}

size_t agg(const int* A)
{
    size_t sum = 0;
    for (size_t i = 0; i < ARRAYSIZE; ++i)
        sum += A[i];
    return sum;
}

int main()
{
    assert(sizeof(int) == 4);
    shm_unlink("/cowtest");
    printf("ARRAYSIZE: %zu\n", ARRAYSIZE);  /* %zu is the portable format for size_t */
    int fd = shm_open("/cowtest", O_RDWR | O_CREAT | O_TRUNC, 0);
    if (fd == -1)
    {
        perror("Error allocating fd");  /* perror appends the errno text and a newline itself */
        return 1;
    }
    if (ftruncate(fd, sizeof(int) * ARRAYSIZE) == -1)
    {
        perror("Error ftruncate");
        return 1;
    }
    /* Open shm */
    int* A = (int*)mmap(NULL, sizeof(int) * ARRAYSIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (A == MAP_FAILED)
    {
        perror("Error mapping A to memory");
        return 1;
    }
    init(A);

    /* Create cow copy */
    int* Acopy = (int*)mmap(NULL, sizeof(int) * ARRAYSIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    if (Acopy == MAP_FAILED)
    {
        perror("Error mapping copy from file");
        return 1;
    }

    /* Aggregate over A */
    size_t sumA = agg(A);
    size_t expected = (ARRAYSIZE * (ARRAYSIZE - 1)) >> 1;
    assert(expected == sumA);

    /* Aggregate over Acopy */
    size_t sumCopy = agg(Acopy);
    assert(expected == sumCopy);


    shm_unlink("/cowtest");
    printf("Enter to exit\n");
    getchar();
    return 0;
}

I compiled it with g++ -O3 -mtune=native -march=native -o shm-min shm-min.cpp -lrt.

The array it creates contains 4 GB of integer values. Right before terminating, however, the program has allocated 8 GB of shared memory, and in /proc/<pid>/smaps you can see that it actually did a full copy during the read-only operation. I have no idea why it does that. Is this a kernel bug, or am I missing something?

Thanks a lot for any insights. Lars

Edit: Here's the relevant content of /proc/<pid>/smaps on Ubuntu 14.04 (3.13.0-24):

7f3b9b4ae000-7f3c9b4ae000 r--p 00000000 00:14 168154                     /run/shm/cowtest (deleted)
Size:            4194304 kB
Rss:             4194304 kB
Pss:             2097152 kB
Shared_Clean:          0 kB
Shared_Dirty:    4194304 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:      4194304 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd mr mw me sd
7f3c9b4ae000-7f3d9b4ae000 rw-s 00000000 00:14 168154                     /run/shm/cowtest (deleted)
Size:            4194304 kB
Rss:             4194304 kB
Pss:             2097152 kB
Shared_Clean:          0 kB
Shared_Dirty:    4194304 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:      4194304 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms sd
asked Jun 25 '14 12:06 by lekv



1 Answer

There was no copying. The smaps file has a hint:

Size:            4194304 kB
Rss:             4194304 kB
Pss:             2097152 kB

See how Pss is half the real size of the mapped area? That's because Pss (proportional set size) divides the size of each resident page by the number of mappings that share it, and here there are two. That is, you have the same file mapped twice at different ranges of virtual memory, but the underlying physical pages are the same for both mappings.
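The arithmetic can be checked mechanically. The following is only a sketch: the helper names smaps_field_kb and estimated_sharers are made up for illustration, and it assumes a single smaps entry has already been read into a string and that all resident pages are shared equally.

```cpp
#include <sstream>
#include <string>

// Returns the value (in kB) of a "Key:   N kB" line inside one smaps entry,
// or -1 if the key does not occur. Hypothetical helper, not a kernel API.
long smaps_field_kb(const std::string& entry, const std::string& key)
{
    std::istringstream in(entry);
    std::string line;
    const std::string prefix = key + ":";
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            // std::stol skips the padding and stops at the trailing " kB".
            return std::stol(line.substr(prefix.size()));
        }
    }
    return -1;
}

// Rss / Pss: roughly how many mappings share the resident pages,
// assuming every page is shared by the same number of mappings.
double estimated_sharers(const std::string& entry)
{
    const long rss = smaps_field_kb(entry, "Rss");
    const long pss = smaps_field_kb(entry, "Pss");
    if (rss <= 0 || pss <= 0) return 0.0;
    return static_cast<double>(rss) / static_cast<double>(pss);
}
```

Fed the three lines quoted above, estimated_sharers yields 2, which matches the two mappings of the same file.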

To figure out physical addresses of the relevant pages you can use a tool here. Save it as page-types.c, run make page-types and then ./page-types -p <pid> -l -N. You will see that different virtual addresses (in the first column) map to the same physical pages (in the second column).
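If building the tool is not an option, a process can also look at its own page frame numbers through /proc/self/pagemap. This is a rough sketch, not what the tool above necessarily does internally: pfn_from_entry and pfn_of are hypothetical names, and it assumes the documented pagemap layout (bit 63 = present, bits 0-54 = PFN). On newer kernels the PFN field reads as zero unless the process has CAP_SYS_ADMIN, so run it as root.

```cpp
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Decodes one 64-bit pagemap entry. Returns the PFN, or 0 if the page is not
// present (0 is also what unprivileged readers see on newer kernels).
uint64_t pfn_from_entry(uint64_t entry)
{
    const uint64_t present  = 1ULL << 63;
    const uint64_t pfn_mask = (1ULL << 55) - 1;   // bits 0-54
    return (entry & present) ? (entry & pfn_mask) : 0;
}

// Looks up the PFN that backs a virtual address of the calling process.
// The page must have been touched already; returns 0 on any failure.
uint64_t pfn_of(const void* addr)
{
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return 0;
    const int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd == -1) return 0;

    uint64_t entry = 0;
    const uint64_t index = reinterpret_cast<uintptr_t>(addr) / static_cast<uint64_t>(page);
    const ssize_t got = pread(fd, &entry, sizeof(entry),
                              static_cast<off_t>(index * sizeof(entry)));
    close(fd);
    return got == static_cast<ssize_t>(sizeof(entry)) ? pfn_from_entry(entry) : 0;
}
```

In the program from the question, comparing pfn_of(&A[0]) with pfn_of(&Acopy[0]) after both aggregations should give the same non-zero value as long as nothing has written through Acopy.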

If you add the PROT_WRITE permission bit to the second mapping and call init(Acopy), you will see that Pss jumps to 4 GB and that the physical addresses of the corresponding pages are no longer the same.
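A scaled-down version of that experiment might look like the sketch below. It is not the program from the question: it swaps shm_open for memfd_create (assuming Linux with glibc 2.27 or later) so that no name has to be unlinked, uses a deliberately tiny array, and the function name private_write_is_isolated is made up for illustration.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Maps one in-memory file twice (shared and private) and checks that a write
// through the private view does not reach the shared view.
bool private_write_is_isolated()
{
    const size_t count = 4096;                  // deliberately small
    const size_t bytes = count * sizeof(int);

    int fd = memfd_create("cow-sketch", 0);     // assumption: Linux, glibc 2.27+
    if (fd == -1) return false;
    if (ftruncate(fd, bytes) == -1) { close(fd); return false; }

    void* s = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                                  // mappings outlive the descriptor
    if (s == MAP_FAILED || p == MAP_FAILED) {
        if (s != MAP_FAILED) munmap(s, bytes);
        if (p != MAP_FAILED) munmap(p, bytes);
        return false;
    }
    int* shared = static_cast<int*>(s);
    int* priv   = static_cast<int*>(p);

    for (size_t i = 0; i < count; ++i)
        shared[i] = static_cast<int>(i);

    // Reading only: the private view shows the data written via the shared view.
    const bool reads_match =
        priv[0] == 0 && priv[count - 1] == static_cast<int>(count - 1);

    // The first write to a private page makes the kernel copy that page.
    priv[0] = -1;
    const bool isolated = shared[0] == 0 && priv[0] == -1;

    munmap(s, bytes);
    munmap(p, bytes);
    return reads_match && isolated;
}
```

Calling it from main and printing the result is enough to try it out; watching /proc/<pid>/smaps before and after the write shows the private mapping gaining Private_Dirty pages only for what was written.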

TL;DR COW works.

answered Nov 14 '22 22:11, 2 revs