Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

vectorized strlen getting away with reading unallocated memory

While studying OSX 10.9.4's implementation of strlen, I notice that it always compares a chunk of 16-bytes and skips ahead to the following 16-bytes until it encounters a '\0'. The relevant part:

3de0:   48 83 c7 10             add    $0x10,%rdi
3de4:   66 0f ef c0             pxor   %xmm0,%xmm0
3de8:   66 0f 74 07             pcmpeqb (%rdi),%xmm0
3dec:   66 0f d7 f0             pmovmskb %xmm0,%esi
3df0:   85 f6                   test   %esi,%esi
3df2:   74 ec                   je     3de0 <__platform_strlen+0x40>

0x10 is 16 bytes in hex.

When I saw that, I was wondering: this memory could just as well not be allocated. If I had allocated a C string of 20 bytes and passed it to strlen, it would read 36 bytes of memory. Why is it allowed to do that? I started looking and found How dangerous is it to access an array out of bounds?

Which confirmed that it's definitely not always a good thing, unallocated memory might be unmapped, for example. Yet, there must be something that makes this work. Some of my hypotheses:

  • OSX not only guarantees that its allocations are 16-byte aligned, but also that the "quantum" of an allocated is a 16-byte chunks. Said another way, allocating 5 bytes will actually allocate 16 bytes. Allocating 20 bytes will actually allocate 32 bytes.
  • It's not harmful per se to read of the end of an array when you're writing asm, as it's not undefined behaviour, as long as its within bounds (within a page?).

What's the actual reason?

EDIT: just found Why I'm getting read and write permission on unallocated memory?, which seems to indicate my first guess was right.

EDIT 2: Stupidly enough, I had forgotten that even though Apple seems to have removed the source of most of its asm implementations (Where did OSX's x86-64 assembly libc routines go?), it left strlen: http://www.opensource.apple.com/source/Libc/Libc-997.90.3/x86_64/string/strlen.s

In the comments we find:

//  returns the length of the string s (i.e. the distance in bytes from
//  s to the first NUL byte following s).  We look for NUL bytes using
//  pcmpeqb on 16-byte aligned blocks.  Although this may read past the
//  end of the string, because all access is aligned, it will never
//  read past the end of the string across a page boundary, or even
//  accross a cacheline.

EDIT: I honestly think all answerers deserved an accepted answer, and basically all contained the information necessary to understand the issue. So I went for the answer of the person that had the least reputation.

like image 536
Aktau Avatar asked Aug 29 '14 10:08

Aktau


4 Answers

I'm the author of the routine in question.

As some others have said, the key thing is that the reads are all aligned. While reading outside the bounds of an array is undefined behavior in C, we're not writing C; we know lots of details of the x86 architecture beyond what the C abstract machine defines.

In particular, reads beyond the end of a buffer are safe (meaning they cannot produce a trap or other observable side effect) so long as they do not cross a page boundary (because memory attributes and mappings are tracked at page granularity). Since the smallest supported page size is 4096 bytes, an aligned 16 byte load cannot cross a page boundary.

like image 179
Stephen Canon Avatar answered Jan 02 '23 01:01

Stephen Canon


Reading memory on most architectures only has a side effect if the address being read corresponds to a page that is not mapped. Most strlen implementations for modern computers try to do only aligned reads of however-many bytes. They will never do a 16-byte read straddling two pages, and so they will never elicit any side effect. So it's cool.

like image 25
tmyklebu Avatar answered Jan 02 '23 00:01

tmyklebu


How malloc aligns stuff is irrelevant, since the programmer may allocate a string inside a block. A simple example is a struct that has an embedded char array:

struct Foo
{
    int bar;
    char baz[10];
};

If you allocate an instance of this struct, it will take up 16 bytes, but baz will start at offset 4. Thus if you read 16 bytes from there, you will cross into the next 16 byte chunk that you don't own. If you are unlucky, that may lie in the next page and trigger a fault.

Also, strings don't have to be in the heap at all, such as constants that are in read-only data section. strlen must work in all cases.

I assume the strlen function first processes the initial portion of the string until it's 16 byte aligned (this code has been omitted from the question) and then proceeds in 16 byte chunks. As such, the actual reason this works is reason #2: you won't cross a page boundary which is the granularity of access checking by the processor.

like image 20
Jester Avatar answered Jan 02 '23 00:01

Jester


Allocating Small Memory Blocks Using Malloc

...

When allocating any small blocks of memory, remember that the granularity for blocks allocated by the malloc library is 16 bytes. Thus, the smallest block of memory you can allocate is 16 bytes and any blocks larger than that are a multiple of 16. For example, if you call malloc and ask for 4 bytes, it returns a block whose size is 16 bytes; if you request 24 bytes, it returns a block whose size is 32 bytes. Because of this granularity, you should design your data structures carefully and try to make them multiples of 16 bytes whenever possible.

For the record, referencing (reading) past allocation could trigger a page fault if you cross a page boundary. This is how guardmalloc works:

Each malloc allocation is placed on its own virtual memory page (or pages). By default, the returned address for the allocation is positioned such that the end of the allocated buffer is at the end of the last page, and the next page after that is kept unallocated. Thus, accesses beyond the end of the buffer cause a bad access error immediately.

Also read the vectorized instructions explicit reference in the same page:

As of Mac OS X 10.5, libgmalloc aligns the start of allocated buffers on 16-byte boundaries by default, to allow proper use of vector instructions (e.g., SSE). (The use of vector instructions is common, including in some Mac OS X system libraries.

ps. afaik NSObject and friends share the heap implementation with malloc

like image 40
Remus Rusanu Avatar answered Jan 02 '23 02:01

Remus Rusanu