I was reading a blog post about the 64-bit edition of Firefox on hacks.mozilla.org. The author states:
"For asm.js code, the increased address space also lets us use hardware memory protection to safely remove bounds checks from asm.js heap accesses. The gains are pretty dramatic: 8%-17% on the asmjs-apps-*-throughput tests as reported on arewefastyet.com."
I was trying to understand how 64-bit hardware provides automatic bounds checking (assuming the compiler does it with hardware support) for C/C++. I could not find any answers on SO. I found one technical paper on the subject, but I am not able to grasp how this is done.
Can someone explain how 64-bit hardware aids in bounds checking?
Array bounds checking refers to determining whether all array references in a program are within their declared ranges. This checking is critical for software verification and validation, because subscripting arrays beyond their declared sizes may produce unexpected results, security holes, or failures.
By reading out-of-bounds memory, an attacker might be able to obtain secret values, such as memory addresses, which can be used to bypass protection mechanisms such as ASLR and so improve the reliability and likelihood of exploiting a separate weakness to achieve code execution instead of just denial of service.
Most modern CPUs implement virtual addressing/virtual memory - when a program references a particular address, that address is virtual; the mapping to a physical page, if any, is implemented by the CPU's MMU (memory management unit). The CPU translates every virtual address to a physical address by looking it up in the page table the OS set up for the current process. These lookups are cached by the TLB, so most of the time there's no extra delay. (In some non-x86 CPU designs, TLB misses are handled in software by the OS.)
So my program accesses address 0x8050, which is in virtual page 8 (assuming the standard 4096-byte (0x1000) page size). The CPU sees that virtual page 8 is mapped to physical page 200, and so performs a read at physical address 200 * 4096 + 0x50 == 0xC8050.
What happens when the CPU cannot find a valid mapping for a virtual address? (A TLB miss alone is not enough: on x86 the CPU simply walks the page table in hardware and refills the TLB.) When the page table itself contains no valid entry, the CPU generates a page fault, which is handled by the OS.
Several outcomes can occur as a result of a page fault:
1. The page is valid but was paged out to disk; the OS reads it back in, updates the page table, and restarts the instruction.
2. The page is valid but not yet backed by physical memory (lazy allocation or copy-on-write); the OS allocates a frame, updates the page table, and restarts the instruction.
3. The access is invalid: the virtual address is unmapped, or the access violates the page's protection. The OS delivers a segmentation fault to the process.
The relevant case is number 3. When a segfault happens, the default behavior of the operating system is to abort the process and do things like write out a core file. However, a process is allowed to trap its own segfaults and attempt to handle them, perhaps even without stopping. This is where things get interesting.
We can use this to our advantage to perform 'hardware accelerated' index checks, but there are a few more stumbling blocks we hit trying to do so.
First, the general idea: for every array, we put it in its own virtual memory region, with all of the pages that contain the array data mapped as usual. On either side of the real array data, we create virtual page mappings that are unreadable and unwritable. If you attempt to read outside of the array, you'll generate a page fault. The compiler installs its own segfault handler when it builds the program; that handler catches the fault and turns it into an index-out-of-bounds exception.
Stumbling block number one is that we can only mark whole pages as being readable or not. Array sizes may not be an even multiple of a page size, so we have a problem - we can't put fences exactly before and after the end of the array. The best we can do is leave a small gap either before the beginning of the array or after the end of the array between the array and the nearest 'fence' page.
How do they get around this? Well, in Java's case, it's not easy to even emit code that performs negative indexing; and if it happens, it doesn't matter anyway, because the negative index is treated as if it were unsigned, which puts the index far past the beginning of the array. That means it's very likely to hit unmapped memory and cause a fault anyway.
So what they do is to align the array so that the end of the array butts up right against the end of a page, like so ('-' means unmapped, '+' means mapped):
----------++++++++++++++++++++++++++++++++++++++++++++++++++----------
|  Page 1 |  Page 2 |  Page 3 |  Page 4 |  Page 5 |  Page 6 |  Page 7 | ...
              |-------------------array--------------------|
Now, if the index is past the end of the array, it hits page 7, which is unmapped; that causes a page fault, which is turned into an index-out-of-bounds exception. If the index is before the beginning of the array (that is, it's negative), then, because it's treated as an unsigned value, it becomes very large and positive, putting us far past page 7 again; the resulting unmapped memory read causes a page fault, which again turns into an index-out-of-bounds exception.
Stumbling block number two is that we really should leave a lot of unmapped virtual memory past the end of the array before we map the next object. Otherwise, an index that was out of bounds by a large amount might hit some other valid page, fail to cause an index-out-of-bounds exception, and instead read or write arbitrary memory.
To solve this, we just use huge amounts of virtual memory: we put each array into its own 4 GiB region of address space, of which only the first few pages are actually mapped. We can do this because we're only spending address space here, not actual physical memory. A 64-bit process has roughly 4 billion such 4 GiB regions, so we have plenty of address space to work with before we run out. On a 32-bit CPU or process, we have very little address space to play around with, so this technique isn't feasible. As it is, many 32-bit programs today run out of virtual address space just trying to access real memory, never mind mapping empty 'fence' pages in that space to use as 'hardware accelerated' index range checks.