Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?

Tags: linux, x86, assembly, linux-kernel, mmap

While attempting to test "Is it allowed to access memory that spans the zero boundary in x86?" in user space on Linux, I wrote a 32-bit test program that tries to map the low and high pages of the 32-bit virtual address space.

After echo 0 | sudo tee /proc/sys/vm/mmap_min_addr, I can map the zero page, but I don't know why I can't map -4096, i.e. (void*)0xfffff000, the highest page. Why does mmap2((void*)-4096) return -ENOMEM?

strace ./a.out 
execve("./a.out", ["./a.out"], 0x7ffe08827c10 /* 65 vars */) = 0
strace: [ Process PID=1407 runs in 32 bit mode. ]
....
mmap2(0xfffff000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0

Also, what check is rejecting it in linux/mm/mmap.c, and why is it designed that way? Is this part of making sure that creating a pointer to one-past-an-object doesn't wrap around and break pointer comparisons? ISO C and C++ allow creating a pointer to one-past-the-end, but otherwise not outside of objects.

I'm running under a 64-bit kernel (4.12.8-2-ARCH on Arch Linux), so 32-bit user space has the entire 4GiB available. (Unlike 64-bit code on a 64-bit kernel, or with a 32-bit kernel where the 2:2 or 3:1 user/kernel split would make the high page a kernel address.)

I haven't tried from a minimal static executable (no CRT startup or libc, just asm) because I don't think that would make a difference. None of the CRT startup system calls look suspicious.

While stopped at a breakpoint, I checked /proc/PID/maps. The top page isn't already in use: the [stack] mapping ends at 0xffffe000, so the two highest pages are unmapped.

00000000-00001000 rw-p 00000000 00:00 0             ### the mmap(0) result
08048000-08049000 r-xp 00000000 00:15 3120510                 /home/peter/src/SO/a.out
08049000-0804a000 r--p 00000000 00:15 3120510                 /home/peter/src/SO/a.out
0804a000-0804b000 rw-p 00001000 00:15 3120510                 /home/peter/src/SO/a.out
f7d81000-f7f3a000 r-xp 00000000 00:15 1511498                 /usr/lib32/libc-2.25.so
f7f3a000-f7f3c000 r--p 001b8000 00:15 1511498                 /usr/lib32/libc-2.25.so
f7f3c000-f7f3d000 rw-p 001ba000 00:15 1511498                 /usr/lib32/libc-2.25.so
f7f3d000-f7f40000 rw-p 00000000 00:00 0 
f7f7c000-f7f7e000 rw-p 00000000 00:00 0 
f7f7e000-f7f81000 r--p 00000000 00:00 0                       [vvar]
f7f81000-f7f83000 r-xp 00000000 00:00 0                       [vdso]
f7f83000-f7fa6000 r-xp 00000000 00:15 1511499                 /usr/lib32/ld-2.25.so
f7fa6000-f7fa7000 r--p 00022000 00:15 1511499                 /usr/lib32/ld-2.25.so
f7fa7000-f7fa8000 rw-p 00023000 00:15 1511499                 /usr/lib32/ld-2.25.so
fffdd000-ffffe000 rw-p 00000000 00:00 0                       [stack]

Are there VMA regions that don't show up in maps that still convince the kernel to reject the address? I looked at the occurrences of ENOMEM in linux/mm/mmap.c, but it's a lot of code to read, so maybe I missed something. Is something reserving a range of high addresses, or is it rejected because it's next to the stack?

Making the system calls in the other order doesn't help (but PAGE_ALIGN and similar macros are written carefully to avoid wrapping around before masking, so that wasn't likely anyway.)

Full source, compiled with gcc -O3 -fno-pie -no-pie -m32 address-wrap.c:

#include <sys/mman.h>

//void *mmap(void *addr, size_t len, int prot, int flags,
//           int fildes, off_t off);

int main(void) {
    volatile unsigned *high =
        mmap((void*)-4096L, 4096, PROT_READ | PROT_WRITE,
             MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,
             -1, 0);
    volatile unsigned *zeropage =
        mmap((void*)0, 4096, PROT_READ | PROT_WRITE,
             MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,
             -1, 0);


    return (high == MAP_FAILED) ? 2 : *high;
}

(I left out the part that tried to deref (int*)-2 because it just segfaults when mmap fails.)

778

asked Dec 08 '17 10:12

Peter Cordes

1 Answer

The mmap function eventually calls either do_mmap or do_brk_flags, which do the actual work of satisfying the memory allocation request. These functions in turn call get_unmapped_area. That function performs the checks that ensure memory cannot be allocated beyond the user address space limit, which is defined by TASK_SIZE. I quote from the code:

 * There are a few constraints that determine this:
 *
 * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
 * address, then that syscall will enter the kernel with a
 * non-canonical return address, and SYSRET will explode dangerously.
 * We avoid this particular problem by preventing anything executable
 * from being mapped at the maximum canonical address.
 *
 * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
 * CPUs malfunction if they execute code from the highest canonical page.
 * They'll speculate right off the end of the canonical space, and
 * bad things happen.  This is worked around in the same way as the
 * Intel problem.

#define TASK_SIZE_MAX   ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)

#define IA32_PAGE_OFFSET    ((current->personality & ADDR_LIMIT_3GB) ? \
                    0xc0000000 : 0xFFFFe000)

#define TASK_SIZE       (test_thread_flag(TIF_ADDR32) ? \
                    IA32_PAGE_OFFSET : TASK_SIZE_MAX)

On processors with 48-bit virtual address spaces, __VIRTUAL_MASK_SHIFT is 47.

Note that TASK_SIZE depends on whether the current process is 32-bit on a 32-bit kernel, 32-bit on a 64-bit kernel, or 64-bit on a 64-bit kernel. For 32-bit processes, two pages are reserved: one for the vsyscall page and one as a guard page. Because the vsyscall page cannot be unmapped, the user address space effectively ends at 0xFFFFe000. For 64-bit processes, one guard page is reserved. These pages are reserved only on 64-bit Intel and AMD processors, because only those processors use the SYSCALL mechanism.

Here is the check that is performed in get_unmapped_area:

if (addr > TASK_SIZE - len)
     return -ENOMEM;

186

answered Oct 16 '22 20:10

Hadi Brais
