Recently it occurred to me that a lot of emulators are slow because they have to simulate not just the CPU but also the memory of the emulated device. When the device has memory-mapped I/O, virtual memory, or just unused address space, then every memory access has to be simulated in software.
I feel like it might be a lot faster if the OS did this for us, by means of virtual memory. I'll use Game Boy emulation as an example for simplicity's sake but obviously this method would be better for newer, more powerful machines.
The Game Boy memory map is roughly:
So a traditional emulator has to translate every memory access something like:
if(addr < 0x4000) return rom[addr];
else if(addr < 0x8000) return rom[(addr - 0x4000) + (0x4000 * cur_rom_bank)];
else if(addr < 0xA000) {
    if(vram_accessible) return vram[addr - 0x8000];
    else return 0xFF;
}
else if(addr < 0xC000) return saveram[addr - 0xA000];
else if(addr < 0xE000) return ram[addr - 0xC000];
else if(addr < 0xFE00) return ram[addr - 0xE000]; //echo of work RAM
else if(addr < 0xFEA0) return oam[addr - 0xFE00]; //OAM is 0xFE00 - 0xFE9F inclusive
else if(addr < 0xFF00) return 0xFF; //or whatever should be here
else if(addr < 0xFF80) return handle_io_read(addr);
else return hram[addr - 0xFF80];
Obviously that can be optimized by using a switch or table, but still it's a lot of code to run for every memory access. We could potentially improve the emulation speed quite a bit by mapping some pages to those addresses in our process's memory map:
Then handle the SIGSEGV (or whatever signal would be generated) we get when accessing those pages. So a read from ROM or a write to RAM can just be performed directly, and a write to ROM will raise an exception which we can handle. We can change the permissions of VRAM (0x8000 - 0x9FFF) to be RW- when it should be accessible and --- when it shouldn't. In theory it could be much faster since it doesn't require the emulator to manually map every memory access in software.
I know that I can use mmap() to map pages at fixed addresses with various permissions. What I don't know is:
I'd expect that generating a SIGSEGV, catching it, handling it, and resuming has much more perf overhead than the same access had on the original hardware, so arrange for it to happen only in cases that are allowed to be slow, such as an actual error.
This is a nice technique for memory protection / array bounds checking when violations are rare, and it's ok if they're slow. Speeding up the common case a bit, even if it makes the exceptional case much slower, is a win when the exceptional case doesn't happen in normal emulated code.
I've heard of JavaScript emulators doing this to get cheaper array bounds checking: allocate an array so that it ends exactly at the end of a page, and leave the next page unmapped.
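As a rough illustration of that guard-page trick, here is a minimal sketch assuming Linux and a synchronous SIGSEGV for touching a PROT_NONE page. The names alloc_guarded, checked_read, install_segv_handler and last_fault_addr are made up for this example. It recovers with sigsetjmp/siglongjmp, which is the simplest thing that works, not necessarily what a real emulator would do (that would more likely inspect the fault address and emulate or resume the access).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE                 /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;            /* hypothetical recovery point */
static void *volatile last_fault_addr;  /* faulting address reported by the kernel */

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    last_fault_addr = info->si_addr;    /* which address was touched */
    siglongjmp(fault_jmp, 1);           /* abandon the faulting access */
}

/* Install the handler once, up front. Returns 0 on success. */
static int install_segv_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Return len usable bytes that end exactly at a page boundary,
 * followed by one PROT_NONE guard page. NULL on failure. */
static unsigned char *alloc_guarded(size_t len) {
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (len + page - 1) / page;
    size_t total = (data_pages + 1) * page;
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return guard - len;                 /* buf[len] is the first guard byte */
}

/* Read buf[idx] with no explicit bounds check.
 * Returns 0 on success, -1 if the access faulted.
 * Only valid while the handler above is installed. */
static int checked_read(const volatile unsigned char *buf, size_t idx,
                        unsigned char *out) {
    if (sigsetjmp(fault_jmp, 1) != 0) return -1;
    *out = buf[idx];
    return 0;
}
```

After install_segv_handler() and buf = alloc_guarded(100), a call like checked_read(buf, 99, &v) takes the fast path, while checked_read(buf, 100, &v) lands in the guard page and returns -1 with last_fault_addr pointing at buf + 100. Note that sigsetjmp with a saved signal mask is not free either, so this only shows the mechanism, not a tuned fast path.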
Hopefully this will get you started looking at docs that will tell you what actually can be done.
Updating page tables is fairly slow. Try to find a balance where you can take advantage of user-space memory protection for some of the checks, but you aren't constantly mapping/unmapping pages from your memory space during the "common case" of what your emulated code does. Predicted branches run really fast, esp. if they're predicted not taken.
I've seen Linux kernel discussion / notes indicating that playing tricks with mmap isn't worth it over just memcpy of a single page. For larger blocks of memory, or less checking on repeated accesses, the benefit will outweigh the setup overhead.
You'll want to use mprotect(2) to change the permissions on (ranges of) pages. No, mappings can't overlap. See the MAP_FIXED option in mmap(2):
If the memory region specified by addr and len overlaps pages of any existing mapping(s), then the overlapped part of the existing mapping(s) will be discarded.
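To make the mprotect(2) and MAP_FIXED behaviour concrete, here is a small sketch assuming Linux with 4 KiB pages. The helper names (guest_reserve, guest_map, vram_set_accessible) and the constants are invented for this example; the VRAM range is the 0x8000 - 0x9FFF region from the question.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define GUEST_SIZE 0x10000u      /* 64 KiB guest address space */
#define VRAM_START 0x8000u
#define VRAM_LEN   0x2000u
#define PAGE_4K    0x1000u       /* assumption: 4 KiB host pages */

/* Reserve the whole guest address space as inaccessible pages.
 * Returns the host base address, or NULL on failure. */
static uint8_t *guest_reserve(void) {
    void *p = mmap(NULL, GUEST_SIZE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Map fresh zero-filled pages over part of the reservation.
 * MAP_FIXED discards whatever was mapped there before. */
static int guest_map(uint8_t *base, size_t off, size_t len, int prot) {
    assert(off % PAGE_4K == 0 && off + len <= GUEST_SIZE);
    void *p = mmap(base + off, len, prot,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* Flip VRAM between RW- and --- without touching its contents. */
static int vram_set_accessible(uint8_t *base, int accessible) {
    int prot = accessible ? (PROT_READ | PROT_WRITE) : PROT_NONE;
    return mprotect(base + VRAM_START, VRAM_LEN, prot);
}
```

mprotect keeps the page contents and only changes permissions, whereas mapping over the range again with MAP_FIXED replaces the pages. Protection is per page, so with 4 KiB pages the ranges at 0xFE00 - 0xFFFF (OAM, I/O, HRAM) share one page with the tail of the echo region and can't get separate permissions; they would still need a software path.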
IDK if you can do anything useful with x86 segment registers when accessing emulated memory, to map guest address 0 to some other address in your process's virtual address space. You can map virtual address 0, but by default Linux disables it so that NULL-pointer dereferences don't silently work!
Users of your software will have to futz with sysctl (same as for WINE) to enable it:
# Ubuntu's /etc/sysctl.d/10-zeropage.conf
# Protect the zero page of memory from userspace mmap to prevent kernel
# NULL-dereference attacks against potential future kernel security
# vulnerabilities. (Added in kernel 2.6.23.)
#
# While this default is built into the Ubuntu kernel, there is no way to
# restore the kernel default if the value is changed during runtime; for
# example via package removal (e.g. wine, dosemu). Therefore, this value
# is reset to the secure default each time the sysctl values are loaded.
vm.mmap_min_addr = 65536
Like I said, you can maybe use a segment register override on all loads/stores into guest (emulated-machine) memory, to remap it to a more reasonable page. Or maybe just use a constant offset of 64 KiB (or more, to maybe put it above the text/data/bss (heap) of the emulation software). Or a non-constant offset using a pointer to the base of your mmapped guest-memory region, so everything is relative to a global variable. With gcc, this might be a good candidate for requesting that gcc keep that global in a register across all your functions. IDK, you'd have to see if that helped perf or not. A constant offset would end up making every instruction accessing guest memory need a 32b displacement field in the addressing mode, rather than 0 or 8b.
A segment register, if it works the way I think it does (as a constant offset you can apply with a segment-override prefix, instead of a 32b displacement modifier) would be much harder to get the compiler to generate, AFAIK. If it was just loads/stores, that would be one thing: you could use an inline asm wrapper for a load and store insn. But for efficient x86 code, all kinds of ALU instructions should use memory operands to reduce frontend bottlenecks via micro-fusion.
You could maybe just define a global char *const guest_mem = (void*)0x2000000; or something, and then use mmap with MAP_FIXED to force mapping memory there? Then guest memory accesses can compile to more efficient one-register addressing modes.
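A sketch of that fixed-base idea, assuming Linux and that nothing else in the process already lives at 0x2000000 (not guaranteed, especially for non-PIE binaries). guest_init, guest_read8 and guest_write8 are hypothetical names. Where the headers provide it, the sketch swaps in MAP_FIXED_NOREPLACE (Linux 4.17+) instead of plain MAP_FIXED, so that a collision fails instead of silently discarding an existing mapping.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

#define GUEST_SIZE 0x10000u      /* 64 KiB guest address space */

#ifdef MAP_FIXED_NOREPLACE
#define GUEST_FIXED_FLAG MAP_FIXED_NOREPLACE  /* refuse to clobber mappings */
#else
#define GUEST_FIXED_FLAG MAP_FIXED            /* as in the text; clobbers */
#endif

/* Compile-time constant base: guest address N lives at guest_mem + N. */
static uint8_t *const guest_mem = (uint8_t *)0x2000000;

/* Map the guest address space at the fixed base. Returns 0 on success. */
static int guest_init(void) {
    assert(((uintptr_t)guest_mem & 0xFFFu) == 0);   /* must be page-aligned */
    void *p = mmap(guest_mem, GUEST_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | GUEST_FIXED_FLAG, -1, 0);
    if (p == MAP_FAILED) return -1;
    if (p != (void *)guest_mem) {   /* old kernels may treat the flag as a hint */
        munmap(p, GUEST_SIZE);
        return -1;
    }
    return 0;
}

/* Guest accesses are plain loads/stores at a constant base plus the
 * 16-bit guest address; no range checks in software. */
static inline uint8_t guest_read8(uint16_t addr) {
    return guest_mem[addr];
}

static inline void guest_write8(uint16_t addr, uint8_t value) {
    guest_mem[addr] = value;
}
```

After guest_init() succeeds, guest_write8(0xC000, 0x42) followed by guest_read8(0xC000) goes straight to host memory. Whether the constant base actually beats a base pointer kept in a register is something you'd have to measure.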