I found this recently: https://github.com/xoreaxeaxeax/movfuscator
It seems to be contingent on the fact that mov
is turing-complete. Is that true, and why?
The x86 `mov` instruction is Turing complete. This means you can compile DOOM to run entirely on MOV instructions with only a constant* slowdown *882000 github.com/Battelle/movfu…
The MOV instruction has a few limitations: an immediate value cannot be moved into a segment register directly (i.e. mov ds,10) segment registers cannot be copied directly (i.e. mov es,ds) a memory location cannot be copied into another memory location (i.e. mov aNumber,aDigit)
The MOV instruction moves data bytes between the two specified operands. The byte specified by the second operand is copied to the location specified by the first operand. The source data byte is not affected.
The physical chip with the external hardware interface (what you probably see as the processor) is just one intermediate step to the machine we call Turing complete.
It seems to be contingent on the fact that mov is turing-complete. Is that true, and why? Yes, x86's mov is Turing complete. I added that tag to your question because it may not be true for other ISAs with an instruction called mov, and the movfuscator compiler only targets x86.
Systems rarely come to be known as Turing Complete through direct proof of their capacity for all Turing-computable functions. Instead, they often are "analogized" to an existing system already known to be Turing Complete. The paper by Stephen Dolan which shows mov to be TC does this by constructing a simulation of a TM system in terms of mov.
A universal Turing machine can be used to simulate any Turing machine and by extension the computational aspects of any possible real-world computer. To show that something is Turing-complete, it is enough to show that it can be used to simulate some Turing-complete system.
With a little extra hardware, "mov" could be made genuinely Turing complete if one has an address which when read reports the contents of a paper tape under a tape head, one that when written will punch the tape and move it one step.
Yes, x86's mov
is Turing complete. I added that tag to your question because it may not be true for other ISAs with an instruction called mov
, and the movfuscator compiler only targets x86.
It's not "mov" itself doing computation, it's x86 addressing modes which can do addition (and bit-shift). I haven't looked in detail at how it works, but it relies a lot on lookup tables, and stuff like mov eax, [base + eax*4]
loading one of two possible values depending on EAX being 0 or 1.
Also remember that x86 mov
has several forms: between memory and register (load, store, or reg<-reg), or immediate to memory or register. And with addressing modes including absolute and register indirect.
I don't think it relies on self-modifying code, but correct me if I'm wrong. (And if it does, I think/hope movfuscator at least won't create instructions other than mov
. That would be cheating.)
Also, it's only sort of true; you need some way to loop the main program, assuming the original source isn't straight-line with no loops - the Movfuscator github readme talks about this:
While Dolan's paper required a jmp instruction, the M/o/Vfuscator does not - it uses a faulting mov instruction to achieve the infinite execution loop. If you're worried that this is still "jumping", the same effect could be achieved through pages aliased to the same address, wrapping execution around the upper range of memory, ring 0 exception handling, or simply repeating the mov loop indefinitely. A jmp is currently used to dispatch external functions - if this is a problem, avoid using external functions, or compile libraries with the M/o/Vfuscator as well.
When creating a user-space environment for mov-only code to run, I assume it creates a SIGSEGV handler (on POSIX OSes) that restarts execution from the top. So any faulting load can restart the main loop.
Letting execution wrap around is also mentioned as a possibility:
Wraparound of IP can work well in 16-bit mode where IP is 16-bit but CS:IP form a 20-bit linear address (real mode) or in 16-bit protected mode a 64k window somewhere in linear address space. i.e. you can have a 64k block of instructions in only part of your address space, with other space left for data. The DS segment can use a different base. (32-bit addressing modes in 16-bit mode are possible so you still have the full power of any register and scaled-index addressing mode.) Note that the mnemonic for reading and writing segment registers is also mov
.
But harder in 32-bit mode where EIP is 32-bit and so are linear addresses after seg+off calculation. Unless there's some other trick, wraparound can only happen across the entire address space, regardless of what you do with segmentation. This leaves no non-code space for data. Setting a lower segment limit could make code-fetch fault, but that doesn't cause wraparound (unless you set a signal handler, or on bare metal set an interrupt handler address).
And either way, even x86-64 only has 64-bit pointers (in theory; 48 or 57-bit in practice), so space is finite, unlike a real Turing machine with infinite tape.
Systems rarely come to be known as Turing Complete through direct proof of their capacity for all Turing-computable functions. Instead, they often are "analogized" to an existing system already known to be Turing Complete.
The paper by Stephen Dolan which shows mov
to be TC does this by constructing a simulation of a TM system in terms of mov
. It follows that any question which could be posed to the TC system could, at worst, be virtualized within the mov
-constructed simulation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With