Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there any smart cases of runtime code modification?

People also ask

Can a code change itself?

In computer science, self-modifying code (SMC) is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance.

What is runtime modification?

Runtime code modification, of self modifying code as it is often referred to, has been used for decades – to implement JITters, writing highly optimized algorithms, or to do all kinds of interesting stuff.

What is a self modifying program?

Self-modifying programs are programs which are able to modify their own code at runtime. Nowadays, self- modifying programs are commonly used. For example, a packer transforms any program into a program with equiva- lent behavior, but which decompresses and/or decrypts some instructions.

What is code modification?

Code modification requests must be based on practical difficulty in achieving code conformance and must include proposed alternative methods of compliance or compensatory measures beyond prescriptive code. A self-imposed hardship generally will not be accepted as a basis for an approved code modification request.


There are many valid cases for code modification. Generating code at run time can be useful for:

  • Some virtual machines use JIT compilation to improve performance.
  • Generating specialized functions on the fly has long been common in computer graphics. See e.g. Rob Pike and Bart Locanthi and John Reiser Hardware Software Tradeoffs for Bitmap Graphics on the Blit (1984) or this posting (2006) by Chris Lattner on Apple's use of LLVM for runtime code specialization in their OpenGL stack.
  • In some cases software resorts to a technique known as trampoline which involves the dynamic creation of code on the stack (or another place). Examples are GCC's nested functions and the signal mechanism of some Unices.

Sometimes code is translated into code at runtime (this is called dynamic binary translation):

  • Emulators like Apple's Rosetta use this technique to speed up emulation. Another example is Transmeta's code morphing software.
  • Sophisticated debuggers and profilers like Valgrind or Pin use it to instrument your code while it is being executed.
  • Before extensions were made to the x86 instruction set, virtualization software like VMWare could not directly run privileged x86 code inside virtual machines. Instead it had to translate any problematic instructions on the fly into more appropriate custom code.

Code modification can be used to work around limitations of the instruction set:

  • There was a time (long ago, I know), when computers had no instructions to return from a subroutine or to indirectly address memory. Self modifying code was the only way to implement subroutines, pointers and arrays.

More cases of code modification:

  • Many debuggers replace instructions to implement breakpoints.
  • Some dynamic linkers modify code at runtime. This article provides some background on the runtime relocation of Windows DLLs, which is effectively a form of code modification.

This has been done in computer graphics, specifically software renderers for optimization purposes. At runtime the state of many parameters is examined and an optimized version of the rasterizer code is generated (potentially eliminating a lot of conditionals) which allows one to render graphics primitives e.g. triangles much faster.


One valid reason is because the asm instruction set lack some necessary instruction, which you could build yourself. Example: On x86 there is no way to create an interrupt to a variable in a register (e.g. make interrupt with interrupt number in ax). Only const numbers coded into the opcode were allowed. With selfmodifying code one could emulate this behaviour.


Some compilers used to use it for static variable initialization, avoiding the cost of a conditional for subsequent accesses. In other words they implement "execute this code only once" by overwriting that code with no-ops the first time it's executed.


There are many cases:

  • Viruses commonly used self-modifying code to "deobfuscate" their code prior to execution, but that technique can also be useful in frustrating reverse engineering, cracking and unwanted hackery
  • In some cases, there can be a particular point during runtime (e.g. immediately after reading the config file) when it is known that - for the rest of the lifetime of the process - a particular branch will always or never be taken: rather than needlessly checking some variable to determine which way to branch, the branch instruction itself could be modified accordingly
    • e.g. It may become known that only one of the possible derived types will be handled, such that virtual dispatch can be replaced with a specific call
    • Having detected which hardware is available, use of a matching code may be hardcoded
  • Unnecessary code can be replaced with no-op instructions or a jump over it, or have the next bit of code shifted directly into place (easier if using position-independent opcodes)
  • Code written to facilitate its own debugging might inject a trap/signal/interrupt instruction expected by the debugger at a strategic location.
  • Some predicate expressions based on user input might be compiled into native code by a library
  • Inlining some simple operations that aren't visible until runtime (e.g. from dynamically loaded library)...
  • Conditionally adding self-instrumentation/profiling steps
  • Cracks may be implemented as libraries that modify the code that loads them (not "self" modifying exactly, but needs the same techniques and permissions).
  • ...

Some OSs' security models mean self-modifying code can't run without root/admin privileges, making it impractical for general-purpose use.

From Wikipedia:

Application software running under an operating system with strict W^X security cannot execute instructions in pages it is allowed to write to—only the operating system itself is allowed to both write instructions to memory and later execute those instructions.

On such OSes, even programs like the Java VM need root/admin privileges to execute their JIT code. (See http://en.wikipedia.org/wiki/W%5EX for more details)


The Synthesis OS basically partially evaluated your program with respect to API calls, and replaced OS code with the results. The main benefit is that lots of error checking went away (because if your program isn't going to ask the OS to do something stupid, it doesn't need to check).

Yes, that's an example of runtime optimization.


Many years ago i spent a morning trying to debug some self-modifying code, one instruction changed the target address of the following instruction, i.e., i was computing a branch address. It was written in assembly language and worked perfectly when i stepped through the program one instruction at a time. But when i ran the program it failed. Eventually, i realized that the machine was fetching 2 instructions from memory and (as the instructions were laid out in memory) the instruction i was modifying had already been fetched and thus the machine was executing the unmodified (incorrect) version of the instruction. Of course, when i was debugging, it was only doing one instruction at a time.

My point, self-modifying code can be extremely nasty to test/debug and often has hidden assumptions as to the behavior of the machine (be it hardware or virtual). Moreover, the system could never share code pages among the various threads/processes executing on the (now) multi-core machines. This defeats many of the benefits to virtual memory, etc. It also would invalidate branch optimizations done at the hardware level.

(Note - i do not included JIT in the category of self-modifying code. JIT is translating from one representation of the code to an alternate representation, it is not modifying the code)

All, in all, it's just a bad idea - really neat, really obscure, but really bad.

of course - if all you have is an 8080 and ~512 bytes of memory you might have to resort to such practices.