In computer science, self-modifying code (SMC) is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance.
Runtime code modification, or self-modifying code as it is often called, has been used for decades to implement JITters, write highly optimized algorithms, and do all kinds of other interesting things.
Self-modifying programs are programs which are able to modify their own code at runtime. Nowadays, self-modifying programs are commonly used. For example, a packer transforms any program into a program with equivalent behavior, but which decompresses and/or decrypts some instructions.
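As a rough illustration of the packer idea, here is a minimal Python sketch. The pack function and the tiny sample program are made up for this example; real packers operate on machine code and usually add encryption, which is not shown here.

```python
import zlib

def pack(source: str) -> str:
    """Return a new program that decompresses and runs `source` when executed.

    `pack` is a hypothetical helper for illustration only.
    """
    blob = zlib.compress(source.encode("utf-8"))
    # The packed program carries the compressed payload plus a small stub
    # that restores and executes the original instructions at runtime.
    return f"import zlib\nexec(zlib.decompress({blob!r}).decode('utf-8'))\n"

original = "result = sum(range(10))\n"
packed = pack(original)

# Run both versions in separate namespaces and compare their behavior.
env_original, env_packed = {}, {}
exec(original, env_original)
exec(packed, env_packed)
```

The packed text looks nothing like the original, but both leave the same result in their namespace.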
There are many valid cases for code modification. Generating code at run time can be useful for:
Sometimes code is translated into code at runtime (this is called dynamic binary translation):
Code modification can be used to work around limitations of the instruction set:
More cases of code modification:
This has been done in computer graphics, specifically in software renderers, for optimization purposes. At runtime the state of many parameters is examined and an optimized version of the rasterizer code is generated (potentially eliminating a lot of conditionals), which allows one to render graphics primitives, e.g. triangles, much faster.
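A small Python sketch of that kind of specialization. The make_shader function, its textured and fog flags, and the per-pixel formula are all invented for illustration; a real software rasterizer would emit machine code rather than Python source.

```python
def make_shader(textured: bool, fog: bool):
    """Generate a per-pixel function specialized for the given render state.

    Hypothetical example: the flags and the shading math are assumptions.
    """
    lines = ["def shade(base, texel, fog_factor):", "    color = base"]
    # The render state is inspected once, here, instead of once per pixel.
    if textured:
        lines.append("    color = color * texel")
    if fog:
        lines.append("    color = color * (1.0 - fog_factor)")
    lines.append("    return color")
    source = "\n".join(lines)
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["shade"], source

plain, plain_src = make_shader(textured=False, fog=False)
full, full_src = make_shader(textured=True, fog=True)
```

Neither generated function contains a conditional: the decisions were made while generating the code, so the inner loop only pays for the features that are actually enabled.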
One valid reason is that the instruction set lacks some instruction you need, which you can then build yourself. Example: on x86 there is no way to raise an interrupt whose number is held in a register (e.g. an interrupt whose number is in ax). Only constant numbers encoded into the instruction are allowed. With self-modifying code one can emulate this behaviour.
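The trick can be sketched on a toy machine in Python. Everything here is an assumption made for the example: the two-byte instruction format and the MOV_A, STORE_A and HALT opcodes are invented, and only the 0xCD value mirrors the real x86 INT imm8 opcode. The program stores the register value into the operand byte of the INT instruction just before executing it.

```python
# Toy opcodes (hypothetical, except that 0xCD matches x86 "INT imm8").
MOV_A, STORE_A, INT, HALT = 0x01, 0x02, 0xCD, 0x00

def run(memory):
    """Execute two-byte instructions from `memory`; return interrupts raised."""
    a, pc, raised = 0, 0, []
    while True:
        op = memory[pc]
        if op == HALT:
            return raised
        operand = memory[pc + 1]
        if op == MOV_A:
            a = operand                    # load a constant into register A
        elif op == STORE_A:
            memory[operand] = a & 0xFF     # write A into memory, code included
        elif op == INT:
            raised.append(operand)         # interrupt number comes from the code
        else:
            raise ValueError(f"unknown opcode {op:#x} at {pc}")
        pc += 2

program = bytearray([
    MOV_A, 0x21,     # address 0: A = 0x21 (the interrupt number we want)
    STORE_A, 5,      # address 2: patch the operand byte of the INT below
    INT, 0x00,       # address 4: placeholder operand, overwritten at runtime
    HALT,            # address 6
])
raised = run(program)
```

After the run, the placeholder 0x00 at address 5 has been replaced by 0x21, and that is the interrupt number the machine actually raised.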
Some compilers used to use it for static variable initialization, avoiding the cost of a conditional for subsequent accesses. In other words they implement "execute this code only once" by overwriting that code with no-ops the first time it's executed.
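A toy Python model of the "overwrite with no-ops" trick. The instruction names INIT_ONCE, USE and NOP and the tuple-based program format are made up for illustration; a compiler would patch real machine code instead.

```python
def run(program, state):
    """Execute a list of (opcode, arg) pairs; instructions may rewrite the list."""
    for pc in range(len(program)):
        op, arg = program[pc]
        if op == "INIT_ONCE":
            state["table"] = arg()          # the one-time initialization
            state["inits"] += 1
            program[pc] = ("NOP", None)     # overwrite self so it never runs again
        elif op == "USE":
            state["result"] = state["table"][arg]
        elif op == "NOP":
            pass                            # nothing to check, nothing to do

program = [
    ("INIT_ONCE", lambda: [n * n for n in range(10)]),
    ("USE", 3),
]
state = {"inits": 0}
run(program, state)   # first pass initializes the table
run(program, state)   # second pass hits the NOP instead
```

The second pass never tests an "already initialized?" flag, because the initialization instruction no longer exists in the program.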
There are many cases:
Some OSs' security models mean self-modifying code can't run without root/admin privileges, making it impractical for general-purpose use.
From Wikipedia:
Application software running under an operating system with strict W^X security cannot execute instructions in pages it is allowed to write to—only the operating system itself is allowed to both write instructions to memory and later execute those instructions.
On such OSes, even programs like the Java VM need root/admin privileges to execute their JIT code. (See http://en.wikipedia.org/wiki/W%5EX for more details)
The Synthesis OS basically partially evaluated your program with respect to API calls, and replaced OS code with the results. The main benefit is that lots of error checking went away (because if your program isn't going to ask the OS to do something stupid, it doesn't need to check).
Yes, that's an example of runtime optimization.
Many years ago I spent a morning trying to debug some self-modifying code in which one instruction changed the target address of the following instruction, i.e., I was computing a branch address. It was written in assembly language and worked perfectly when I stepped through the program one instruction at a time. But when I ran the program, it failed. Eventually I realized that the machine was fetching 2 instructions from memory at a time and (as the instructions were laid out in memory) the instruction I was modifying had already been fetched, so the machine was executing the unmodified (incorrect) version of the instruction. Of course, when I was debugging, it was only doing one instruction at a time.
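The effect can be reproduced with a toy model in Python. This is not a model of any particular CPU: the PATCH, PRINT and HALT instructions and the prefetch parameter are invented for the example. The machine fetches prefetch instructions at a time into a queue, and a write to memory does not update what is already in the queue.

```python
def run(program, prefetch):
    """Run `program` on a toy machine that fetches `prefetch` instructions at once."""
    memory = list(program)
    out, pc, queue = [], 0, []
    while True:
        if not queue:
            # Refill the queue with the next batch of instructions.
            queue = memory[pc:pc + prefetch]
        instr = queue.pop(0)
        pc += 1
        if instr[0] == "HALT":
            return out
        if instr[0] == "PATCH":
            _, addr, new_instr = instr
            memory[addr] = new_instr    # memory changes, the queue does not
        elif instr[0] == "PRINT":
            out.append(instr[1])

program = [
    ("PATCH", 1, ("PRINT", "patched")),  # rewrite the very next instruction
    ("PRINT", "original"),
    ("HALT",),
]
stepped = run(program, prefetch=1)   # like single-stepping in a debugger
full_speed = run(program, prefetch=2)
```

With prefetch=1 the patched instruction is the one that executes; with prefetch=2 the stale copy in the queue runs instead, which matches the "works in the debugger, fails at full speed" symptom.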
My point: self-modifying code can be extremely nasty to test and debug, and it often has hidden assumptions about the behavior of the machine (be it hardware or virtual). Moreover, the system could never share code pages among the various threads/processes executing on (now) multi-core machines. This defeats many of the benefits of virtual memory. It would also invalidate branch optimizations done at the hardware level.
(Note: I do not include JIT in the category of self-modifying code. JIT translates from one representation of the code to another representation; it does not modify the code.)
All in all, it's just a bad idea - really neat, really obscure, but really bad.
Of course, if all you have is an 8080 and ~512 bytes of memory, you might have to resort to such practices.