 

eval at Assembly level

Tags:

assembly

eval

I started studying the basics of assembly language and I've learnt that compiled code goes into a special segment called the code segment, which is (at least on modern architectures) write-protected.

But a question pops up: some programming languages (e.g. ECMAScript, Python, etc.) have the magic eval() function, which takes a string, parses it and then executes it.

Since the code is evaluated at runtime (that is, after the code segment has been populated) and the code segment is write-protected, what kind of sorcery does it perform?

I suppose it's related to JIT compilation, but I have no clue how it works at a low level.

Fylax asked Feb 06 '23 22:02



2 Answers

Let's take Python as an example.

Python is interpreted (unless you use PyPy or another JIT-capable engine, and even there you can invoke the interpreter dynamically). While a program runs, it always has access to the interpreter built into the running python executable: evaluation is part of the runtime, so it is available as a consequence.

So eval just evaluates the expression using that built-in interpreter.

To save text-parsing time, Python converts your code to bytecode when loading modules (Java, for instance, does that at compile time), but the real machine instructions that get executed are those contained in the python executable (which interprets the bytecode and performs the appropriate actions) or in loaded .pyd files, which are DLLs.
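A minimal sketch of that, assuming CPython: the built-in `compile` turns a string into a code object holding bytecode, and `eval` hands it to the interpreter loop that already lives in the executable. No new machine code is written anywhere. The expression and the variable `x` are made up for illustration.

```python
import dis

source = "6 * x + 2"

# Parse the string and compile it to a code object (bytecode, not machine code).
code_obj = compile(source, "<string>", "eval")

# eval() accepts either the string or the precompiled code object.
result = eval(code_obj, {"x": 5})
same = eval(source, {"x": 5})

# The bytecode is plain data stored in the code object.
raw_bytecode = code_obj.co_code

# Human-readable view of the instructions; the exact opcodes vary by Python version.
instructions = [ins.opname for ins in dis.get_instructions(code_obj)]

print(result, same)
print(instructions)
```

The bytecode is ordinary data on the heap, so write protection of the executable's code pages never comes into play.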

JIT is just another optimization on top of the bytecode one: it generates native code on the fly in a memory region, but you don't have easy access to that region (unlike in C, where you can take function addresses), so it would be very difficult to tamper with this code from inside a Python program.

An eval like this is not possible (at least not easily) in assembly or in compiled languages (C, C++, Ada...), not really because of write protection of the code segment (which is not guaranteed) but mainly because the running program is unable to assemble or compile code: it does not embed the compiler/assembler. The runtime, if it exists, is minimal and certainly does not contain a source-code evaluator.

The closest and easiest approach would be to write your code to a temporary file, call the compiler/assembler on it from your program, and then execute the result in a separate process or load it dynamically as a DLL, but that's not trivial.
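A rough sketch of that temporary-file approach, written in Python only to keep it short (a C program would follow the same steps with its own process and dynamic-loading calls). It assumes a Unix-like system with a C compiler reachable as `cc`; the function `add`, the file names and the helper `compile_and_call` are hypothetical, and the helper returns `None` when no compiler is available.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile

# Source code that only exists as a string at runtime (hypothetical example).
C_SOURCE = "int add(int a, int b) { return a + b; }\n"

def compile_and_call(a, b):
    """Compile C_SOURCE into a shared library, load it and call add(a, b)."""
    compiler = shutil.which("cc")  # assumption: the compiler is installed as `cc`
    if compiler is None:
        return None
    workdir = tempfile.mkdtemp()  # left on disk for simplicity in this sketch
    src = os.path.join(workdir, "snippet.c")
    lib = os.path.join(workdir, "libsnippet.so")
    with open(src, "w") as f:
        f.write(C_SOURCE)
    try:
        # Invoke the external compiler, exactly as you would from a shell.
        subprocess.run([compiler, "-shared", "-fPIC", "-o", lib, src],
                       check=True, capture_output=True)
        handle = ctypes.CDLL(lib)  # dynamic loading of the fresh library
    except (subprocess.CalledProcessError, OSError):
        return None
    handle.add.argtypes = [ctypes.c_int, ctypes.c_int]
    handle.add.restype = ctypes.c_int
    return handle.add(a, b)

print(compile_and_call(2, 3))
```

The new code ends up in pages that the dynamic loader maps as executable, so the program's original code pages are never modified.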

The other possibility, as Frank remarked, would be to create a virtual machine within your program that evaluates machine-code instructions as the real CPU would (or high-level instructions as the compiler would). Needless to say, this is not trivial. Some existing software does it (QEMU, for instance), but even with existing material it's far from easy to implement.
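To give a feel for the idea, here is a toy stack machine. It is only a sketch: the instruction set (`PUSH`, `ADD`, `MUL`, `HALT`) is invented for this example and is nowhere near what emulating a real CPU involves.

```python
def run(program):
    """Interpret a list of (opcode, *operands) tuples on a tiny stack machine."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1] if stack else None

# (2 + 3) * 4, expressed as data rather than as native code.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("HALT",)]

print(run(PROGRAM))  # prints 20
```

The "program" is just a list the host process reads, so it can be built or changed at runtime without touching any executable memory.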

Jean-François Fabre answered Mar 08 '23 14:03



An answer from a different point of view...

The non-writable flag of the "code segment" is just an arrangement made by the OS while loading the executable. Nothing at the hardware level prevents the OS from preparing a writable+executable page of memory too; running executables in write-protected memory pages simply became a convenient safety measure and a way to prevent bugs. Application authors respect that and no longer use self-modifying code (which was common practice in early assembly programming), unless they allocate additional memory from the OS exactly for this purpose, to write code there and execute it afterwards.

Also, the whole "code segment" is a high-level abstraction; the CPU itself is not aware of anything like that.

The (x86) CPU only has a current privilege level and a virtual memory map, so any memory address it accesses is translated into a physical memory address through the virtual map definition, while the privileges of that memory "page" (can-read / can-write) are checked against the requested operation.

If the access is not valid, the CPU traps into an error handler, which is usually provided by the OS.

Whether the application is loaded with code and data in separate memory pages, or whether the data sections are further split into writable and read-only ones, is all up to the OS and the application loader, which set it up by means of the simple privilege/flag mechanics of memory mapping provided by the CPU. If you have your own OS, you can just as well map the whole memory as one big unprotected chunk with read+execute+write allowed for everyone.
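As a hedged illustration that page permissions are just flags requested from the OS, the sketch below asks for an anonymous page that is readable, writable and executable, copies six bytes of machine code into it and calls it. It assumes Linux on x86-64 and a system policy that allows such mappings; anywhere else the hypothetical helper `run_machine_code` returns `None` instead of trying.

```python
import ctypes
import mmap
import platform
import sys

# x86 machine code for:  mov eax, 42 ; ret
CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])

def run_machine_code():
    """Return the value produced by CODE, or None on unsupported platforms."""
    if not sys.platform.startswith("linux") or platform.machine() != "x86_64":
        return None
    try:
        # Ask the OS for one page with read + write + execute permissions.
        page = mmap.mmap(-1, mmap.PAGESIZE,
                         flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    except OSError:
        return None  # some hardened systems refuse writable+executable pages
    page.write(CODE)
    # Obtain the page's address and treat it as a C function returning int.
    anchor = ctypes.c_char.from_buffer(page)
    func = ctypes.CFUNCTYPE(ctypes.c_int)(ctypes.addressof(anchor))
    return func()

print(run_machine_code())
```

A JIT compiler does essentially this on a larger scale, often toggling a page between writable and executable rather than keeping both permissions at once.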

Ped7g answered Mar 08 '23 13:03
