
What does the ljmp instruction do in the linux kernel fork system call?

I am studying the Linux kernel source (an old version, 0.11). While reading about the fork system call, I found some asm code for context switching:

/*
 * switch_to(n) should switch tasks to task nr n, first
 * checking that n isn't the current task, in which case it does nothing.
 * This also clears the TS-flag if the task we switched to has used
 * tha math co-processor latest.
 */
#define switch_to(n) {\
struct {long a,b;} __tmp; \
__asm__("cmpl %%ecx,current\n\t" \
    "je 1f\n\t" \
    "movw %%dx,%1\n\t" \
    "xchgl %%ecx,current\n\t" \
    "ljmp *%0\n\t" \
    "cmpl %%ecx,last_task_used_math\n\t" \
    "jne 1f\n\t" \
    "clts\n" \
    "1:" \
    ::"m" (*&__tmp.a),"m" (*&__tmp.b), \
    "d" (_TSS(n)),"c" ((long) task[n])); \
}

I guess that the "ljmp *%0\n\t" line switches the TSS and LDT. I know that the ljmp instruction needs two operands, like ljmp $segment, $offset, so I think the ljmp here has to use something like _TSS(n), xx. We don't need to provide a meaningful offset value, because the CPU will reload its registers, including EIP, for the new task.

  1. I don't know how ljmp %0 can work like ljmp $segment, $offset, or why this instruction uses %0. Is %0 just the address of __tmp.a?

  2. The CPU might save the EIP register to the TSS of the old task when executing the ljmp instruction. Am I right that the EIP value saved for the old task is the address of "cmpl %%ecx,_last_task_used_math\n\t"?

bongsu asked Nov 18 '15 15:11


1 Answer

What does this syntax even mean?

This unreadable mess is GCC's Extended ASM, which has a general format of

 asm [volatile] ( AssemblerTemplate
                : OutputOperands
              [ : InputOperands
              [ : Clobbers ] ] )

In this case, the __asm__ statement only contains an AssemblerTemplate and InputOperands. The input operands part explains what %0 and %1 mean, and how ecx and edx get their value:

  • The first input operand is "m" (*&__tmp.a), so %0 becomes the memory address of __tmp.a (to be perfectly honest, I'm not sure why *& is needed here).
  • The second input operand is "m" (*&__tmp.b), so %1 becomes the memory address of __tmp.b.
  • The third input operand is "d" (_TSS(n)), so the DX register will contain _TSS(n) when this code starts.
  • The fourth input operand is "c" ((long) task[n]), so the ECX register will contain task[n] when this code starts.

When cleaned up, the code can be interpreted as follows

    cmpl %ecx, _current
    je 1f

    movw %dx, __tmp.b          ;; store the selector in the low half of __tmp.b
    xchgl %ecx, _current
    ljmp *__tmp.a              ;; indirect far jump through the pointer stored at __tmp.a

    cmpl %ecx, _last_task_used_math
    jne 1f
    clts
1:
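
Read as C, the control flow of the macro is roughly the following. This is only a sketch: far_jump_to_tss and clear_ts_flag are hypothetical stand-ins for the ljmp and clts instructions, task_struct is reduced to an id, and the stubs merely record what happened instead of touching the CPU.

```c
/* Toy stand-ins; none of these definitions are the kernel's own. */
struct task_struct { int id; };

static struct task_struct *current;
static struct task_struct *last_task_used_math;
static unsigned short jumped_to_selector;  /* records the last "ljmp" target */
static int ts_cleared;                     /* counts "clts" executions */

/* Hypothetical stub for "ljmp *%0": the real instruction switches tasks. */
static void far_jump_to_tss(unsigned short selector)
{
    jumped_to_selector = selector;
}

/* Hypothetical stub for "clts". */
static void clear_ts_flag(void)
{
    ts_cleared++;
}

/* Same control flow as the macro; next plays ECX, tss_selector plays DX. */
static void switch_to_model(struct task_struct *next, unsigned short tss_selector)
{
    struct task_struct *prev;

    if (next == current)              /* cmpl %ecx,current ; je 1f */
        return;
    prev = current;                   /* xchgl %ecx,current: ECX now holds the old task */
    current = next;
    far_jump_to_tss(tss_selector);    /* movw %dx,%1 ; ljmp *%0 */
    if (prev == last_task_used_math)  /* cmpl %ecx,last_task_used_math ; jne 1f */
        clear_ts_flag();              /* clts */
}
```

In this model the stub returns immediately, so the check after the jump runs right away; with the real instruction, the lines after the ljmp are only reached once the CPU runs the old task again.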

How can ljmp %0 even work?

Please note that there are two forms of the ljmp (also known as jmpf) instruction. The one you know (opcode EA) takes two immediate arguments: one for the segment, one for the offset. The one used here (opcode FF /5) is different: the segment and offset are not in the code stream but somewhere in memory, and the instruction's operand gives the address of that memory.

In this case, the argument to ljmp points at the beginning of the __tmp structure. The first four bytes (__tmp.a) contain the offset, and the two bytes that follow (the lower half of __tmp.b) contain the segment.
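
The layout can be modelled in portable C. The sketch below is an illustration only: far_operand_model, store_selector, fetched_offset and fetched_selector are hypothetical names, and a byte array is used so that the little-endian layout of 32-bit x86 (where long is 4 bytes) is reproduced on any host.

```c
#include <stdint.h>

/* Stand-in for "struct {long a,b;} __tmp" as laid out on 32-bit x86. */
struct far_operand_model {
    uint8_t bytes[8];   /* bytes 0-3: a, bytes 4-7: b */
};

/* Emulates "movw %dx,%1": a 16-bit little-endian store at the start of b. */
static void store_selector(struct far_operand_model *m, uint16_t selector)
{
    m->bytes[4] = (uint8_t)(selector & 0xff);
    m->bytes[5] = (uint8_t)(selector >> 8);
}

/* The 32-bit offset an indirect far jump would read from bytes 0-3. */
static uint32_t fetched_offset(const struct far_operand_model *m)
{
    return (uint32_t)m->bytes[0]
         | (uint32_t)m->bytes[1] << 8
         | (uint32_t)m->bytes[2] << 16
         | (uint32_t)m->bytes[3] << 24;
}

/* The 16-bit selector an indirect far jump would read from bytes 4-5. */
static uint16_t fetched_selector(const struct far_operand_model *m)
{
    return (uint16_t)(m->bytes[4] | m->bytes[5] << 8);
}
```

Calling store_selector only changes bytes 4 and 5. The upper half of b and all of a keep whatever they held before, which matches the macro leaving them uninitialised.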

This indirect ljmp __tmp.a would be equivalent to ljmp [__tmp.b]:[__tmp.a], except that ljmp segment:offset can only take immediate arguments. If you want to switch to an arbitrary TSS without self-modifying code (which would be an awful idea), the indirect instruction is the one to use.

Also note that __tmp.a is never initialised. That is fine here: _TSS(n) is the selector of a TSS descriptor in the GDT, and a far jump to a TSS descriptor (or through a task gate) makes the CPU perform a hardware task switch, for which the offset is ignored.
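
An x86 segment selector packs a descriptor-table index (bits 3 and up), a table indicator (bit 2) and a requested privilege level (bits 0-1) into 16 bits. The sketch below decodes the value _TSS(n) produces. The two macros are quoted from memory of Linux 0.11's include/linux/sched.h and should be checked against the source; selector_fields and decode_selector are hypothetical helpers.

```c
#include <stdint.h>

/* Quoted from memory of Linux 0.11's sched.h; verify against the source. */
#define FIRST_TSS_ENTRY 4
#define _TSS(n) ((((unsigned long) n)<<4)+(FIRST_TSS_ENTRY<<3))

/* Hypothetical helper type: the three fields of a segment selector. */
struct selector_fields {
    unsigned index;            /* descriptor index within the table */
    unsigned table_indicator;  /* 0 = GDT, 1 = LDT */
    unsigned rpl;              /* requested privilege level */
};

/* Split a 16-bit selector into its architectural fields. */
static struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;
    f.index = selector >> 3;
    f.table_indicator = (selector >> 2) & 1u;
    f.rpl = selector & 3u;
    return f;
}
```

Under these assumptions, decode_selector((uint16_t)_TSS(1)) gives index 6 in the GDT with RPL 0. Each task takes two consecutive descriptor slots, which is why n is shifted by 4 rather than 3.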

Where does the old instruction pointer go?

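This piece of code never writes the old EIP anywhere itself, but the CPU does: as part of the hardware task switch triggered by the ljmp, the processor saves the outgoing task's registers, including EIP, into that task's TSS. The saved EIP points at the instruction after the ljmp, i.e. the cmpl %ecx, _last_task_used_math, so that is where the old task resumes once something switches back to it.

(I'm guessing after this point, but I think this guess is reasonable.)

The EIP of the interrupted user-space code, on the other hand, is stored on the kernel-space stack that corresponds with the old task.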

Linux 0.11 allocates a ring 0 stack (i.e. a stack for the kernel) for each task (see the copy_process function in fork.c, which initialises the TSS). When an interrupt happens during task A, the old EIP is saved on the kernel-space stack rather than the user-space stack. If the kernel decides to switch to task B, the kernel-space stack is also switched. When the kernel eventually switches back to task A, this stack is switched back, and through an iret we can return to where we were in task A.
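
The per-task kernel stack idea can be illustrated with a toy model. Everything below is hypothetical (toy_task, toy_interrupt, toy_switch, toy_iret): it only tracks saved EIP values, has no bounds checks, and does not resemble the kernel's data structures.

```c
#include <stdint.h>

#define KSTACK_DEPTH 8

/* Toy task: just a private kernel stack holding saved EIP values. */
struct toy_task {
    uint32_t kstack[KSTACK_DEPTH];
    int top;                       /* number of entries in use */
};

static struct toy_task *toy_current;

/* Interrupt entry: the interrupted EIP goes onto the current task's kernel stack. */
static void toy_interrupt(uint32_t interrupted_eip)
{
    toy_current->kstack[toy_current->top++] = interrupted_eip;
}

/* Task switch: later pushes and pops use the other task's kernel stack. */
static void toy_switch(struct toy_task *next)
{
    toy_current = next;
}

/* iret: pop the saved EIP from the current task's kernel stack. */
static uint32_t toy_iret(void)
{
    return toy_current->kstack[--toy_current->top];
}
```

If task A is interrupted at 0x1000, the kernel switches to B, and B is interrupted at 0x2000, then toy_iret while B is current yields 0x2000. After switching back to A it yields 0x1000, because each task's saved EIP stayed on its own stack.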

user824425 (3 revs) answered Sep 28 '22 20:09