
argc/argv random data/behavior

Tags: c, argv, argc

Here is my minimal reproducible example:

#include <stdio.h>

int main( int argc, char* argv[])
{
    printf (" this is the contents of argc:%d\n",argc);
            
    int i;

    for (i = 0; i < argc ; i++){
       printf(" argv = %d = %s\n",i,argv[i]);
    }
      
    return 0;
}

When I replace argc in the for loop with a fixed number, let's say 10, the loop keeps printing past the last argument I passed:

$ ./argc one two three
 this is the contents of argc:4
 argv = 0 = ./argc
 argv = 1 = one
 argv = 2 = two
 argv = 3 = three
 argv = 4 = (null)
 argv = 5 = SHELL=/bin/bash
 argv = 6 = SESSION_MANAGER=local/wajih:@/tmp/.ICE-unix/1230,unix/wajih:/tmp/.ICE-unix/1230
 argv = 7 = QT_ACCESSIBILITY=1
 argv = 8 = COLORTERM=truecolor
 argv = 9 = XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg

If I change argc in the for loop to, for example, 100, I get very long output, which ends with this:

 argv = 54 = GDMSESSION=ubuntu
 argv = 55 = DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
 argv = 56 = LC_NUMERIC=ar_AE.UTF-8
 argv = 57 = _=./argc
 argv = 58 = OLDPWD=/home/wajih
 argv = 59 = (null)
Segmentation fault (core dumped).

I want to understand the reason this happens.

Wajih Mansouri asked Dec 12 '25 11:12


2 Answers

It might be easier to understand what's going on here with an analogy.

Suppose I live in a long, narrow house. The house is divided into 10 rooms, but they're all the same size and they're all arranged in a straight line.

Suppose I'm interested in robotics. Suppose I build a little robot to drive around inside my house, taking pictures of each room. Because my house's rooms are all laid out in a straight line, the robot's navigation task is pretty simple.

Once I've got the robot's software working perfectly, I ask the robot to make a complete photographic survey of all 20 rooms in my house. (Oops, I made a mistake, there.) And the robot starts driving along the main axis of the house taking pictures of each room in turn.

After it takes pictures of the first 10 rooms, there's a crashing sound as the robot drives through the end wall of the house. Its pictures of the "11th room" are of splintered wood and plaster. Its pictures of the "12th room" are of the garden outside the end of my house. But then there's another crashing sound, and the robot keeps taking pictures, and somehow, remarkably, they look like the insides of a house again!

It turns out that's because the robot has driven into my neighbor's house and is now taking pictures there.

From this silly little story we can learn two things:

  1. If there are 10 rooms in my house, and I ask my simpleminded robot to take pictures of 20 rooms, something strange, unpredictable, and wrong is probably going to happen.
  2. Even though what happens is going to be strange, unpredictable, and wrong, little bits of it can seem to make some kind of sense, depending on circumstances. In this case, my robot's picture of the "15th room" of my house looked just like a bedroom, although it didn't look like any bedroom in my house, and what the two people were doing in bed there didn't look like anything that happens in my house, either...

But the other important aspect of the analogy is that you obviously can't depend on any of it, because too many of the circumstances are outside of your control. The robot might have damaged itself so badly driving through walls that it couldn't continue taking pictures. If there happened to be a street just past the garden at the end of my house, the robot might have gotten run over by a truck. If there happened to be a cliff just past the garden at the end of my house, the robot might have fallen into the ocean. Etc.

C, like the simpleminded robot in my story, does not have any built-in protections against running off the end of arrays. If you try to access the 15th element of a 10-element array, what you don't typically get is an error message saying "Array bounds exceeded." What you get instead is something strange, unpredictable, and wrong — except that, depending on circumstances, there might seem to be some kind of hidden meaning, which might lead you to waste time trying to figure it out, or asking about it on Stack Overflow. But rather than doing that, you might want to spend your time working on a better obstacle detection or collision avoidance algorithm for the robot, instead. :-)

See also the many previous Stack Overflow questions on the topic of exceeding the bounds of arrays.

Steve Summit answered Dec 15 '25 01:12


The argv pointer has a very specific location in the program's memory.

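When you run a binary, there is always some entry point. From the C programmer's point of view, that is the main() function. But in order to prepare the environment for the binary to start at that location, the OS has to do some things first.

It has to set up the process's memory, copy over the command-line arguments and the environment variables, etc. Because this layout is deterministic (per OS), you can actually expect to find the environment variables just after the arguments. On Linux, the argv pointer array ends with a null pointer (the (null) at index 4 in your output), and the array of environment pointers follows immediately, ending with its own null pointer (index 59). What comes after that on Linux is not an array of string pointers, so printing it with %s ends up dereferencing an invalid address, hence the segmentation fault.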

[Image: Example memory layout on Linux]

This principle is fundamental to computer security. If an attacker manages to leak a pointer into this region of memory and also has a way to write there, they can overwrite some environment variable (e.g. PATH) so that it points to their own binary first. HackMD has a really nice example of this: HackMD: Environment variables attack.


Image source: COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY

Eduardo Lira answered Dec 14 '25 23:12


