Extract global variables from a.out file

Edit (updated question)

I have a simple C program:

   // it is not important to know what the code does; you may skip it

main.c

#include <bsp.h>

unsigned int   AppCtr;
unsigned char  AppFlag;
int SOME_LARGE_VARIABLE;

static  void  AppTest (void);

void  main (void)
{
    AppCtr  = 0;
    AppFlag = 0;        
    AppTest();
}

static void Foo(void){
    SOME_LARGE_VARIABLE=15; 
}


static  void  AppTest (void)
{
    unsigned int  i;
    i = 0;
    while (i < 200000) {
        i++;
    }

    BSP_Test();      
    SOME_LARGE_VARIABLE=3;    
    Foo();
}

bsp.c

extern int SOME_LARGE_VARIABLE;
extern unsigned char  AppFlag;

unsigned int long My_GREAT_COUNTER;

void  BSP_Test (void) {
  SOME_LARGE_VARIABLE = 5;
  My_GREAT_COUNTER = 4;
}

(The program does not do anything useful. My goal is to extract the variable names, the file where each one is declared, and their memory addresses.)

When I compile the program I get the file a.out, which is an ELF file containing debug information.

Someone at the company wrote a .NET program 5 years ago that gets all this information from the a.out file. This is what that code returns:

   //  Name          Display Name                    Type      Size     Address

[image: output table of the existing tool]

It works great for this small program and also for other, larger projects.

That code is 2000 lines long, has several bugs, and does not support .NET version 4. That's why I am trying to recreate it.


So my question is this: I am lost, in the sense that I don't know what approach to take to solve this problem. These are the options I have been considering:

  1. Clean up the buggy code of the program whose output I showed in the first image and try to see what it does and how it parses the a.out file to get that information. Once I fully understand it, try to figure out why it does not support versions 3 and 4.

  2. I am OK at writing regular expressions, so maybe look for the pattern in the a.out file by doing something like: [image] So far I was able to find the pattern when there is just one file (main.c). When there are several files it gets more complicated. I haven't tried it yet; maybe it will not be that complicated and it will be possible to find the pattern.

  3. Install Cygwin so that I can use Linux commands on Windows such as objdump, nm or readelf. I haven't played with them enough yet; when I run something like readelf -w a.out I get way more information than I need. Here are the pros and cons of this approach, which is why I have not spent much time on it:

    • Cons: It takes a while to install Cygwin on Windows, and when giving this application to our customers we don't want them to have to install it. Maybe there is a way of installing just objdump and readelf without the whole thing.

    • Pros: If we find the right command we will not be reinventing the wheel, which saves time. Maybe it is just a matter of parsing the output of a command such as objdump -w a.out


In case you want to download the a.out file in order to parse it, here it is.


Summary

I would like to be able to get the global variables from the a.out file. For each variable I would like to know its type (int, char, ...), its memory address, and the file in which it is declared (main.c or someOtherFile.c). I would appreciate not having to use Cygwin, as that makes deployment easier. Since this question asks for a lot, I attempted to split it into more:

  • objdump/readelf get variables information
  • Get location of symbols in a.out file

Perhaps I should delete the other questions. Sorry for being redundant.

asked Jun 12 '12 19:06 by Tono Nam


1 Answer

Here is what I would do. Why reinvent the wheel!

  1. Download the Linux commands that we will need on Windows from here.

    In the bin directory there should be: readelf.exe

    Note that we will not need Cygwin or any other program, so deploying will be simple!

  2. Once we have that file, execute in cmd:

    // cd "path where readelf.exe is"
    readelf.exe -s a.out
    

    and this is the list that will come out: [image: readelf -s output]

    Looking at it, we are interested in all the symbols of type OBJECT with a size greater than 0; those are the variables.

  3. Once we have the variables, we can use the readelf.exe -w a.out command to take a look at the debug information tree, which looks like this: [image: readelf -w output] Let's start by looking for one of the variables we found in step 2 (My_GREAT_COUNTER). Note that at the top we see the file where the variable is declared, and the entry itself gives more information, such as the line where it was declared and its memory address.

  4. The last thing we are missing is the type. If you take a look, the type is given as <0x522>. That means we have to go to offset 0x522 in the tree to get more information about that type. If we go to that part, this is what we get: [image: the entry at 0x522] From looking at the tree we know that My_GREAT_COUNTER is of type unsigned long.
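The filtering in step 2 can be automated once the readelf.exe -s output is captured as text. Below is a minimal sketch in C, assuming the default binutils column layout (Num: Value Size Type Bind Vis Ndx Name). The names sym_row and parse_object_row are hypothetical helpers made up for this illustration; they are not part of readelf or binutils, and the layout should be checked against your own readelf version.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of `readelf -s` output (hypothetical struct, not part of binutils). */
struct sym_row {
    unsigned long long address;  /* "Value" column, printed in hex */
    long size;                   /* "Size" column */
    char type[16];               /* e.g. OBJECT, FUNC, FILE */
    char name[128];              /* symbol name */
};

/* Parses one line of `readelf -s` output.
   Returns 1 if the line describes a variable (Type is OBJECT and Size > 0),
   0 for headers, blank lines, unnamed rows, and every other symbol type. */
int parse_object_row(const char *line, struct sym_row *out)
{
    int num;
    char bind[16], vis[16], ndx[16];

    assert(line != NULL && out != NULL);

    /* Assumed layout: Num: Value Size Type Bind Vis Ndx Name.
       %li accepts both decimal sizes and 0x-prefixed hex sizes. */
    if (sscanf(line, " %d: %llx %li %15s %15s %15s %15s %127s",
               &num, &out->address, &out->size, out->type,
               bind, vis, ndx, out->name) != 8)
        return 0;

    return strcmp(out->type, "OBJECT") == 0 && out->size > 0;
}
```

Call parse_object_row on each line read from the captured output (for example with fgets) and keep the rows for which it returns 1. The sketch keeps every OBJECT symbol regardless of binding; if only true globals are wanted, the Bind column (GLOBAL vs. LOCAL) would also have to be checked. The type and the declaring file still have to come from the readelf.exe -w output as described in steps 3 and 4.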

answered Sep 27 '22 22:09 by Tono Nam