Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does one go about understanding GNU source code?

Tags:

c

gnu

project

I'm really sorry if this sounds kinda dumb. I just finished reading K&R and I worked on some of the exercises. This summer, for my project, I'm thinking of re-implementing a linux utility to expand my understanding of C further so I downloaded the source for GNU tar and sed as they both seem interesting. However, I'm having trouble understanding where it starts, where's the main implementation, where all the weird macros came from, etc.

I have a lot of time so that's not really an issue. Am I supposed to familiarize myself with the GNU toolchain (ie. make, binutils, ..) first in order to understand the programs? Or maybe I should start with something a bit smaller (if there's such a thing) ?

I have little bit of experience with Java, C++ and python if that matters.

Thanks!

like image 541
Max Dwayne Avatar asked Jun 16 '10 06:06

Max Dwayne


People also ask

What is source code with example?

When a programmer types a sequence of C programming language statements into Windows Notepad, for example, and saves the sequence as a text file, the text file is said to contain the source code. Source code and object code are sometimes referred to as the "before" and "after" versions of a compiled computer program.

What is source code in game design?

Source code is generally understood to mean programming statements that are created by a programmer with a text editor or a visual programming tool and then saved in a file. Object code generally refers to the output, a compiled file, which is produced when the Source Code is compiled with a C compiler.


2 Answers

The GNU programs big and complicated. The size of GNU Hello World shows that even the simplest GNU project needs a lot of code and configuration around it.

The autotools are hard to understand for a beginner, but you don't need to understand them to read the code. Even if you modify the code, most of the time you can simply run make to compile your changes.

To read code, you need a good editor (VIM, Emacs) or IDE (Eclipse) and some tools to navigate through the source. The tar project contains a src directory, that is a good place to start. A program always start with the main function, so do

grep main *.c

or use your IDE to search for this function. It is in tar.c. Now, skip all the initialization stuff, untill

/* Main command execution.  */

There, you see a switch for subcommands. If you pass -x it does this, if you pass -c it does that, etc. This is the branching structure for those commands. If you want to know what these macro's are, run

grep EXTRACT_SUBCOMMAND *.h

there you can see that they are listed in common.h.

Below EXTRACT_SUBCOMMAND you see something funny:

read_and (extract_archive);

The definition of read_and() (again obtained with grep):

read_and (void (*do_something) (void))

The single parameter is a function pointer like a callback, so read_and will supposedly read something and then call the function extract_archive. Again, grep on it and you will see this:

  if (prepare_to_extract (current_stat_info.file_name, typeflag, &fun))
    {
      if (fun && (*fun) (current_stat_info.file_name, typeflag)
      && backup_option)
    undo_last_backup ();
    }
  else
    skip_member ();

Note that the real work happens when calling fun. fun is again a function pointer, which is set in prepare_to_extract. fun may point to extract_file, which does the actual writing.

I hope I walked you a great deal through this and shown you how I navigate through source code. Feel free to contact me if you have related questions.

like image 84
Sjoerd Avatar answered Oct 05 '22 00:10

Sjoerd


The problem with programs like tar and sed is twofold (this is just my opinion, of course!). First of all, they're both really old. That means they've had multiple people maintain them over the years, with different coding styles and different personalities. For GNU utilities, it's usually pretty good, because they usually enforce a reasonably consistent coding style, but it's still an issue. The other problem is that they're unbelievably portable. Usually "portability" is seen as a good thing, but when taken to extremes, it means your codebase ends up full of little hacks and tricks to work around obscure bugs and corner cases in particular pieces of hardware and systems. And for programs as widely ported as tar and sed, that means there's a lot of corner cases and obscure hardware/compilers/OSes to take into account.

If you want to learn C, then I would say the best place to start is not trying to study code that others have written. Rather, try to write code yourself. If you really want to start with an existing codebase, choose one that's being actively maintained where you can see the changes that other people are making as they make them, follow along in the discussions on the mailing lists and so on.

With well-established programs like tar and sed, you see the result of the discussions that would've happened, but you can't see how software design decisions and changes are being made in real-time. That can only happen with actively-maintained software.

That's just my opinion of course, and you can take it with a grain of salt if you like :)

like image 29
Dean Harding Avatar answered Oct 05 '22 01:10

Dean Harding