How does one go about understanding GNU source code?

I'm really sorry if this sounds kinda dumb. I just finished reading K&R and I worked on some of the exercises. This summer, for my project, I'm thinking of re-implementing a Linux utility to expand my understanding of C further, so I downloaded the source for GNU tar and sed, as they both seem interesting. However, I'm having trouble understanding where the code starts, where the main implementation is, where all the weird macros come from, etc.

I have a lot of time so that's not really an issue. Am I supposed to familiarize myself with the GNU toolchain (i.e. make, binutils, ...) first in order to understand the programs? Or maybe I should start with something a bit smaller (if there's such a thing)?

I have a little bit of experience with Java, C++ and Python, if that matters.

Thanks!

asked Jun 16 '10 06:06

Max Dwayne

2 Answers

The GNU programs are big and complicated. The size of GNU Hello World shows that even the simplest GNU project needs a lot of code and configuration around it.

The autotools are hard to understand for a beginner, but you don't need to understand them to read the code. Even if you modify the code, most of the time you can simply run make to compile your changes.

To read code, you need a good editor (Vim, Emacs) or IDE (Eclipse) and some tools to navigate through the source. The tar project contains a src directory, which is a good place to start. A C program always starts with the main function, so run

grep main *.c

or use your IDE to search for this function. It is in tar.c. Now skip all the initialization stuff until you reach the comment

/* Main command execution.  */

There you see a switch over the subcommands: if you pass -x it does this, if you pass -c it does that, etc. This is the branching structure for those commands. If you want to know what the case labels such as EXTRACT_SUBCOMMAND are, run

grep EXTRACT_SUBCOMMAND *.h

There you can see that they are listed in common.h.
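To make that branching structure concrete, here is a minimal sketch of the same pattern. It is not tar's code: the shortened list of codes and the functions parse_subcommand and run_subcommand are hypothetical names made up for illustration, and the returned strings stand in for the real work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subcommand codes for this sketch; GNU tar's real list
   is the one found in common.h.  */
enum subcommand
{
  UNKNOWN_SUBCOMMAND,
  CREATE_SUBCOMMAND,
  EXTRACT_SUBCOMMAND,
  LIST_SUBCOMMAND
};

/* Map an option string such as "-x" to a subcommand code.
   Anything that is not exactly a dash plus one known letter
   is reported as UNKNOWN_SUBCOMMAND.  */
enum subcommand
parse_subcommand (const char *arg)
{
  assert (arg != NULL);
  if (arg[0] != '-' || arg[1] == '\0' || arg[2] != '\0')
    return UNKNOWN_SUBCOMMAND;

  switch (arg[1])
    {
    case 'c': return CREATE_SUBCOMMAND;
    case 'x': return EXTRACT_SUBCOMMAND;
    case 't': return LIST_SUBCOMMAND;
    default:  return UNKNOWN_SUBCOMMAND;
    }
}

/* The dispatch step: one branch per subcommand.  The strings stand
   in for the functions a real program would call in each branch.  */
const char *
run_subcommand (enum subcommand cmd)
{
  switch (cmd)
    {
    case CREATE_SUBCOMMAND:  return "create archive";
    case EXTRACT_SUBCOMMAND: return "extract archive";
    case LIST_SUBCOMMAND:    return "list archive";
    case UNKNOWN_SUBCOMMAND: break;
    }
  return "usage error";
}
```

With these definitions, run_subcommand (parse_subcommand ("-x")) takes the EXTRACT_SUBCOMMAND branch. When reading a real program, finding this one switch usually tells you where each feature begins.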

Below EXTRACT_SUBCOMMAND you see something funny:

read_and (extract_archive);

The definition of read_and() (again obtained with grep):

read_and (void (*do_something) (void))

The single parameter is a function pointer used as a callback, so read_and will presumably read something and then call the function extract_archive. Again, grep for it and you will see this:

  if (prepare_to_extract (current_stat_info.file_name, typeflag, &fun))
    {
      if (fun && (*fun) (current_stat_info.file_name, typeflag)
          && backup_option)
        undo_last_backup ();
    }
  else
    skip_member ();

Note that the real work happens when calling fun. fun is again a function pointer, which is set in prepare_to_extract. fun may point to extract_file, which does the actual writing.
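If the two levels of function pointers are confusing, a stripped-down sketch of the same shape may help. This is not tar's implementation: every name below (for_each_member, choose_extractor, extract_member, the counters, the struct) is hypothetical, and unlike the real read_and, which takes a void (*) (void) callback, this version passes the current member to the callback as an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical member kinds and archive entry for this sketch.  */
enum member_type { REGULAR_MEMBER, DIRECTORY_MEMBER, UNSUPPORTED_MEMBER };

struct member
{
  const char *name;
  enum member_type type;
};

/* Type of a per-member handler; returns 0 on success.  */
typedef int (*extractor) (const char *name);

/* Counters so the effect of each call can be observed.  */
int files_written, dirs_made, members_skipped;

int
extract_file_sketch (const char *name)
{
  (void) name;                  /* a real handler would write the file */
  files_written++;
  return 0;
}

int
extract_dir_sketch (const char *name)
{
  (void) name;                  /* a real handler would create the directory */
  dirs_made++;
  return 0;
}

/* Pick a handler for this member type.  Returns 1 and sets *fun when
   the member should be processed, 0 when it should be skipped.  */
int
choose_extractor (enum member_type type, extractor *fun)
{
  assert (fun != NULL);
  switch (type)
    {
    case REGULAR_MEMBER:   *fun = extract_file_sketch; return 1;
    case DIRECTORY_MEMBER: *fun = extract_dir_sketch;  return 1;
    default:               *fun = NULL;                return 0;
    }
}

/* The callback: choose a handler, then call through the pointer.  */
void
extract_member (const struct member *m)
{
  extractor fun;
  if (choose_extractor (m->type, &fun))
    (*fun) (m->name);
  else
    members_skipped++;
}

/* The driver: visit every member and hand it to the callback.  */
void
for_each_member (const struct member *members, size_t count,
                 void (*do_something) (const struct member *))
{
  for (size_t i = 0; i < count; i++)
    do_something (&members[i]);
}
```

Calling for_each_member (archive, n, extract_member) on an array of struct member runs the loop once and lets the callback decide what happens per entry. The point of this layout is that the driver loop can be reused for listing or comparing members just by passing a different callback.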

I hope this walkthrough has shown you how I navigate through source code. Feel free to contact me if you have related questions.

answered Oct 05 '22 00:10

Sjoerd

The problem with programs like tar and sed is twofold (this is just my opinion, of course!). First of all, they're both really old. That means they've had multiple people maintain them over the years, with different coding styles and different personalities. For GNU utilities, it's usually pretty good, because they usually enforce a reasonably consistent coding style, but it's still an issue. The other problem is that they're unbelievably portable. Usually "portability" is seen as a good thing, but when taken to extremes, it means your codebase ends up full of little hacks and tricks to work around obscure bugs and corner cases in particular pieces of hardware and systems. And for programs as widely ported as tar and sed, that means there's a lot of corner cases and obscure hardware/compilers/OSes to take into account.
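As a rough illustration of what such a portability workaround tends to look like, here is a small sketch. It assumes a configure-style feature macro named HAVE_STRNLEN that would be defined when the system provides strnlen; portable_strnlen is a hypothetical name, and this is not code taken from tar or sed.

```c
#include <assert.h>
#include <string.h>

/* HAVE_STRNLEN is assumed to be defined by the build system (for
   example in a generated config.h) when the C library has strnlen.
   When it is not defined, a fallback is compiled instead.  */
#ifdef HAVE_STRNLEN
# define portable_strnlen strnlen
#else
/* Fallback: length of S, but never look past MAXLEN bytes.  */
size_t
portable_strnlen (const char *s, size_t maxlen)
{
  size_t len = 0;

  assert (s != NULL);
  while (len < maxlen && s[len] != '\0')
    len++;
  return len;
}
#endif
```

The rest of the program then calls portable_strnlen everywhere and never needs to know which branch was compiled. Multiply this by dozens of functions and platforms and you get a sense of where many of the unfamiliar macros in old, widely ported code come from.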

If you want to learn C, then I would say the best place to start is not trying to study code that others have written. Rather, try to write code yourself. If you really want to start with an existing codebase, choose one that's being actively maintained where you can see the changes that other people are making as they make them, follow along in the discussions on the mailing lists and so on.

With well-established programs like tar and sed, you see the result of the discussions that would've happened, but you can't see how software design decisions and changes are being made in real-time. That can only happen with actively-maintained software.

That's just my opinion of course, and you can take it with a grain of salt if you like :)

answered Oct 05 '22 01:10

Dean Harding
