What is the theory and usage behind a self-including source file in C and C++?

Please refer to this FASTLZ.C source code.

  • At lines #113 and #128 it includes its own source file.

I think the intention was to define the following two pairs of function names according to the value of the FASTLZ_LEVEL macro.

#define FASTLZ_COMPRESSOR fastlz1_compress
#define FASTLZ_DECOMPRESSOR fastlz1_decompress
static FASTLZ_INLINE int FASTLZ_COMPRESSOR(const void* input, int length, void* output);
static FASTLZ_INLINE int FASTLZ_DECOMPRESSOR(const void* input, int length, void* output, int maxout);
#include "fastlz.c"

and

#define FASTLZ_COMPRESSOR fastlz2_compress
#define FASTLZ_DECOMPRESSOR fastlz2_decompress
static FASTLZ_INLINE int FASTLZ_COMPRESSOR(const void* input, int length, void* output);
static FASTLZ_INLINE int FASTLZ_DECOMPRESSOR(const void* input, int length, void* output, int maxout);
#include "fastlz.c"

But I cannot figure out the theory or the key feature behind this macro technique in C. Could someone briefly explain what is going on here?

Buddhika Chaturanga asked Jan 18 '18 13:01


2 Answers

This defines two pairs of functions called fastlz1_compress and fastlz1_decompress, and fastlz2_compress and fastlz2_decompress. The two compress functions are very similar except for a few lines here and there, and similarly for the decompress functions. The self inclusion, which happens twice, is done to remove repetition in the definitions of these two pairs of functions.

Here's an abbreviated version of what the file contains:

#if !defined(FASTLZ__COMPRESSOR) && !defined(FASTLZ_DECOMPRESSOR)

...
#undef FASTLZ_LEVEL
#define FASTLZ_LEVEL 1

#undef FASTLZ_COMPRESSOR
#undef FASTLZ_DECOMPRESSOR
#define FASTLZ_COMPRESSOR fastlz1_compress
#define FASTLZ_DECOMPRESSOR fastlz1_decompress
static FASTLZ_INLINE int FASTLZ_COMPRESSOR(const void* input, int length, void* output);
static FASTLZ_INLINE int FASTLZ_DECOMPRESSOR(const void* input, int length, void* output, int maxout);
#include "fastlz.c"

#undef FASTLZ_LEVEL
#define FASTLZ_LEVEL 2

#undef MAX_DISTANCE
#define MAX_DISTANCE 8191
#define MAX_FARDISTANCE (65535+MAX_DISTANCE-1)

#undef FASTLZ_COMPRESSOR
#undef FASTLZ_DECOMPRESSOR
#define FASTLZ_COMPRESSOR fastlz2_compress
#define FASTLZ_DECOMPRESSOR fastlz2_decompress
static FASTLZ_INLINE int FASTLZ_COMPRESSOR(const void* input, int length, void* output);
static FASTLZ_INLINE int FASTLZ_DECOMPRESSOR(const void* input, int length, void* output, int maxout);
#include "fastlz.c"

...

#else /* !defined(FASTLZ_COMPRESSOR) && !defined(FASTLZ_DECOMPRESSOR) */

static FASTLZ_INLINE int FASTLZ_COMPRESSOR(const void* input, int length, void* output)
{
   ...
   #if FASTLZ_LEVEL==2
   ...
   #endif
   ...
   #if FASTLZ_LEVEL==1
   ...
   #else
   ...
   #endif
   ...
}

static FASTLZ_INLINE int FASTLZ_DECOMPRESSOR(const void* input, int length, void* output, int maxout)
{
   ...
   #if FASTLZ_LEVEL==2
   ...
   #endif
   ...
   #if FASTLZ_LEVEL==1
   ...
   #else
   ...
   #endif
   ...    
}

#endif

The first part of the file, inside the #if block, contains a series of macro definitions, and you'll notice that they're defined twice. The second part, inside the #else block, contains what is effectively a pair of function templates (in the generic sense of a reusable pattern, not C++ templates).

The first part defines some macros, then includes itself. On the self-inclusion, the #else part takes effect. This defines fastlz1_compress and fastlz1_decompress based on the FASTLZ_COMPRESSOR and FASTLZ_DECOMPRESSOR macros. Because FASTLZ_LEVEL is set to 1, this activates the fastlz1_compress and fastlz1_decompress specific code.

After the first self-include, these macros are undefined and then redefined for fastlz2_compress and fastlz2_decompress, then the file is self-included again. So the #else part is pulled in again, but this time the effect is fastlz2_compress and fastlz2_decompress are defined, and the code specific to these functions is activated by virtue of FASTLZ_LEVEL now being set to 2.
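To make the "same body, different name and level" idea concrete, here is a minimal single-file sketch. It swaps the self-include for a function-like macro, so it is not what fastlz.c does, only the same stamping-out effect. The names DEFINE_TOY_SUM, toy1_sum and toy2_sum are hypothetical. One difference worth noting: a macro body cannot contain #if directives, so the level check below is an ordinary constant expression, whereas the self-include lets the real file keep its #if FASTLZ_LEVEL==2 blocks.

```c
#include <stddef.h>

/* Hypothetical stand-in for the shared body in the #else branch.
 * NAME plays the role of FASTLZ_COMPRESSOR and LEVEL the role of
 * FASTLZ_LEVEL; each expansion defines one complete function. */
#define DEFINE_TOY_SUM(NAME, LEVEL)                                 \
    static int NAME(const unsigned char *input, size_t length)      \
    {                                                               \
        int total = 0;                                              \
        for (size_t i = 0; i < length; ++i)                         \
            total += (LEVEL == 2) ? input[i] * 2 : input[i];        \
        return total;                                               \
    }

/* First "pass": like defining FASTLZ_LEVEL 1 and including the body. */
DEFINE_TOY_SUM(toy1_sum, 1)

/* Second "pass": same body, new name, level-2 behaviour enabled. */
DEFINE_TOY_SUM(toy2_sum, 2)
```

Calling toy1_sum and toy2_sum on the same buffer runs the same loop, with only the level-specific expression differing between them.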

A slightly less confusing way to do this would have been to put everything between the outer #if and #else in one file and the part between #else and #endif in another file.

A better way would have been to create a single compress function and a single decompress function, with each accepting a parameter to specify the level rather than using macro trickery. For example:

static FASTLZ_INLINE int fastlz_compress(const void* input, int length, void* output, int level)
{
   ...
   if (level==2) {
   ...
   }
   ...
   if (level==1) {
   ...
   } else {
   ...
   }
   ...
}
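If the existing per-level entry points need to stay, a minimal sketch of one way to combine both ideas is to keep thin wrappers that pass a constant level to the shared function. The names toy_pack, toy1_pack and toy2_pack and the toy "drop repeated bytes" rule are hypothetical and have nothing to do with FastLZ's actual algorithm. When the shared function is inlined, an optimising compiler can typically fold the constant and drop the untaken branch, though that is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical shared body: the level is an ordinary parameter.
 * Level 1 copies every byte; level 2 also drops a byte that
 * repeats the one immediately before it. */
static inline size_t toy_pack(const unsigned char *in, size_t n,
                              unsigned char *out, int level)
{
    size_t written = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Level-2-only step. */
        if (level == 2 && i > 0 && in[i] == in[i - 1])
            continue;
        out[written++] = in[i];
    }
    return written;
}

/* Thin wrappers keep the two-function interface; the level is a
 * compile-time constant at each call site. */
static size_t toy1_pack(const unsigned char *in, size_t n, unsigned char *out)
{
    return toy_pack(in, n, out, 1);
}

static size_t toy2_pack(const unsigned char *in, size_t n, unsigned char *out)
{
    return toy_pack(in, n, out, 2);
}
```

The output buffer must hold at least n bytes, since level 1 copies the input unchanged.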
dbush answered Nov 15 '22 02:11


Used properly, this can be a useful technique.

Say you have a complex, performance critical subsystem with a fairly small public interface and a lot of non-reusable implementation code. The code runs to several thousand lines, a hundred or so private functions and quite a bit of private data. If you work with non-trivial embedded systems, you probably deal with this situation frequently enough.

Your solution will probably be layered, modular and decoupled and these aspects can be usefully represented and reinforced by coding different parts of the subsystem in different files.

With C, you can lose a lot by doing this. Almost all toolchains provide decent optimisation for a single compilation unit, but are very pessimistic about anything declared extern.

If you put everything into one C source module, you get -

  • Performance & code size improvements - function calls will be inlined in many cases. Even without inlining, the compiler has opportunities to produce more efficient code.
  • Link level data & function hiding.
  • Avoidance of namespace pollution and its corollary - you can use less unwieldy names.
  • Faster compilation & linkage.

But you also get an unholy mess when it comes to editing this file and you lose the implied modularity. This can be overcome by splitting the source into several files and including these to produce a single compilation unit.

You need to impose some conventions to manage this properly though. These will depend on your toolchain to some extent, but some general pointers are -

  • Put the public interface in a separate header file - you should be doing this anyway.
  • Have one main .c file that includes all the subsidiary .c files. This could also include the code for the public interface.

  • Use preprocessor guards (for example, a macro that only the main .c file defines) to ensure that private headers and source modules are not included by external compilation units.

  • All private data & functions should be declared static.

  • Maintain the conceptual distinction between .c and .h files. This leverages existing conventions. The difference is that you will have a lot of static declarations in your headers.

  • If your toolchain doesn't impose any reason not to, name the private implementation files as .c and .h. If you use include guards, these will produce no code and introduce no new names (you may end up with some empty segments during linkage). The huge advantage is that other tools (e.g. IDEs) will treat these files appropriately.
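As a rough sketch of these conventions, the listing below shows what a main .c file might look like, with the contents of the subsidiary files written inline so that it compiles as one file. The file names (subsys.c, subsys_filter.c, subsys_scale.c, subsys.h), the guard macro SUBSYS_PRIVATE_BUILD and all function names are hypothetical. In a real layout each marked section would live in its own file and be pulled in with #include.

```c
/* subsys.c (hypothetical): the only file handed to the compiler. */
#define SUBSYS_PRIVATE_BUILD 1  /* defined here and nowhere else */

/* Normally supplied by the public header subsys.h. */
int subsys_process(int sample);

/* ---- would be: #include "subsys_filter.c" ---- */
#ifndef SUBSYS_PRIVATE_BUILD
#error "subsys_filter.c is private; compile subsys.c instead"
#endif

static int filter_state;  /* private data, internal linkage */

static int filter_step(int sample)
{
    filter_state = (filter_state + sample) / 2;
    return filter_state;
}

/* ---- would be: #include "subsys_scale.c" ---- */
#ifndef SUBSYS_PRIVATE_BUILD
#error "subsys_scale.c is private; compile subsys.c instead"
#endif

static int scale(int value)
{
    return value * 3;
}

/* ---- public interface: the only symbol with external linkage ---- */
int subsys_process(int sample)
{
    return scale(filter_step(sample));
}
```

Because filter_step and scale are static and live in the same compilation unit as subsys_process, the compiler is free to inline them, and neither name is visible to the linker.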

Pethead answered Nov 15 '22 04:11