
What is the purpose of the new C23 #embed directive?

A new preprocessor directive is available in the C23 Standard: #embed

Here is a simple example:

// Placing a small image resource.

#include <stddef.h>

void show_icon(const unsigned char *, size_t);

int main (int, char*[]) {
    static const unsigned char icon_data[] = {
#embed "black_sheep.ico"
    };
    show_icon(icon_data, sizeof(icon_data));
    return 0;
}

Here is a more elaborate one, initializing non-array objects from binary data (whatever that means):

int main() {
    /* Braces may be kept or elided as per normal initialization rules */
    int i = {
#embed "i.dat"
    }; /* i value is [0, 2^(embed element width)) first entry */
    int i2 =
#embed "i.dat"
    ; /* valid if i.dat produces 1 value, i2 value is [0, 2^(embed element width)) */
    struct s {
        double a, b, c;
        struct { double e, f, g; };
        double h, i, j;
    };
    struct s x = {
        /* initializes each element in order according to 
           initialization rules with comma-separated list
           of integer constant expressions inside of braces
         */
#embed "s.dat"
    };
    return 0;
}

What is the purpose of adding this to the C language?

chqrlie asked Sep 11 '25 10:09



1 Answer

#embed allows easy inclusion of text or binary data in a program's executable image, as arrays of char, unsigned char or other types, without the need for an external script run from a Makefile.

Here are simple examples:

const char source_code[] = {
    #embed "my_source_file.c"
    , 0 // add a null terminator
};

const unsigned char binary_data[] = {
    #embed "big_data_blob.bin"
};

Embedding binary or even textual data offers benefits over reading from files at load time:

  • there might not be a file system
  • the path to the files might not be obvious
  • the files could be missing or inaccessible
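For comparison, a load-time approach might look like the sketch below. The helper name load_file is hypothetical (it does not come from the question or the paper); the point is only to show the failure paths (missing file, read error, allocation failure) that the program has to handle at run time, whereas with #embed a missing file is reported when the program is compiled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc'ed buffer.
   Returns NULL on any failure and leaves *len untouched in that case.
   The caller must free() the result. */
unsigned char *load_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;                /* missing or inaccessible file */

    unsigned char *buf = NULL;
    size_t used = 0, cap = 0;
    for (;;) {
        if (used == cap) {          /* grow the buffer geometrically */
            size_t new_cap = cap ? cap * 2 : 4096;
            unsigned char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;        /* out of memory */
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t n = fread(buf + used, 1, cap - used, fp);
        used += n;
        if (n == 0)
            break;                  /* end of file or read error */
    }

    int failed = ferror(fp);
    fclose(fp);
    if (failed) {
        free(buf);
        return NULL;                /* read error */
    }
    *len = used;
    return buf;
}
```

Every caller has to check for NULL and decide what the program does without its data, which is the kind of handling an embedded array makes unnecessary.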

Embedding file contents produced by external scripts is not always simple:

  • running external scripts requires build system support and is largely non-portable
  • the files could be large and many compilers are inefficient at parsing such large arrays produced by an external script, with a notable exception: tcc.
  • some compilers, notably Microsoft Visual C, have limitations on string literal size that prevent inclusion of even moderately large blobs from external scripts as string literals.
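To make the "external script" approach concrete, here is a minimal sketch of the formatting step such a generator performs (similar in spirit to what xxd -i produces). The function name format_as_initializer and its interface are assumptions made for illustration; a real generator would wrap this with file I/O and be invoked from the build system.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator core: formats `len` bytes as comma-separated
   decimal integer constants, suitable for pasting between the braces
   of an array initializer.
   Returns the number of characters written (excluding the terminating
   NUL), or -1 if `out` is too small. */
int format_as_initializer(const unsigned char *data, size_t len,
                          char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        int n = snprintf(out + pos, out_size - pos, "%s%u",
                         i ? ", " : "", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;              /* output would be truncated */
        pos += (size_t)n;
    }
    return (int)pos;
}
```

For the three bytes of "Hi!" this yields the text 72, 105, 33. The compiler then has to tokenize and parse that text for every byte of the file, which is where the inefficiency with large blobs comes from.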

This feature was added to help programmers deal with these issues and easily embed external data.

Beyond the simple examples above, the specification for #embed defines optional parameters (limit, prefix, suffix, if_empty) that adjust how the contents of the external file are expanded, and these can be hard to master:

const unsigned char null_terminated_file_data[] = {
    #embed "might_be_empty.txt" \
        prefix(0xEF, 0xBB, 0xBF, ) /* UTF-8 BOM */ \
        suffix(,)
    0 // always null-terminated
};
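To see what prefix and suffix do here, the sketch below writes out by hand what the initializer is equivalent to after preprocessing, so it compiles without #embed support. It assumes a hypothetical might_be_empty.txt that holds the two ASCII bytes "hi"; the array names are made up for illustration.

```c
/* Non-empty file: prefix tokens, the file bytes, suffix tokens,
   then the trailing 0 from the source. */
const unsigned char when_file_is_hi[] = {
    0xEF, 0xBB, 0xBF,   /* from prefix(0xEF, 0xBB, 0xBF, ) */
    104, 105            /* bytes of the file: 'h', 'i' */
    ,                   /* from suffix(,) */
    0                   /* always null-terminated */
};

/* Empty file: prefix and suffix have no effect, so the directive
   expands to nothing and only the trailing 0 remains. */
const unsigned char when_file_is_empty[] = {
    0                   /* always null-terminated */
};
```

The trailing commas inside prefix and suffix exist so that the separators appear only when there is data to separate; without them, an empty file would leave a stray comma and fail to compile.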

Or worse:

int main () {
#define SOME_CONSTANT 0
    return
#embed </dev/urandom> if_empty(0) limit(SOME_CONSTANT)
    ;
}

The C++ Committee was strongly in favor of vendor extensions in the parameter specification whereas the C Committee was less thrilled by the parameter specification itself.

Read the details in: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20embed.html

This paper enumerates interesting examples where #embed may come in handy, but a more general solution seems possible.

chqrlie answered Sep 14 '25 01:09
