performance overhead of the gettext internationalization system in C/C++

I just worked through the documentation at http://www.gnu.org/software/gettext/manual/gettext.html and there is no discussion at all of the performance overhead. On the internet I found only performance discussions for other languages (PHP and Java), but nothing for C/C++.

Therefore my questions:

  1. What is the performance overhead during startup of a program that uses gettext? (Is a shared library loaded? How are the translations loaded into memory? Are all translations loaded at startup, or on demand?)

  2. What is the performance penalty during normal operation of the program, i.e. when a translation is needed? How much larger is the memory footprint of the program, and how is that memory organized? Is there a higher chance that parts of the program are swapped to disk when the program is idle? (If the translations are stored in a very different part of memory than the rest of the program, then in my understanding the chance of a page fault is higher than in a non-internationalized version of the program.)

  3. Does a program that runs under the "C" locale also suffer these performance penalties?

Thanks a lot.

asked Aug 16 '13 by Robby75


2 Answers

Given that the alternative to this approach is to have a large number of builds, each with something like this in it:

#include <stdio.h>

int main(void)
{
    printf(
#if defined(SWEDISH)
           "Hej världen\n"
#elif defined(ENGLISH)
           "Hello, World\n"
#elif defined(PORTUGUESE)
           "Olá, Mundo\n"
#else
#  error Language not specified.
#endif
    );
    return 0;
}

instead we get:

#include <libintl.h>
#include <stdio.h>

int main(void)
{
    printf(gettext("Hello, World\n"));
}

which is easy to read and understand.
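
For context, the usual setup around that call looks roughly like the sketch below (the domain name "hello" and the locale directory are placeholders, not something from the question):

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

#define _(str) gettext(str)   /* the customary shorthand macro */

int main(void)
{
    setlocale(LC_ALL, "");                         /* pick up the user's locale from the environment */
    bindtextdomain("hello", "/usr/share/locale");  /* where the compiled .mo catalogues live */
    textdomain("hello");                           /* default domain for plain gettext() calls */

    printf(_("Hello, World\n"));
    return 0;
}

On glibc this links without extra flags; on other systems you may need -lintl.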

I don't know the exact structure of the gettext implementation, but I would expect it to be a hash table once it's loaded. Possibly a binary tree, but a hash table seems more sensible.

As to the exact overheads, it's very hard to put a number on it - especially if, as you say, something is swapped to disk and the disk has spun down, in which case it takes 3-4 seconds to get the disk back up to speed. How do you quantify that? Yes, it's possible that the page needed for gettext is swapped out if the system has been busy doing something memory-intensive.

Loading the message file should only be a large overhead if the file is very large, but again, if the disk is not spinning and the file is not cached, there will be an overhead of several seconds. Again, how do you quantify that? The size of the file is clearly proportional to the actual size of the translated (or native-language) messages.

Regarding point 2:

As far as I know, in both Linux and Windows pages are swapped out on a "least recently used" (or some other usage-statistics) basis, which has nothing to do with where they are located. Clearly the translated messages are in a different place than the actual code - there isn't a list of 15 different translations in the source file - so the translations are loaded at runtime and end up in a different place than the code itself. However, the overhead of this is similar to the difference between:

static const char *msg = "Hello, World\n";

and

static const char *msg = strdup("Hello, World\n"); 

Given that text strings are generally kept together in the binary of a program anyway, I don't think their "nearness" to the executing code is significantly different from a dynamically allocated piece of memory somewhere on the heap. If you call the gettext function often enough, that memory will be kept "current" and not swapped out. If you don't call gettext for some time, it may get swapped out - but the same applies if none of the strings stored in the executable have been used recently: they get swapped out too.

3) I think English (or "no language selected") is treated exactly the same as any other language variant.

I will have a little further dig in a bit, need breakfast first...

Very unscientific:

#include <libintl.h>
#include <cstdio>
#include <cstring>

// Read the CPU's time-stamp counter so we can count clock cycles.
static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

int main()
{
    char str[10000] = {};
    char *s = str;
    unsigned long long time;

    // Plain sprintf, no gettext involved.
    for (int i = 0; i < 10; i++)
    {
        time = rdtsc();
        s += sprintf(s, "Hello, World %d", i);
        time = rdtsc() - time;
        printf("Time =%lld\n", time);
    }
    printf("s = %s\n", str);
    s = str;

    strcpy(s, "");
    // Same loop, but the format string goes through gettext() first.
    for (int i = 0; i < 10; i++)
    {
        time = rdtsc();
        s += sprintf(s, gettext("Hello, World %d"), i);
        time = rdtsc() - time;
        printf("Time =%lld\n", time);
    }
    printf("s = %s\n", str);
}

Gives the following results:

$ g++ -Wall -O2 intl.cpp
$ ./a.out
Time =138647
Time =9528
Time =6710
Time =5537
Time =5785
Time =5427
Time =5406
Time =5453
Time =5644
Time =5431
s = Hello, World 0Hello, World 1Hello, World 2Hello, World 3Hello, World 4Hello, World 5Hello, World 6Hello, World 7Hello, World 8Hello, World 9
Time =85965
Time =11929
Time =10123
Time =10226
Time =10628
Time =9613
Time =9515
Time =9336
Time =9440
Time =9095
s = Hello, World 0Hello, World 1Hello, World 2Hello, World 3Hello, World 4Hello, World 5Hello, World 6Hello, World 7Hello, World 8Hello, World 9

The code in dcigettext.c uses a mixture of binary search in a flat array of strings and a hash function that hashes the string with a PJW hash (see http://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200101/homework10/hashfuncs.html).
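
For illustration, a sketch of that PJW-style string hash (the constants follow the classic hashpjw; the actual hash-string.h in gettext may differ in detail):

unsigned long hash_pjw(const char *str)
{
    unsigned long h = 0;

    while (*str != '\0')
    {
        unsigned long g;

        h = (h << 4) + (unsigned char)*str++;
        g = h & 0xf0000000UL;      /* grab the top four bits */
        if (g != 0)
        {
            h ^= g >> 24;          /* fold them back into the low bits */
            h ^= g;                /* and clear them */
        }
    }
    return h;
}

As far as I can tell, the hash picks a slot in the .mo file's hash table, and files without a hash table fall back to the binary search.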

So the overhead, once the application has started, appears to be around "just noticeable" (when counting clock cycles), but not enormous.

The exact time it takes to run the first sprintf varies somewhat in both cases, so I wouldn't say that "using gettext" makes sprintf faster on the first call - just "bad luck" on this run (I had a few other variants of the code, and they all vary greatly on the first call to sprintf, and less on later calls). Probably some setup somewhere that takes extra time (caches, quite likely - printf causing cache lines to be overwritten with other data - branch prediction, etc.).

Now, this clearly doesn't answer your questions about paging out, etc., and I didn't try to make a Swedish, Portuguese or German translation of my "Hello, World" message. I still believe the overhead is not huge. If you are indeed running hundreds of instantiations of an application per second, and that application doesn't do much other than print a message to the screen after some simple calculations, then sure, it could be important.

The only REAL way to find out how much difference it makes is to compile the same application with #define _(x) x instead of #define _(x) gettext(x), and see if you notice any difference.
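
A minimal sketch of that switch (the NO_I18N macro name is just an example):

#ifdef NO_I18N
#  define _(x) (x)               /* plain pass-through, no gettext call */
#else
#  include <libintl.h>
#  define _(x) gettext(x)        /* normal, translated build */
#endif

Build once with -DNO_I18N and once without, run the same workload, and compare.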

I still think the "paged out" concern is a red herring. If the machine is under HIGH memory pressure, it will run slowly no matter what. (If I write a piece of code that allocates 16GB on my machine, which has 16GB of RAM, just about everything except the keyboard itself (it can still blink the Num Lock LED) and the mouse pointer itself (it can still move around on screen) goes unresponsive.)

answered Nov 09 '22 by Mats Petersson


Some measurements:

for ( ; n > 0; n--) {
#ifdef I18N
    fputs(gettext("Greetings!"), stdout);
#else
    fputs("Greetings!", stdout);
#endif
    putc('\n', stdout);
}
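
For completeness, one way the fragment above could be wrapped into a full program (the setlocale()/textdomain() calls and the domain name "bench" are assumptions; only the loop is from the measurement):

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int main(void)
{
    long n = 10000000;              /* 10 million iterations */

    setlocale(LC_ALL, "");          /* honour LANG / LC_ALL from the environment */
    textdomain("bench");            /* no .po/.mo installed, so the msgid comes straight back */

    for ( ; n > 0; n--) {
#ifdef I18N
        fputs(gettext("Greetings!"), stdout);
#else
        fputs("Greetings!", stdout);
#endif
        putc('\n', stdout);
    }
    return 0;
}

Compile with and without -DI18N (e.g. gcc -O2 -DI18N bench.c) and time it with time ./a.out > out.txt.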

With n = 10000000 (10 million) and output redirected to a file. There is no .po file for the locale, so the original string is printed (the output files are identical). User time in seconds:

  • 0.23 with I18N undefined
  • 4.43 with I18N
  • 2.33 with I18N and LC_ALL=C

That is an overhead of 0.4 microseconds per call (on a Phenom X6 @ 3.6GHz, Fedora 19). With LC_ALL=C the overhead is only 0.2 µs. Note that this is probably the worst case - usually you'll do more in your program. Still, it's a factor of 20, and that includes the I/O. gettext() is rather slower than I would have expected.

I have not measured memory use, as it probably depends on the size of the .po file. Startup time I have no idea how to measure.

answered Nov 09 '22 by Chris