
Locale-invariant string processing with strtod strtof atof printf?

Are there any plans for adding versions of C standard library string processing functions that are invariant under current locale?

Currently there are lots of fragile workarounds, for example, from jansson/strconv.c:

static void to_locale(strbuffer_t *strbuffer)
{
    const char *point;
    char *pos;

    point = localeconv()->decimal_point;
    if(*point == '.') {
        /* No conversion needed */
        return;
    }

    pos = strchr(strbuffer->value, '.');
    if(pos)
        *pos = *point;
}

static void from_locale(char *buffer)
{
    const char *point;
    char *pos;

    point = localeconv()->decimal_point;
    if(*point == '.') {
        /* No conversion needed */
        return;
    }

    pos = strchr(buffer, *point);
    if(pos)
        *pos = '.';
}

These functions preprocess their input so that it can be used independently of the current locale, under the following assumptions:

  1. The decimal delimiter is a single byte
  2. No call to setlocale happens between one of these fix-up functions and the call to any of the affected library functions
  3. The string can be modified before conversion

(1) implies that the preprocessing approach breaks on exotic locales whose decimal mark is longer than one byte (see https://en.wikipedia.org/wiki/Decimal_mark#Hindu.E2.80.93Arabic_numeral_system for examples). (2) implies that the preprocessing approach cannot be thread-safe without a lock, and that lock would have to be added to the C library itself. (3) is just stupid.

If it were only possible to specify the locale for a single call to a string-processing function as a parameter, not affecting any other threads, none of these restrictions would apply.

Questions:

  1. Are there any reports to WG14 or WG21 that address this defect?
  2. If so, why haven't they been merged into the standard? It would be nothing more than a new set of functions that take a locale as an argument.
  3. What is the canonical workaround?

Update:

After searching the Internet, I found the *_l functions, available on FreeBSD, GNU/Linux and Mac OS X. Similar functions exist on Windows as well. These solve my problem; however, they are not in POSIX, which is a superset of C (not really, POSIX relaxes some rules on pointers). So questions 1 and 2 remain open.
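
For illustration, here is a minimal sketch of what a per-call locale looks like with strtod_l. It assumes a platform that provides strtod_l (glibc with _GNU_SOURCE defined, or a BSD-derived system via <xlocale.h>); the helper name parse_double_c is hypothetical and not taken from any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc declares strtod_l only with this macro */
#endif
#include <errno.h>
#include <locale.h>
#include <stdlib.h>
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h>           /* BSD-derived systems declare the *_l functions here */
#endif

/* Hypothetical helper: parse `text` as a double using the "C" locale,
 * whatever the process-wide or per-thread locale happens to be.
 * Returns 0 on success and -1 on failure; *out is written only on success. */
int parse_double_c(const char *text, double *out)
{
    /* A private locale object; no global state is touched. */
    locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    char *end = NULL;
    errno = 0;
    double value = strtod_l(text, &end, c_loc);
    int saved_errno = errno;   /* keep errno safe across freelocale */
    freelocale(c_loc);

    /* Reject empty input, trailing garbage, and out-of-range values. */
    if (end == text || *end != '\0' || saved_errno == ERANGE)
        return -1;

    *out = value;
    return 0;
}
```

The input string is never modified and no setlocale call is involved, so none of the three assumptions listed above is needed. In real code the locale_t object would probably be created once and reused rather than allocated on every call.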

user877329 asked Jan 22 '17 18:01


2 Answers

BSD and macOS Sierra (and Mac OS X before it) support _l functions that allow you to specify the locale, rather than relying on the current locale. For example:

int
fprintf_l(FILE * restrict stream, locale_t loc, const char * restrict format, ...);

int
printf_l(locale_t loc, const char * restrict format, ...);

int
snprintf_l(char * restrict str, size_t size, locale_t loc, const char * restrict format, ...);

int
sprintf_l(char * restrict str, locale_t loc, const char * restrict format, ...);

and:

int
fscanf_l(FILE * restrict stream, locale_t loc, const char * restrict format, ...);

int
scanf_l(locale_t loc, const char * restrict format, ...);

int
sscanf_l(const char * restrict str, locale_t loc, const char * restrict format, ...);

As a general design, this seems sensible. The type locale_t is not part of Standard C but is part of POSIX (and defined in <locale.h> there), and used in <ctype.h> amongst other places. The BSD man pages say that the header to use is <xlocale.h> rather than <locale.h>; this would perhaps be fixed by the standard. Unless there is a major flaw in the design of the BSD functions, these should be a very good basis for any standardization effort, whether that was under POSIX or Standard C.

One issue with the BSD design might be that the locale_t structure is passed by value, not by (constant restricted) pointer, which is a little surprising. However, it is consistent with the POSIX functions such as:

int   isalpha_l(int, locale_t);

A similar scheme might be devised for handling time zone settings, too. There'd be more work in setting that up since there isn't already a time zone type (whereas the locale_t is part of POSIX already — and could probably be adopted without change into standard C). But, combined with locale settings, it could make the time routines more easily usable in diverse environments from a single executable.

Jonathan Leffler answered Sep 21 '22 18:09


SQLite has a locale-independent printf implementation, which is good for this sort of thing because it makes doubles compatible with SQL syntax rules. If you can include SQLite as a dependency, that might be a viable option.

over_optimistic answered Sep 19 '22 18:09