How can I parse a string to a float in C in a way that isn't affected by the current locale?

I'm writing a program that needs to parse configuration files in addition to user input from a graphical user interface. In particular, I'm having trouble parsing strings from the configuration file into floats. The function I've been using so far, strtof(), respects the current locale, so a string that represents a floating-point number may parse to 0.10000000149011612 in one locale and to 0 in another, which is not good. This is because some locales use a full stop (.) as the decimal separator whereas others use a comma (,), but the strings in the configuration file always use a full stop.

These configuration files are distributed to users in identical form regardless of their locale, and it is not feasible to distribute different versions depending on the locale they have set. The files are a global, immutable resource that is part of the operating system base, and a system may have multiple users who aren't necessarily using the same locale.

I can't just set the locale to something predictable at program startup because removing support for i18n is a non-starter. I also want to preserve locale-specific parsing for the user input mentioned earlier. Nor do I think I can safely call setlocale(LC_ALL, "C") when I start parsing and then finish with setlocale(LC_ALL, "whatever it was before"): this is a multi-threaded program, and I can't guarantee that other threads aren't doing locale-dependent work while the configuration file is being parsed.

So, how can I parse strings into floats in a locale-independent fashion in C, preferably without relying on functionality outside of the standard library? The program I'm writing only targets Linux (although it may also be possible to run it on BSDs, but they are not a priority), so Linux-specific answers are just fine.

Newbyte asked Sep 13 '25 15:09


1 Answer

It is indeed unfortunate that the C Standard does not provide functions to handle these conversions for a specified locale.

There is no simple portable solution to this problem using standard functions. Converting the strings from the config file to the locale-specific form is feasible but tricky.

There is a simple workaround for the config file: use exponent notation without a decimal separator. 123e-3 is a portable, locale-neutral spelling of 0.123 or 0,123.
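As a rough illustration of the exponent workaround, the sketch below writes a value as an integer mantissa followed by e-6 and reads it back with plain strtod(). The helper names format_neutral and parse_neutral and the choice of six decimal digits are assumptions made for this example, not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write value as "<integer>e-6", keeping six decimal
 * digits. No decimal separator appears in the text, so it reads back the
 * same way under any locale. Assumes |value| * 1e6 fits in a long long. */
int format_neutral(char *buf, size_t size, double value)
{
    /* Round half away from zero without needing <math.h>. */
    long long scaled = (long long)(value * 1e6 + (value < 0 ? -0.5 : 0.5));
    return snprintf(buf, size, "%llde-6", scaled);
}

/* Reading needs nothing special: strtod() accepts the exponent form and
 * never has to look for a decimal separator. */
double parse_neutral(const char *text)
{
    return strtod(text, NULL);
}
```

The cost of this approach is readability of the config file, so it probably suits machine-written settings better than hand-edited ones.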

POSIX has alternative functions for most standard functions with locale-specific behavior, but unfortunately not for strtod() and friends.

Yet both GNU libc on Linux (and alternative libraries such as musl) and the BSD systems support extended locale functions that take the locale as an argument:

#define _GNU_SOURCE   // for linux
#include <stdlib.h>
#ifdef __APPLE__
#include <xlocale.h>  // on macOS
#endif

double strtod_l(const char * restrict nptr, char ** restrict endptr,
                locale_t loc);

float strtof_l(const char * restrict nptr, char ** restrict endptr,
               locale_t loc);

long double strtold_l(const char * restrict nptr, char ** restrict endptr,
                      locale_t loc);

On macOS, it seems you can pass 0 for the loc argument and get the C locale. On Linux, the header file declares loc as non-null, so you need to create a C locale with newlocale().
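For the float case asked about in the question, a minimal sketch could look like the following. The names config_locale, config_locale_init and parse_config_float are made up for this illustration, and the sketch assumes the init function is called once at startup before any other thread parses configuration.

```c
#define _GNU_SOURCE          /* exposes strtof_l() in glibc */
#include <errno.h>
#include <locale.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/* Hypothetical name: created once, then only read, so it can be shared. */
static locale_t config_locale;

/* Call once at startup. Returns 0 on success, -1 if the locale object
 * could not be created. */
int config_locale_init(void)
{
    config_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    return config_locale != (locale_t)0 ? 0 : -1;
}

/* Parses text as a float using '.' as the separator, regardless of the
 * global locale. Returns 0 and stores the value in *out on success;
 * returns -1 for empty input, trailing garbage, or a range error. */
int parse_config_float(const char *text, float *out)
{
    char *end;
    errno = 0;
    float value = strtof_l(text, &end, config_locale);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;
    *out = value;
    return 0;
}
```

Because the global locale is never touched, other threads can keep doing locale-dependent work while configuration is parsed.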

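Here is an example. Note that sscanf_l() and printf_l() are BSD/macOS extensions that glibc does not provide, so on Linux only the strtod_l() parts will build: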

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

locale_t c_locale;

int main(void) {
    const char locale_name[] = "fr_FR.UTF-8";
    const char locale_string[] = "0,123";
    const char standard_string[] = "0.123";

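    // create a locale object for the "C" locale, independent of the global one
    c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0) {
        perror("newlocale");
        return 1;
    }

    // switch the global locale; warn if it is not installed on this system
    if (setlocale(LC_ALL, locale_name) == NULL)
        fprintf(stderr, "locale %s is not available\n", locale_name);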

    double x1, x2, y1, y2;
    x1 = strtod(locale_string, NULL);
    x2 = strtod_l(standard_string, NULL, c_locale);
    int s1 = sscanf(locale_string, "%lf", &y1);
    int s2 = sscanf_l(standard_string, c_locale, "%lf", &y2);

    printf("default locale: %s\n\n", locale_name);
    printf("using printf(...):\n");
    printf("  strtod(\"%s\", NULL) -> %f\n", locale_string, x1);
    printf("  strtod_l(\"%s\", NULL, c_locale) -> %f\n", standard_string, x2);
    printf("  sscanf(\"%s\", &y1) -> %d,  y1=%f\n", locale_string, s1, y1);
    printf("  sscanf_l(\"%s\", c_locale, &y2) -> %d, y2=%f\n", standard_string, s2, y2);

    printf("\nusing printf_l(c_locale, ...):\n");
    printf_l(c_locale, "  strtod(\"%s\", NULL) -> %f\n", locale_string, x1);
    printf_l(c_locale, "  strtod_l(\"%s\", NULL, c_locale) -> %f\n", standard_string, x2);
    printf_l(c_locale, "  sscanf(\"%s\", &y1) -> %d,  y1=%f\n", locale_string, s1, y1);
    printf_l(c_locale, "  sscanf_l(\"%s\", c_locale, &y2) -> %d, y2=%f\n", standard_string, s2, y2);

    freelocale(c_locale);
    return 0;
}

Output:

default locale: fr_FR.UTF-8

using printf(...):
  strtod("0,123", NULL) -> 0,123000
  strtod_l("0.123", NULL, c_locale) -> 0,123000
  sscanf("0,123", &y1) -> 1,  y1=0,123000
  sscanf_l("0.123", c_locale, &y2) -> 1, y2=0,123000

using printf_l(c_locale, ...):
  strtod("0,123", NULL) -> 0.123000
  strtod_l("0.123", NULL, c_locale) -> 0.123000
  sscanf("0,123", &y1) -> 1,  y1=0.123000
  sscanf_l("0.123", c_locale, &y2) -> 1, y2=0.123000

If strtod_l is not available, you have to copy the string and substitute the decimal separator, which comes with a list of caveats:

  • the source string must be copied to a temporary buffer of sufficient length: at least 300 bytes, possibly more.
  • the source array can contain arbitrary spacing before the number and arbitrary text after the number, and might not be null terminated.
  • endptr must be computed so that it points into the source string; if the prefix was not copied, an adjustment is necessary.
  • simply swapping . for , is incorrect, because making assumptions about the current decimal separator is risky. The appropriate one for the currently selected locale must be retrieved via localeconv(): it is the string pointed to by the decimal_point member of the struct lconv. If this string has more than one character, updating the end pointer is tricky.
  • if the current locale is changed concurrently in another thread, the behavior is undefined.
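The caveats above can be sketched roughly as follows. The function name strtod_dot is hypothetical, and the sketch makes simplifying assumptions: the input is null terminated, only plain decimal syntax is accepted (no hex floats, "inf" or "nan"), and the locale is not changed by another thread during the call. It scans the number first, so the end pointer is taken from the source string rather than translated back from the copy.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fallback: parse a decimal number written with '.', whatever
 * decimal separator the current locale uses. */
double strtod_dot(const char *s, char **endptr)
{
    const char *p = s;
    while (isspace((unsigned char)*p))
        p++;
    const char *start = p;

    /* Scan: optional sign, digits, optional '.', digits. */
    if (*p == '+' || *p == '-')
        p++;
    size_t digits = 0;
    while (isdigit((unsigned char)*p)) { p++; digits++; }
    const char *dot = NULL;
    if (*p == '.') {
        dot = p++;
        while (isdigit((unsigned char)*p)) { p++; digits++; }
    }
    if (digits == 0) {                 /* no number: consume nothing */
        if (endptr) *endptr = (char *)s;
        return 0.0;
    }
    /* Accept an exponent only if at least one digit follows it. */
    if (*p == 'e' || *p == 'E') {
        const char *q = p + 1;
        if (*q == '+' || *q == '-') q++;
        if (isdigit((unsigned char)*q)) {
            while (isdigit((unsigned char)*q)) q++;
            p = q;
        }
    }

    /* Copy the token, replacing '.' with the locale's separator, which
     * may be longer than one byte. */
    const char *dp = localeconv()->decimal_point;
    size_t dplen = strlen(dp);
    char *buf = malloc((size_t)(p - start) + dplen + 1);
    if (buf == NULL) {
        if (endptr) *endptr = (char *)s;
        return 0.0;
    }
    char *w = buf;
    for (const char *r = start; r < p; r++) {
        if (r == dot) { memcpy(w, dp, dplen); w += dplen; }
        else *w++ = *r;
    }
    *w = '\0';

    double value = strtod(buf, NULL);
    free(buf);
    if (endptr) *endptr = (char *)p;   /* points into the source string */
    return value;
}
```

Allocating the buffer from the scanned length avoids guessing a fixed size, at the price of a malloc() per call.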

A simpler and safer solution is to read the settings at the beginning of the process, before changing the locale, since the process starts in the "C" locale. The problem remains if the settings must be updated later, as snprintf() will use the current locale too.
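A minimal sketch of that ordering, assuming a single hypothetical setting config_scale and a hypothetical startup() function that runs before any thread is created:

```c
#include <locale.h>
#include <stdlib.h>

static float config_scale;   /* hypothetical setting read from the config */

/* Intended to be called first thing in main(), before any threads exist. */
void startup(const char *config_text)
{
    /* The process is still in the "C" locale here, so '.' is the
     * decimal separator for strtof(). */
    config_scale = strtof(config_text, NULL);

    /* Only now adopt the user's locale for the rest of the program. */
    setlocale(LC_ALL, "");
}
```

This only helps for values that are read once; anything parsed after the setlocale() call is back to locale-dependent behavior.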

chqrlie answered Sep 15 '25 04:09