
double string conversion and locale

A common internationalization issue is converting double values to and from their string representation. It comes up in many areas.

Take CSV files, which are called either

comma separated

or

character separated

because sometimes they are stored like

1.2,3.4
5.6,6.4

in English regions or

1,2;3,4
5,6;6,4

in, for example, German regions.

With this background, it helps to know that most of the std:: functions are locale dependent. Under a German locale they will read "1,2" as 1.2 and write it back as "1,2", but under an English locale they will read "1,2" as 1 and write it back as "1".
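The "English" half of that behaviour can be reproduced without any particular locale being installed. The following is only a sketch: it assumes a stream imbued with std::locale::classic() stands in for the English setup, and the helper name readWithClassicLocale is made up for illustration.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse one double using the classic "C" locale,
// regardless of what the global locale currently is.
// On return, `rest` holds whatever the extraction left unread.
double readWithClassicLocale(const std::string& text, std::string& rest)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());  // '.' is the decimal point, no digit grouping

    double value = 0.0;
    in >> value;                       // stops at the first character that cannot be part of a number

    rest.clear();
    if (in) {
        std::getline(in, rest);        // collect the unread remainder, if any
    }
    return value;
}
```

Called with "1,2", this should return 1 and leave ",2" in rest, which is the silent truncation described above.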

Because the locale is global state of the application, switching it to a different setting is not a good idea. That becomes a problem when I have to read a German CSV file on an English machine or vice versa.

It's also hard to write code that behaves the same on all machines. C++ streams, however, allow a locale to be set per stream:

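#include <cstddef>
#include <locale>
#include <string>

// numpunct facet that behaves like the base facet except for the decimal point.
class Punctation : public std::numpunct<wchar_t>
{
public:

  typedef wchar_t char_type;
  typedef std::wstring string_type;

  // r is the facet reference count: with 0, the owning std::locale deletes the facet.
  explicit Punctation(wchar_t decimalPoint, std::size_t r = 0) :
    std::numpunct<wchar_t>(r), decimalPoint_(decimalPoint)
  {
  }

  Punctation(const Punctation& rhs) :
    std::numpunct<wchar_t>(), decimalPoint_(rhs.decimalPoint_)
  {
  }

protected:

  virtual ~Punctation()
  {
  }

  virtual wchar_t do_decimal_point() const
  {
    return decimalPoint_;
  }

private:

  Punctation& operator=(const Punctation& rhs);  // declared but not defined: not assignable

  const wchar_t decimalPoint_;
};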

...

std::locale newloc(std::locale::classic(), new Punctation(L','));
stream.imbue(newloc);

will allow you to initialize a stream with the classic "C" locale behavior and replace only the decimal point. This also lets me ignore the thousands separator, which may come into effect too: 1000.12 may be written as "1.000,12" in German or as "1,000.12" in English, which ends in complete confusion. Even replacing "," with "." will not help in this situation.
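To make the per-stream approach concrete, here is a minimal sketch that parses one line of the German-style CSV shown earlier. It uses char instead of wchar_t for brevity, and the names CommaDecimal and parseGermanCsvLine are invented for this example.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classic "C" behaviour except for the decimal point.
class CommaDecimal : public std::numpunct<char>
{
protected:
    char do_decimal_point() const override { return ','; }
};

// Hypothetical helper: split one line on `fieldSep` and parse every field
// with a stream-local locale, leaving the global locale untouched.
// Returns false as soon as a field is not a complete number.
bool parseGermanCsvLine(const std::string& line,
                        std::vector<double>& out,
                        char fieldSep = ';')
{
    // The locale takes ownership of the facet and deletes it when done.
    const std::locale loc(std::locale::classic(), new CommaDecimal);

    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, fieldSep)) {
        std::istringstream in(field);
        in.imbue(loc);

        double value = 0.0;
        char extra;
        if (!(in >> value) || (in >> extra)) {
            return false;  // not a number, or trailing characters after it
        }
        out.push_back(value);
    }
    return true;
}
```

Because the classic locale has no digit grouping, a field such as "1.000,12" is rejected here rather than misread.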

If I have to work with atof and friends I can use

const char decimal_point = *(localeconv()->decimal_point);

to adapt my handling to the decimal point of the current locale.
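One way this could be combined with strtod is sketched below. It assumes the decimal character used in the input is known and that the text contains no thousands separators. The helper name parseWithKnownDecimal is hypothetical.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: `text` uses `sourceDecimal` as its decimal point.
// Rewrite it to what the current C locale expects, then call strtod.
// Returns false if the text is not completely consumed as a number.
bool parseWithKnownDecimal(const std::string& text, char sourceDecimal, double& out)
{
    // Decimal point the C library currently expects (first character only).
    const char localDecimal = *(std::localeconv()->decimal_point);

    std::string adjusted(text);
    for (std::string::size_type i = 0; i < adjusted.size(); ++i) {
        if (adjusted[i] == sourceDecimal) {
            adjusted[i] = localDecimal;
        }
    }

    char* end = 0;
    const double value = std::strtod(adjusted.c_str(), &end);
    if (end == adjusted.c_str() || *end != '\0') {
        return false;  // nothing parsed, or something was left over
    }
    out = value;
    return true;
}
```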

So there is an awful lot of machinery just for international double handling. Even my Visual Studio runs into problems: the German version wants to write 8,0 as the version in the vcproj file, while an English version wants to change it to 8.0. That was surely an accident, because in XML it is defined to be 8.0 in every country of the world.

So I wanted to describe the problem a bit and ask about aspects I may have overlooked. Things that I know:

  • decimal point is locale dependent
  • thousands separator is locale dependent
  • exponent is locale dependent

//                  German       English     Also known
// decimal point       ,            .            
// exponent            e/E          e/E          d/D
// thousand sep        .            ,
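
The German column of this table can also be expressed as a single numpunct facet that covers grouping as well as the decimal point. This is a sketch only: GermanPunct, parseGerman and formatGerman are illustrative names, and the grouping of three digits is an assumption taken from the "1.000,12" example.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: German-style punctuation on top of the classic locale.
class GermanPunct : public std::numpunct<char>
{
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }  // groups of three digits
};

// Parse text such as "1.000,12"; returns false if it is not a complete number.
bool parseGerman(const std::string& text, double& out)
{
    std::istringstream in(text);
    in.imbue(std::locale(std::locale::classic(), new GermanPunct));

    double value = 0.0;
    char extra;
    if (!(in >> value) || (in >> extra)) {
        return false;
    }
    out = value;
    return true;
}

// Format a value with the same punctuation rules and default precision.
std::string formatGerman(double value)
{
    std::ostringstream os;
    os.imbue(std::locale(std::locale::classic(), new GermanPunct));
    os << value;
    return os.str();
}
```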

Which country uses which setting? Maybe you can add some interesting examples that I have not come across yet.

Totonga asked Jun 17 '09 07:06


1 Answer

Don't ever use atof( s ). It's a quick & dirty shortcut for strtod( s, 0 ) without the error reporting. (Same for atoi() and strtol().)

If a function be advertised to return an error code in the event of difficulties, thou shalt check for that code, yea, even though the checks triple the size of thy code and produce aches in thy typing fingers, for if thou thinkest ’it cannot happen to me’, the gods shall surely punish thee for thy arrogance.

(Henry Spencer, "Ten Commandments for the C Programmer", Commandment #6)
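
A minimal sketch of what those checks might look like for strtod. The helper name parseDoubleChecked is made up, and treating any ERANGE result as a failure is a choice made for this example.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: strtod with the checks that atof skips.
// Returns true only if the whole string is a number that fits in a double.
bool parseDoubleChecked(const char* text, double& out)
{
    char* end = 0;
    errno = 0;
    const double value = std::strtod(text, &end);

    if (end == text)     return false;  // nothing was converted
    if (*end != '\0')    return false;  // trailing characters, e.g. "1,2" in the "C" locale
    if (errno == ERANGE) return false;  // overflow or underflow

    out = value;
    return true;
}
```

With the default "C" locale, atof("1,2") quietly yields 1, while this helper reports the leftover ",2" as an error.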

DevSolar answered Sep 26 '22 17:09