
Number formatting in BASH with thousand separator


I have a number, 12343423455.23353, and I want to format it with a thousands separator, so that the output would be 12,343,423,455.23353.

Shiplu Mokaddim asked Feb 21 '12 09:02



2 Answers

$ printf "%'.3f\n" 12345678.901
12,345,678.901

Ignacio Vazquez-Abrams answered Sep 22 '22 16:09


tl;dr

  • Use numfmt, if GNU utilities are available, such as on Linux by default:

    • numfmt --grouping 12343423455.23353 # -> 12,343,423,455.23353 in locale en_US
  • Otherwise, use printf with the ' field flag wrapped in a shell function that preserves the number of input decimal places (does not hard-code the number of output decimal places).

    • groupDigits 12343423455.23353 # -> 12,343,423,455.23353 in locale en_US
    • See the bottom of this answer for the definition of groupDigits(), which also supports multiple input numbers.
  • Ad-hoc alternatives involving subshells that also preserve the number of input decimal places (assumes that the input decimal mark is either . or ,):

    • A modular, but somewhat inefficient variant that accepts the input number via stdin (and can therefore also be used with pipeline input):
      (n=$(</dev/stdin); f=${n#*[.,]}; printf "%'.${#f}f\n" "$n") <<<12343423455.23353
    • Significantly faster, but less modular alternative that uses intermediate variable $n (note the ; after the assignment to f, without which ${#f} would be expanded before f is set):
      n=12343423455.23353; (f=${n#*[.,]}; printf "%'.${#f}f\n" "$n")
  • Alternatively, consider use of my Linux/macOS grp CLI (installable with npm install -g grp-cli):

    • grp -n 12343423455.23353

In all cases there are caveats; see below.
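
The ad-hoc variants above treat everything after the first . or , as the fractional part, so an input with no decimal mark at all (e.g., 1000) would be printed with as many decimal places as it has digits. A minimal sketch of a stdin filter that guards against this follows; the name groupStdin is hypothetical and not part of the original answer, and the same locale caveats apply.

```bash
# groupStdin: hypothetical helper (not from the original answer).
# Reads one number per line from stdin and prints it with the active
# locale's digit grouping, preserving the input's decimal places.
# Assumes the input decimal mark is either . or , and matches the locale.
groupStdin() {
  local n f
  while IFS= read -r n; do
    case $n in
      *[.,]*) f=${n#*[.,]} ;;  # text after the first decimal mark
      *)      f='' ;;          # integer input: zero decimal places
    esac
    printf "%'.${#f}f\n" "$n" || return
  done
}
```

It can be used with a here-string (groupStdin <<<12343423455.23353) or at the end of a pipeline (seq 999 1001 | groupStdin).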


Ignacio Vazquez-Abrams's answer contains the crucial pointer for use with printf: the ' field flag (following the %) formats a number with the active locale's thousand separator:

  • Note that man printf (man 1 printf) does not contain this information itself: the utility / shell builtin printf ultimately calls the library function printf(), and only man 3 printf gives the full picture with respect to supported formats.
  • Environment variables LC_NUMERIC and, indirectly, LANG or LC_ALL control the active locale with respect to number formatting.
  • Both numfmt and printf respect the active locale, both with respect to the thousands separator and the decimal mark ("decimal point").
  • Using just printf by itself, as in Ignacio's answer, requires that you hard-code the number of output decimal places, rather than preserving however many decimal places the input has; it is this limitation that groupDigits() below overcomes.
  • printf "%'.<numDecPlaces>f" does have one advantage over numfmt --grouping, however:
    • numfmt only accepts decimal numbers, whereas printf's %f also accepts hexadecimal integers (e.g., 0x3e8) and numbers in decimal scientific notation (e.g., 1e3).
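
To see how these points play out in a given environment, the following sketch may help. It assumes bash's builtin printf; the function name localeGroups is hypothetical and only serves as an illustration.

```bash
# localeGroups: hypothetical helper (not from the original answer).
# Succeeds (exit code 0) if printf's ' flag actually inserts a separator
# in the active locale; fails in locales without grouping, such as C.
localeGroups() {
  [[ $(printf "%'.0f" 1000) != 1000 ]]
}

# printf's %f also accepts hexadecimal and scientific notation as input;
# both arguments below represent the value 1000.
printf "%'.0f\n" 0x3e8 1e3

if localeGroups; then
  echo "active locale applies digit grouping"
else
  echo "active locale does not group digits (e.g., C or POSIX)"
fi
```

Running it once as is and once inside a subshell that sets LC_ALL, e.g. (LC_ALL=C; localeGroups), shows whether the locale in effect groups digits at all.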

Caveats

  • Locales without grouping: Some locales, notably C and POSIX, by definition do NOT apply grouping, so use of ' has no effect in that event.

  • Real-world locale inconsistencies across platforms:

    • (LC_ALL='de_DE.UTF-8'; printf "%'.1f\n" 1000) # SHOULD yield: 1.000,0
    • Linux: yields 1.000,0, as expected.
    • macOS/BSD: Unexpectedly yields 1000,0 - NO grouping(!).
  • Input number format: When you pass a number to numfmt or printf, it:
    • mustn't already contain digit grouping
    • must already use the active locale's decimal mark
    • For example:
      • (LC_ALL='lt_LT.UTF-8'; printf "%'.1f\n" 1000,1) # -> '1 000,1'
      • OK: input number is not grouped and uses Lithuanian decimal mark (comma).
  • Portability: POSIX doesn't require the printf utility (as opposed to the C printf() library function) to support floating-point format characters such as %f, given that POSIX[-like] shells are integer-only; in practice, however, I'm not aware of any shells/platforms that do not.

  • Rounding errors and overflow:

    • When using numfmt and printf as described, round-trip conversion occurs (string -> number -> string), which is subject to rounding errors; in other words: reformatting with digit grouping can lead to a different number.
    • Format character f employs IEEE-754 double-precision floating-point values, for which only up to 15 significant digits (digits irrespective of the location of the decimal mark) are guaranteed to be accurately preserved (though for specific numbers it may work with more digits). In practice, numfmt and GNU printf can accurately handle more than that; see below. If anyone knows how and why, let me know.
    • With too many significant digits or too large a value, the behavior differs between numfmt and printf in general, and between printf implementations across platforms; for example:

numfmt:

[Fixed in coreutils 8.24, according to @pixelbeat] Starting with 20 significant digits, the value overflows quietly(!) - presumably a bug (as of GNU coreutils 8.23):

# 20 significant digits cause quiet overflow:
$ (fractPart=0000000000567890; num="1000.${fractPart}"; numfmt --grouping "$num")
-92.23372036854775807    # QUIET OVERFLOW

By contrast, a number that is too large does generate an error by default.

printf:

Linux printf handles up to 20 significant digits accurately, whereas the BSD/macOS implementation is limited to 17:

# Linux: 21 significant digits cause rounding error:
$ (fractPart=00000000005678901; num="1000.${fractPart}"; printf "%'.${#fractPart}f\n" "$num")
1,000.00000000005678902  # ROUNDING ERROR

# BSD/macOS: 18 significant digits cause rounding error:
$ (fractPart=00000000005678; num="1000.${fractPart}"; printf "%'.${#fractPart}f\n" "$num")
1,000.00000000005673  # ROUNDING ERROR

The Linux version never seems to overflow, whereas the BSD/macOS version reports an error with numbers that are too large.
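
Since the digit limits differ by tool and platform, one way to detect a rounding error up front is to format the number without grouping and compare the result with the input string. The following is a minimal sketch under stated assumptions; the name roundTrips is hypothetical and not part of the original answer.

```bash
# roundTrips: hypothetical helper (not from the original answer).
# Succeeds if formatting num WITHOUT grouping, with the same number of
# decimal places, reproduces the input string exactly; a mismatch means
# the string -> number -> string conversion changed the value.
# Assumes plain input: no leading zeros, no explicit + sign, no exponent,
# and the active locale's decimal mark.
roundTrips() {
  local num=$1 decimalMark fractPart
  # Determine the active locale's decimal mark, e.g. "." or ",".
  decimalMark=$(printf "%.1f" 0); decimalMark=${decimalMark:1:1}
  fractPart=${num##*"$decimalMark"}
  [[ $num == "$fractPart" ]] && fractPart=''   # no decimal mark present
  [[ $(printf "%.${#fractPart}f" "$num") == "$num" ]]
}
```

A caller could then guard the formatting step, for instance with roundTrips "$num" && groupDigits "$num", and handle the failure case separately.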


Bash shell function groupDigits():

# SYNOPSIS
#   groupDigits num ...
# DESCRIPTION
#   Formats the specified number(s) according to the rules of the
#   current locale in terms of digit grouping (thousands separators).
#   Note that input numbers
#     - must not already be digit-grouped themselves,
#     - must use the *current* locale's decimal mark.
#   Numbers can be integers or floats.
#   Processing stops at the first number that can't be formatted, and a
#   non-zero exit code is returned.
# CAVEATS
#   - No input validation is performed.
#   - printf(1) is not guaranteed to support non-integer formats by POSIX,
#     though not doing so is rare these days.
#   - Round-trip number conversion is involved (string > double > string)
#     so rounding errors can occur.
# EXAMPLES
#   groupDigits 1000 # -> '1,000'
#   groupDigits 1000.5 # -> '1,000.5'
#   (LC_ALL=lt_LT.UTF-8; groupDigits 1000,5) # -> '1 000,5'
groupDigits() {
  local decimalMark fractPart
  decimalMark=$(printf "%.1f" 0); decimalMark=${decimalMark:1:1}
  for num; do
    fractPart=${num##*${decimalMark}}; [[ "$num" == "$fractPart" ]] && fractPart=''
    printf "%'.${#fractPart}f\n" "$num" || return
  done
}

mklement0 answered Sep 19 '22 16:09
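
Because groupDigits() performs no input validation, callers may want to check the input first. The following is a minimal sketch of such a check, assuming bash; the name isPlainNumber is hypothetical and not part of the original answer. It is deliberately stricter than printf, in that it also rejects hexadecimal and scientific notation.

```bash
# isPlainNumber: hypothetical helper (not from the original answer).
# Succeeds if the argument is an optionally signed, ungrouped decimal
# number that uses the active locale's decimal mark (if it has one).
isPlainNumber() {
  local decimalMark re
  # Determine the active locale's decimal mark, e.g. "." or ",".
  decimalMark=$(printf "%.1f" 0); decimalMark=${decimalMark:1:1}
  # Digits, optionally followed by the decimal mark and more digits.
  re="^[+-]?[0-9]+([${decimalMark}][0-9]+)?\$"
  [[ $1 =~ $re ]]
}
```

For example, isPlainNumber "$n" && groupDigits "$n" would skip already-grouped input such as 1,000.5 in a locale whose decimal mark is a period.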