Linux shell: Adding dots to numerical outputs to make them more readable

This is typical output from a program that prints numbers, such as word count (wc):

$ wc MyLongFile.txt -l
985734902867 MyLongFile.txt

I was wondering whether there is a way to filter the numeric part so that it becomes much more readable, like:

985.734.902.867 MyLongFile.txt

Many programs have a -h (human-readable) option, but it would be nice to know a generic method that could be put in a function or alias, or at least typed in directly if it is short enough.

I suppose the method would need to insert a . between each group of three digits, counting from the right.
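As a rough illustration of that idea, here is a minimal POSIX sh sketch that works on a single non-negative integer passed as an argument. The function name group_digits and its optional separator argument are hypothetical, not an existing tool. It repeatedly peels the last three digits off the right end until three or fewer remain:

```shell
#!/bin/sh
# group_digits (hypothetical helper): insert a separator every three
# digits, counting from the right. Assumes $1 contains only digits.
# The body runs in a subshell ( ... ) so its variables do not leak.
group_digits() (
  num=$1
  sep=${2:-.}   # default separator is a dot
  out=
  while [ "${#num}" -gt 3 ]; do
    rest=${num%???}              # everything except the last three digits
    out=$sep${num#"$rest"}$out   # prepend separator + last three digits
    num=$rest
  done
  printf '%s%s\n' "$num" "$out"
)

group_digits 985734902867     # prints 985.734.902.867
group_digits 93342519 ,       # prints 93,342,519
```

This only handles a bare number, so whole lines such as the wc output would still need the numeric field split out first.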

Methods that leave the non-numeric parts unchanged are preferred. If possible, also handle numbers that have letters (or any other characters) to their left, as in:

ls -la
-rw-rw-r-- 1 luis luis  93342519 ene  1 00:22 tmp.txt

The best I have found so far is this sed command:

$ wc MyLongFile.txt -l | sed 's/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2.\3/g'
985734902.867 MyLongFile.txt

...but, as you can see, it only inserts the first separator (the thousands), and I am not very experienced with sed.
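The command stops after one separator because the greedy \([0-9]\+\) together with \([0-9]\{3\}\) consumes the whole run of digits in a single match, and /g resumes scanning after that match instead of revisiting the digits it just rewrote. A minimal sketch of one fix, assuming GNU sed (\| and \+ are GNU extensions to basic regular expressions), keeps the same substitution and loops over it: :a defines a label and ta jumps back to it whenever the previous substitution changed the line. The function name add_dots is hypothetical.

```shell
#!/bin/sh
# add_dots (hypothetical name): the question's own substitution, re-applied
# until it no longer changes the line. Requires GNU sed.
add_dots() {
  sed -e ':a' \
      -e 's/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2.\3/g' \
      -e 'ta'
}

printf '%s\n' '985734902867 MyLongFile.txt' | add_dots
printf '%s\n' '-rw-rw-r-- 1 luis luis  93342519 ene  1 00:22 tmp.txt' | add_dots
```

Each pass splits the last three digits off every run of four or more digits, so the loop ends once all groups are three digits or shorter. Since [^0-9.] rejects a dot as the preceding character, digits right after a dot (for example the fractional part of 3.14159) are left alone.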

Thanks a lot.

asked Sep 30 '22 02:09 by Sopalajo de Arrierez


1 Answer

You could do this with Perl, using a regex based on a positive lookahead:

perl -pe 's/(\d{1,3})(?=(?:\d{3}){1,5}\b)/\1,/g' file

OR

wc MyLongFile.txt -l | perl -pe 's/(\d{1,3})(?=(?:\d{3}){1,5}\b)/\1,/g'

Example:

$ cat file
7985734902867 MyLongFile.txt
734902867 MyLongFile1.txt
$ perl -pe 's/(\d{1,3})(?=(?:\d{3}){1,5}\b)/\1,/g' file
7,985,734,902,867 MyLongFile.txt
734,902,867 MyLongFile1.txt

It works by counting the digits that follow each candidate group in multiples of three. Let me explain how, using 7985734902867 MyLongFile.txt as an example.

  1. \d{1,3} matches one, two, or three digits. Because it is wrapped in a capturing group, the matched digits are also captured for use in the replacement.

  2. The regex engine starts at the first digit. Because \d{1,3} is greedy, it first tries 798, but that is followed by 10 digits, which is not a multiple of 3; backtracking to 79 leaves 11 digits, which also fails. With just 7, the lookahead sees 12 digits (a multiple of 3) followed by a word boundary, so 7 is captured. The word boundary \b is essential here: it matches between a word character and a non-word character, so the lookahead must count all the way to the end of the digit run instead of stopping partway.

  3. Because of the /g modifier, the next attempt starts right after the 7, at the digit 9. Again \d{1,3} greedily takes three digits, 985, and the lookahead sees 9 digits (three groups of three) followed by a word boundary, so 985 is captured. The same happens for 734 and 902. The last group, 867, is not followed by any digits, and the lookahead requires at least one group of three, so nothing is added after it.

  4. The lookahead does not consume any characters, so each match consists only of the captured digits. Replacing each match with \1 (the contents of capture group 1) followed by a comma gives the desired output.

  5. For numbers longer than 18 digits, increase the upper bound inside the positive lookahead, like (?=(?:\d{3}){1,10}\b).

answered Oct 02 '22 14:10 by Avinash Raj