This is typical output from a numeric program such as word count (wc):
$ wc MyLongFile.txt -l
985734902867 MyLongFile.txt
I am looking for a way to filter the numeric part so that it becomes much more readable, like:
985.734.902.867 MyLongFile.txt
Many programs have a -h (human-readable) option, but it would be good to know a generic method that could be put in a function or alias, or at least typed in if it is not too long.
I suppose the method would need to insert a . between every group of three digits, counting from the right.
Methods that leave the non-numeric parts unchanged are preferred. If possible, allow for letters (or any other characters) to the left of the numbers too, as in:
ls -la
-rw-rw-r-- 1 luis luis 93342519 ene 1 00:22 tmp.txt
The best I have found so far is this sed command:
$ wc MyLongFile.txt -l | sed 's/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2.\3/g'
985734902.867 MyLongFile.txt
...but, as you can see, it only works for the thousands group, and I am not very experienced with sed.
Thank you very much.
You could do this with Perl, using a regex based on a positive lookahead.
perl -pe 's/(\d{1,3})(?=(?:\d{3}){1,5}\b)/\1,/g' file
OR
wc MyLongFile.txt -l | perl -pe 's/(\d{1,3})(?=(?:\d{3}){1,5}\b)/\1,/g'
Example:
$ cat file
7985734902867 MyLongFile.txt
734902867 MyLongFile1.txt
$ perl -pe 's/(\d{1,3})(?=(?:\d{3}){1,5}\b)/\1,/g' file
7,985,734,902,867 MyLongFile.txt
734,902,867 MyLongFile1.txt
It's like a regex multiplication. Let me explain how it works, using 7985734902867 MyLongFile.txt as an example.
\d{1,3} matches one, two, or three digits. Because it is wrapped in a capturing group, the matched digits are also captured.
Starting at the first digit, the regex needs a run of one to three digits that is followed by a multiple of three digits and then a word boundary. The digit 7 qualifies: it is followed by 12 digits and then a word boundary, and 12 is a multiple of 3, so 7 is captured. The word boundary \b is essential here. It matches between a word character and a non-word character, so the lookahead has to reach the end of the number.
Next comes the digit 9, which is followed by 11 digits, so 9 alone cannot be the match. Because we defined \d{1,3}, the two digits 98 are also a candidate, but they are followed by 10 digits, which is not a multiple of 3 either. Including the next digit, 5, leaves 9 digits after it, so the three digits 985 are captured. The process continues in the same way up to the group that is followed by exactly three digits and a word boundary.
Replacing each match with \1 (the characters in capture group 1) plus a comma gives the desired output.
You can raise the upper bound inside the positive lookahead for larger numbers, for example (?=(?:\d{3}){1,10}\b).