I have a series of numbers of different lengths (varying from 1 to 6 digits) within some text. I want to equalize the lengths of all these numbers by padding the shorter ones with leading zeros.
E.g. The following 4 lines -
A1:11 A2:112 A3:223333 A4:1333 A5:19333 A6:4
Should become padded integers
A1:000011 A2:000112 A3:223333 A4:001333 A5:019333 A6:000004
I am using "sed" and the following cumbersome expression:
sed -e 's/:\([0-9]\{1\}\)\>/:00000\1/' \
    -e 's/:\([0-9]\{2\}\)\>/:0000\1/' \
    -e 's/:\([0-9]\{3\}\)\>/:000\1/' \
    -e 's/:\([0-9]\{4\}\)\>/:00\1/' \
    -e 's/:\([0-9]\{5\}\)\>/:0\1/'
Is there a better way to express this?
Basically, (0+1)* matches any sequence of ones and zeroes, including the empty sequence. So, in your example, (0+1)*1(0+1)* should match any sequence that contains at least one 1. It would not match 000, but it would match 010, 1, 111, etc. Here (0+1) means 0 OR 1.
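To try that pattern in a shell, here is a minimal sketch using grep. It assumes POSIX extended regex syntax, where alternation is written | rather than the + of the formal notation, and it anchors the pattern with ^ and $ so the whole string has to match. The helper name has_a_one is made up for illustration.

```shell
# has_a_one: hypothetical helper; succeeds when $1 is a string of 0s and 1s
# that contains at least one 1, i.e. it matches (0+1)*1(0+1)*.
has_a_one() {
  printf '%s\n' "$1" | grep -Eq '^(0|1)*1(0|1)*$'
}

for s in 000 010 1 111; do
  if has_a_one "$s"; then
    echo "$s: match"
  else
    echo "$s: no match"
  fi
done
# prints:
# 000: no match
# 010: match
# 1: match
# 111: match
```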
To match any single digit from 0 to 9, many regex flavors provide \d. It is shorthand for [0-9], which itself abbreviates [0123456789]: the square brackets define a character class, and the hyphen gives a range.
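As a small illustration of the character class, the sketch below checks whether a string consists only of digits. It uses [0-9] rather than \d, because POSIX grep and sed patterns do not define \d as a digit class; is_number is a hypothetical helper name.

```shell
# is_number: hypothetical helper; succeeds when $1 is one or more digits
# and nothing else.
is_number() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+$'
}

for s in 19333 A4 4; do
  if is_number "$s"; then
    echo "$s: digits only"
  else
    echo "$s: not a number"
  fi
done
# prints:
# 19333: digits only
# A4: not a number
# 4: digits only
```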
So, yes, regular expressions really only apply to strings. If you want a more complicated FSM, then it's possible to write one, but not using your local regex engine.
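You can pad each number with more zeros than it needs and then strip the surplus leading zeros so that six digits remain (this assumes one entry per line, because the second expression is anchored with $):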
sed -e 's/:/:00000/;s/:0*\([0-9]\{6,\}\)$/:\1/'
Result:
A1:000011 A2:000112 A3:223333 A4:001333 A5:019333 A6:000004
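If several entries share one line, the $ anchor in the sed command above only reaches the last one. A minimal sketch of a variant for that case follows, assuming every number comes directly after a colon; it drops the anchor and adds the g flag to both substitutions. The function name pad_line is made up for illustration.

```shell
# pad_line: hypothetical helper; pads every ":<digits>" on each input line
# to at least six digits.
pad_line() {
  # Step 1: put five zeros in front of every number that follows a colon.
  # Step 2: drop as many leading zeros as possible while keeping six digits.
  sed -e 's/:\([0-9]\)/:00000\1/g' \
      -e 's/:0*\([0-9]\{6,\}\)/:\1/g'
}

printf '%s\n' 'A1:11 A2:112 A3:223333 A4:1333 A5:19333 A6:4' | pad_line
# prints: A1:000011 A2:000112 A3:223333 A4:001333 A5:019333 A6:000004
```

Numbers that already have more than six digits pass through unchanged, since the second substitution only removes zeros.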
It might be better to use awk though:
awk -F: '{ printf("%s:%06d\n", $1, $2) }'
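The awk one-liner above expects exactly one label:number pair per line. As a sketch under the assumption that a line may hold several whitespace-separated pairs, the version below loops over the fields and pads each one with sprintf; pad_fields is a hypothetical name. Note that assigning to a field makes awk rebuild the line with single spaces between fields.

```shell
# pad_fields: hypothetical helper; zero-pads the number in every
# "label:number" field of each input line to six digits.
pad_fields() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # Split the field at the colon; only touch "label:digits" fields.
      n = split($i, kv, ":")
      if (n == 2 && kv[2] ~ /^[0-9]+$/)
        $i = sprintf("%s:%06d", kv[1], kv[2])
    }
    print
  }'
}

printf '%s\n' 'A1:11 A2:112 A3:223333 A4:1333 A5:19333 A6:4' | pad_fields
# prints: A1:000011 A2:000112 A3:223333 A4:001333 A5:019333 A6:000004
```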