I am trying to analyze very large log files on Linux. I have seen plenty of solutions for the reverse of this, but the program that records the data doesn't allow output formatting, so it only outputs in human-readable format (I know, what a pain). So the question is: how can I convert human-readable sizes to bytes using something like awk?
So converting this:
937
1.43K
120.3M
to:
937
1464
126143693
I can tolerate, and expect, some rounding errors.
Thanks in advance.
P.S. Doesn't have to be awk as long as it can provide in-line conversions.
I found this, but the awk command given doesn't appear to work correctly; it outputs something like 534K"0".
I also found a solution using sed and bc, but because it relies on bc it has limited effectiveness: it can only handle one column at a time, and every value has to be valid bc input or the whole pipeline fails.
sed -e 's/K/\*1024/g' -e 's/M/\*1048576/g' -e 's/G/\*1073741824/g' | bc
Use numfmt --from=iec from GNU coreutils.
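A minimal sketch on the sample data (the two-column input below is hypothetical). Note that numfmt rounds away from zero by default, so 1.43K becomes 1465 rather than a truncated 1464; --field selects which whitespace-separated column to convert, which suits in-line conversion of log columns:

```shell
# convert bare values
numfmt --from=iec 937 1.43K 120.3M

# convert only the second column of each line, leaving the rest intact
printf 'foo 1.43K\nbar 120.3M\n' | numfmt --from=iec --field=2
```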
Here's a function that understands binary and decimal prefixes and is easily extendable for larger units should there be a need. Note that IGNORECASE is a GNU awk extension, so the case-insensitive matching requires gawk:
dehumanise() {
    for v in "${@:-$(</dev/stdin)}"
    do
        echo "$v" | awk \
            'BEGIN{IGNORECASE = 1}
             function printpower(n,b,p) {printf "%u\n", n*b^p; next}
             /[0-9]$/{print $1;next};
             /K(iB)?$/{printpower($1, 2, 10)};
             /M(iB)?$/{printpower($1, 2, 20)};
             /G(iB)?$/{printpower($1, 2, 30)};
             /T(iB)?$/{printpower($1, 2, 40)};
             /KB$/{printpower($1, 10, 3)};
             /MB$/{printpower($1, 10, 6)};
             /GB$/{printpower($1, 10, 9)};
             /TB$/{printpower($1, 10, 12)}'
    done
}
Example:
$ dehumanise 2K 2k 2KiB 2KB
2048
2048
2048
2000
$ dehumanise 2G 2g 2GiB 2GB
2147483648
2147483648
2147483648
2000000000
The suffixes are case-insensitive.