I have a trim function that I sometimes use in awk, but it's kind of slow for big inputs:
#!/bin/bash
time {
    yes $'\t   Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t    \r' |
    head -n 1000000 |
    awk '
        { trim($0) }
        function trim(string) {
            gsub(/^[ \t\r]+|[ \t\r]+$/, "", string);
            return string
        }
    '
}
real    0m9.074s
user    0m9.179s
sys     0m0.381s
How can I speed it up?
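On my machine (a Windows laptop running git bash with gawk 5.0.0), doing 2 separate sub()s, one anchored at the start of the string and one at the end, seems to be very slightly faster than one gsub() (roughly 1 to 3% in wall-clock time in the runs below):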
$ cat gsub.sh
#!/bin/bash
time {
    yes $'\t   Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t    \r' |
    head -n 1000000 |
    awk '
        { trim($0) }
        function trim(string) {
            gsub(/^[ \t\r]+|[ \t\r]+$/,"",string)
            return string
        }
    '
}
$ cat subs.sh
#!/bin/bash
time {
    yes $'\t   Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t    \r' |
    head -n 1000000 |
    awk '
        { trim($0) }
        function trim(string) {
            sub(/^[ \t\r]+/,"",string)
            sub(/[ \t\r]+$/,"",string)
            return string
        }
    '
}
Timings from 3 runs of each script, to reduce the impact of caching:
./gsub.sh               ./subs.sh
real    0m2.288s        real    0m2.213s
user    0m2.325s        user    0m2.419s
sys     0m3.170s        sys     0m3.075s
real    0m2.269s        real    0m2.219s
user    0m2.371s        user    0m2.420s
sys     0m3.310s        sys     0m3.139s
real    0m2.275s        real    0m2.250s
user    0m2.434s        user    0m2.434s
sys     0m3.216s        sys     0m3.199s
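The benchmark scripts above throw away trim()'s return value, so they only measure speed. As a quick sanity check that the two-sub() version really strips both ends, here is a minimal sketch that prints the trimmed line between brackets. The trim_lines wrapper name and the short sample input are just for illustration, not part of the original scripts:

```shell
#!/bin/bash
# trim_lines is a hypothetical wrapper name used only for this check.
trim_lines() {
    awk '
        function trim(string) {
            sub(/^[ \t\r]+/, "", string)   # strip leading blanks, tabs, CRs
            sub(/[ \t\r]+$/, "", string)   # strip trailing blanks, tabs, CRs
            return string
        }
        # Brackets make any leftover whitespace visible.
        { print "[" trim($0) "]" }
    '
}

printf '\t   Lorem ipsum \t    \r\n' | trim_lines
# prints: [Lorem ipsum]
```

Since awk passes strings to functions by value, $0 itself is left untouched; you have to use the return value (print it or assign it) to see the trimmed text.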