
Trim function for awk

Tags: trim, awk

I have a trim function that I sometimes use in awk, but it's slow for big inputs (about 9 seconds for 1,000,000 lines):

#!/bin/bash

time {
    yes $'\t   Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t    \r' |
    head -n 1000000 |
    awk '
        { trim($0) }

        function trim(string) {
            gsub(/^[ \t\r]+|[ \t\r]+$/, "", string);
            return string
        }
    '
}
real    0m9.074s
user    0m9.179s
sys     0m0.381s

How can I speed it up?

Fravadona asked Sep 17 '25 23:09


1 Answer

On my machine (a Windows laptop running Git Bash with gawk 5.0.0), doing two separate sub() calls seems to be very slightly faster than one gsub() (roughly 1-3% in real time):

$ cat gsub.sh
#!/bin/bash

time {
    yes $'\t   Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t    \r' |
    head -n 1000000 |
    awk '
        { trim($0) }

        function trim(string) {
            gsub(/^[ \t\r]+|[ \t\r]+$/,"",string)
            return string
        }
    '
}

$ cat subs.sh
#!/bin/bash

time {
    yes $'\t   Lorem ipsum dolor sit amet consectetur adipiscing elit. Duis dapibus rutrum facilisis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Etiam tristique libero eu nibh porttitor amet fermentum.\t    \r' |
    head -n 1000000 |
    awk '
        { trim($0) }

        function trim(string) {
            sub(/^[ \t\r]+/,"",string)
            sub(/[ \t\r]+$/,"",string)
            return string
        }
    '
}

Timing over 3 runs to remove caching impact:

./gsub.sh               ./subs.sh

real    0m2.288s        real    0m2.213s
user    0m2.325s        user    0m2.419s
sys     0m3.170s        sys     0m3.075s

real    0m2.269s        real    0m2.219s
user    0m2.371s        user    0m2.420s
sys     0m3.310s        sys     0m3.139s

real    0m2.275s        real    0m2.250s
user    0m2.434s        user    0m2.434s
sys     0m3.216s        sys     0m3.199s
Ed Morton answered Sep 21 '25 00:09