I see that all features of AWK are included in GAWK. Besides working on a system that doesn't have GAWK installed, is there ever a good reason to use AWK rather than GAWK? Does AWK perform better than GAWK?
AWK is a text-processing language with a history spanning more than 40 years. It has a POSIX standard, several conforming implementations, and is still surprisingly relevant in 2020 — both for simple text processing tasks and for wrangling "big data".
Awk is an interpreted language, but your script is parsed once into an internal form and then applied to every line of your file, so simple programs run quickly. For line-oriented text processing it is often faster than an equivalent Python script, although this depends on the implementation and the task. If you learn to use Awk well, you will start doing things with data that you wouldn't have had the patience to do otherwise.
awk is most useful when handling text files that are formatted in a predictable way. For instance, it is excellent at parsing and manipulating tabular data. It operates on a line-by-line basis and iterates through the entire file. By default, it uses whitespace (spaces, tabs, etc.) to separate fields.
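As a minimal sketch of the line-by-line, whitespace-separated model described above, the following runs two one-line awk programs over a small made-up table. The sample data and the variable names `total` and `big` are illustrative assumptions, not something from the original text.

```shell
# Build a tiny table: name and quantity, separated by a mix of spaces and tabs.
sample() { printf 'apple 10\nbanana   20\ncherry\t30\n'; }

# Sum the second field of every line; awk splits fields on runs of
# spaces and tabs by default, so the uneven spacing does not matter.
total=$(sample | awk '{ sum += $2 } END { print sum }')
echo "$total"

# Print the first field of each line whose second field exceeds 15.
big=$(sample | awk '$2 > 15 { print $1 }')
echo "$big"
```

The first command prints the sum of the second column; the second prints only the names on lines that match the pattern.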
The gawk command on Linux is used for pattern scanning and text processing. An awk script requires no separate compilation step and lets the user work with variables, numeric functions, string functions, and logical operators.
awk can refer to many things. There's awk-the-standard, and there are many different implementations, one of which is gawk.
Not using implementation-specific features means that you'll have a high(er) chance that your code will run unchanged on other implementations of awk-the-language.
gawk, being one implementation of awk-the-language, claims to conform to awk-the-standard, while adding some extra features.
$ man awk
…
DESCRIPTION
Gawk is the GNU Project's implementation of the AWK programming
language. It conforms to the definition of the language in the
POSIX 1003.1 Standard. This version in turn is based on the
description in The AWK Programming Language, by Aho, Kernighan,
and Weinberger. Gawk provides the additional features found in
the current version of Brian Kernighan's awk and a number of
GNU-specific extensions.
…
As for speed, using gawk as "plain" awk should make no difference: often, when gawk is installed, awk will just be a symlink to gawk, which means they'll be exactly the same program.
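One way to check which program `awk` actually points to on a given machine is to resolve the symlink chain. This is a rough sketch that assumes a `readlink` supporting `-f` (GNU coreutils or a compatible version); the variable names are illustrative.

```shell
# Locate the awk that the shell would run.
awk_path=$(command -v awk)

# Follow any symlinks (for example through /etc/alternatives on Debian)
# to the real binary. Assumes readlink supports -f.
resolved=$(readlink -f "$awk_path")
echo "awk resolves to: $resolved"
```

If the resolved path ends in `gawk`, then running `awk` and running `gawk` start the same binary.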
However, using gawk-specific features will mean that you'll be locked in to that specific implementation, so if (hypothetically) you found a faster implementation, you'd probably have to adapt your script instead of just swapping out the binary. (There may be implementations that are faster, but I don't know of any as I've never had the need to make my awk scripts run faster.)
Personally, I tend to stick to "plain" awk and not use gawk-specific features, but if you don't care about switching to another implementation, using gawk extensions might make your script easier to write and save you time on that end.
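To illustrate the trade-off, here is a small sketch using `gensub()`, which is a gawk extension, next to a portable equivalent built on the standard `gsub()`. The input string and variable name are made up for the example; only the portable version is actually executed.

```shell
# gawk-only version (shown for comparison, not run here):
#   echo 'foo boo' | gawk '{ print gensub(/o/, "0", "g") " <- " $0 }'
# gensub() returns the changed string and leaves $0 untouched.

# Portable version: copy the record, then modify the copy with gsub(),
# which is part of standard awk and changes its target in place.
portable=$(echo 'foo boo' | awk '{ s = $0; gsub(/o/, "0", s); print s " <- " $0 }')
echo "$portable"
```

The portable form needs one extra variable, which is the kind of small convenience the gawk extension saves you.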
Nowadays the most common implementation of AWK is gawk, and possibly the second most common one is mawk, if only because it's the system AWK on Debian.
To quote the output of apt-cache show mawk:
Mawk is smaller and much faster than gawk. It has some compile-time limits such as NF = 32767 and sprintf buffer = 1020.
On the side of gawk there is a larger number of well-thought-out extensions and, I think, better error handling and better error messages, which are a real bonus when you're debugging a complex script and could be a good reason to use gawk even if you're not interested in its extensions.
On the other hand, if you have a debugged script, if you don't need a particular extension, if you can live with the builtin limits of mawk (that's a lot of ifs) and you want to squeeze the last bit of performance without leaving the comfort of AWK, then mawk is the way to go.
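If a script sticks to standard features, trying it under another implementation is just a matter of changing the command name. The sketch below runs one small script under whichever of `gawk` and `mawk` happen to be installed and skips the rest; the script and variable names are illustrative assumptions.

```shell
# A script that uses only standard awk features.
script='{ sum += $1 } END { print sum }'

# Reference result from whatever "awk" is on this system.
reference=$(seq 1 1000 | awk "$script")
echo "awk: $reference"

# Run the same script under other implementations, if present.
for impl in gawk mawk; do
    if command -v "$impl" >/dev/null 2>&1; then
        out=$(seq 1 1000 | "$impl" "$script")
        echo "$impl: $out"
    fi
done
```

For a rough speed comparison, the same pipelines can be run on a much larger input and prefixed with your shell's `time` facility, where one is available.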