What are the main differences among them? And in which typical scenarios is it better to use each language?
The difference between sed and awk is that sed is a command-line utility that works on streams of text for searching, filtering, and text processing, while awk is more powerful and robust than sed, with programming constructs such as if/else, while, and do/while.
Some systems still do not ship with Perl, so you still need awk there. For small, short scripts, awk can also be faster because it does not use much RAM.
Python has a big advantage over Perl when it comes to code readability. Python code is much easier to understand than Perl code, even when you come back to it after years. With indentation marking blocks and a consistent structure, Python code is a lot cleaner.
AWK, like sed, is a programming language that deals with large bodies of text. But while people use sed to process and modify text, people mostly use AWK as a tool for analysis and reporting. Like sed, AWK was first developed at Bell Labs in the 1970s.
In order of appearance, the languages are sed, awk, perl, python.
The sed program is a stream editor and is designed to apply the actions from a script to each line (or, more generally, to specified ranges of lines) of the input file or files. Its language is based on ed, the Unix editor, and although it has conditionals and so on, it is hard to work with for complex tasks. You can work minor miracles with it - but at a cost to the hair on your head. However, it is probably the fastest of the programs when attempting tasks within its remit. (It has the least powerful regular expressions of the programs discussed - adequate for many purposes, but certainly not PCRE - Perl-Compatible Regular Expressions.)
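To make the "apply a script to each line, or to a range of lines" model concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how sed is implemented; the function name `stream_edit` and its parameters are made up for this example.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple


def stream_edit(
    lines: Iterable[str],
    pattern: str,
    replacement: str,
    line_range: Optional[Tuple[int, int]] = None,
) -> Iterator[str]:
    """Yield each line, substituting only inside the optional 1-based inclusive range."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        in_range = line_range is None or line_range[0] <= number <= line_range[1]
        # Like sed's s/pattern/replacement/ without the g flag: first match only.
        yield regex.sub(replacement, line, count=1) if in_range else line


if __name__ == "__main__":
    text = ["this one", "this two", "this three"]
    # Roughly comparable to: sed '2,3s/this/that/'
    for out in stream_edit(text, r"this", "that", line_range=(2, 3)):
        print(out)
```

Note that the replacement string follows Python's `re` conventions here, not sed's, so the two are not drop-in equivalents.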
The awk program (name from the initials of its authors - Aho, Weinberger, and Kernighan) is a tool initially for formatting reports. It can be used as a souped-up sed; in its more recent versions, it is computationally complete. It uses an interesting idea - the program is based on 'patterns matched' and 'actions taken when the pattern matches'. The patterns are fairly powerful (Extended Regular Expressions). The language for the actions is similar to C. One of the key features of awk is that it splits the input automatically into records and each record into fields.
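The pattern/action idea and the automatic field splitting can be sketched in a few lines of Python. This is an assumption-laden toy, not awk's actual implementation: `run_rules` and the `Rule` type are hypothetical names, records are simply lines, and fields are split on whitespace.

```python
import re
from typing import Callable, Iterable, List, Tuple

# A rule pairs a pattern (tested against the whole record) with an action
# (called with the list of fields). Both names are illustrative only.
Rule = Tuple[str, Callable[[List[str]], None]]


def run_rules(records: Iterable[str], rules: List[Rule]) -> None:
    """For every record, run the action of each rule whose pattern matches."""
    compiled = [(re.compile(pattern), action) for pattern, action in rules]
    for record in records:
        fields = record.split()  # whitespace splitting, like awk's default
        for regex, action in compiled:
            if regex.search(record):
                action(fields)


if __name__ == "__main__":
    totals = {}

    def add_amount(fields: List[str]) -> None:
        # fields[0] corresponds to awk's $1, fields[1] to $2
        totals[fields[0]] = totals.get(fields[0], 0) + int(fields[1])

    data = ["alice 10", "bob 5", "alice 7", "# comment line"]
    # Roughly comparable to: awk '/^[a-z]/ { totals[$1] += $2 }'
    run_rules(data, [(r"^[a-z]", add_amount)])
    print(totals)
```

In awk itself the loop over records and the field splitting are implicit, which is a large part of why short awk programs stay short.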
Perl was written in part as an awk-killer and sed-killer. Two of the programs provided with it are a2p and s2p for converting awk scripts and sed scripts into Perl. Perl is one of the earliest of the next generation of scripting languages (Tcl/Tk can probably claim primacy). It has powerful integrated regular expression handling with a vastly more powerful language. It provides access to almost all system calls and has the extensibility of the CPAN modules. (Neither awk nor sed is extensible.) One of Perl's mottos is "TMTOWTDI - There's more than one way to do it" (pronounced "tim-toady"). Perl has 'objects', but it is more of an add-on than a fundamental part of the language.
Python was written last, and probably in part as a reaction to Perl. It has some interesting syntactic ideas (indenting to indicate levels - no braces or equivalents). It is more fundamentally object-oriented than Perl; it is just as extensible as Perl.
OK - when to use each?
I'm not aware of anything that Perl can do that Python can't, nor vice versa. The choice between the two would depend on other factors. I learned Perl before there was a Python, so I tend to use it. Python has less accreted syntax and is generally somewhat simpler to learn. Perl 6, when it becomes available, will be a fascinating development.
(Note that the 'overviews' of Perl and Python, in particular, are woefully incomplete; whole books could be written on the topic.)
After mastering a few dozen languages, you get tired of people like S. Lott (see his controversial answer to this question, nearly half as many down-votes as up (+45/-22) six years after answering).
Sed is the best tool for extremely simple command-line pipelines. In the hands of a sed master, it's suitable for one-offs of arbitrary complexity, but it should not be used in production code except in very simple substitution pipelines. Stuff like 's/this/that/'.
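For comparison, here is roughly what such a simple substitution filter looks like in Python. It is a sketch under assumptions: `substitute_stream` is a hypothetical helper name, and the demo uses in-memory streams in place of a real pipeline.

```python
import io
import re
from typing import TextIO


def substitute_stream(src: TextIO, dst: TextIO, pattern: str, replacement: str) -> None:
    """Copy src to dst line by line, replacing every match of pattern."""
    regex = re.compile(pattern)
    for line in src:
        dst.write(regex.sub(replacement, line))


if __name__ == "__main__":
    # Roughly comparable to: sed 's/this/that/g'
    source = io.StringIO("this and this\nnothing here\n")
    sink = io.StringIO()
    substitute_stream(source, sink, r"this", "that")
    print(sink.getvalue(), end="")
```

Passing `sys.stdin` and `sys.stdout` instead of the `StringIO` objects would turn it into a pipeline filter. Even so, the sed one-liner is far shorter, which is the point being made about very simple pipelines.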
Gawk (the GNU awk) is by far the best choice for complex data reformatting when there is only a single input source and a single output (or, multiple outputs sequentially written). Since a great deal of real-world work conforms to this description, and a good programmer can learn gawk in two hours, it is the best choice. On this planet, simpler and faster is better!
Perl and Python are far better than any version of awk or sed when you have very complex input/output scenarios. The more complex the problem is, the better off you are using Python, from a maintenance and readability standpoint. Note, however, that a good programmer can write readable code in any language, and a bad programmer can write unmaintainable crap in any useful language, so the choice of Perl or Python can safely be left to the preferences of the programmer if said programmer is skilled and clever.