
Is it possible to write a shell script which is faster than the equivalent script in Perl? [closed]

I wrote multiple scripts in Perl and shell and compared their real execution times. In all cases, the Perl script was more than 10 times faster than the shell script.

So I wondered: is it possible to write a shell script which is faster than the same script in Perl? And why is Perl faster than shell even though I use the system function in the Perl script?

JohnJohnGa asked Apr 24 '10 09:04


3 Answers

There are a few ways to make your shell (e.g. Bash) scripts execute faster.

  1. Use fewer external commands when Bash's internals can do the task for you, e.g. avoid excessive use of sed, grep, awk, etc. for string/text manipulation.
  2. If you are manipulating relatively BIG files, don't use Bash's while read loop. Use awk. If you are manipulating really BIG files, you can use grep to search for the patterns you want, and then pass the matches to awk to "edit". grep's searching algorithm is very good and fast. If you only want the start or end of the file, use head and tail.
  3. What file manipulation tools such as sed, cut, grep, wc, etc. do can all be done with one awk script, or with Bash internals if the task is not complicated. Therefore, try to cut down on the use of tools that overlap in their functions. Unix pipes/chaining are excellent, but using too many of them, e.g. command|grep|grep|cut|sed, makes your code slow. Each pipe is an overhead. For this example, just one awk does them all: command | awk '{do everything here}'

The closest tool you can use which can match Perl's speed for certain tasks, e.g. string manipulation or maths, is awk. Here's a fun benchmark for this solution. There are around 9 million numbers in the file.

Output

$ head -5 file
1
2
3
34
42
$ wc -l <file
8999987

$ time perl -nle '$sum += $_ } END { print $sum' file
290980117

real    0m13.532s
user    0m11.454s
sys     0m0.624s

$ time awk '{ sum += $1 } END { print sum }' file
290980117

real    0m9.271s
user    0m7.754s
sys     0m0.415s

$ time perl -nle '$sum += $_ } END { print $sum' file
290980117

real    0m13.158s
user    0m11.537s
sys     0m0.586s

$ time awk '{ sum += $1 } END { print sum }' file
290980117

real    0m9.028s
user    0m7.627s
sys     0m0.414s

For each try, awk is faster than Perl.
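
For comparison with tip 2 above, here is a minimal sketch that computes the same sum with a plain shell read loop and with awk. The function names sum_loop and sum_awk and the small sample file are assumptions made for illustration; they are not part of the benchmark above.

```shell
# Sum the first column of a file two ways: a plain shell read loop and awk.
# sum_loop and sum_awk are hypothetical helper names, not from the answer.

sum_loop() {
    total=0
    # The shell reads and adds one line at a time.
    while read -r n; do
        total=$((total + n))
    done < "$1"
    echo "$total"
}

sum_awk() {
    # Same awk program as in the benchmark above.
    awk '{ sum += $1 } END { print sum }' "$1"
}

# Small sample file; the benchmark above used roughly 9 million lines.
sample=$(mktemp)
seq 1 1000 > "$sample"

sum_loop "$sample"
sum_awk "$sample"
rm -f "$sample"
```

Both calls should print the same total. To compare speed, prefix each call with time in Bash and use a much larger input; the gap depends on the machine and the shell, so measure rather than assume.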

Lastly, try to learn awk beyond what it can do in one-liners.
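
Tip 3 above, replacing a chain of filters with a single awk call, can be sketched as follows. The input format (name, status, and byte count separated by single spaces) and the function names report_pipeline and report_awk are made up for this example.

```shell
# The same report built two ways. The input format and both function
# names are assumptions for this sketch, not taken from the answer.

report_pipeline() {
    # Three processes: grep filters, cut selects fields, sed reformats.
    grep ' ok ' "$1" | cut -d' ' -f1,3 | sed 's/ /=/'
}

report_awk() {
    # One awk process does the filtering, field selection, and formatting.
    awk '$2 == "ok" { print $1 "=" $3 }' "$1"
}

data=$(mktemp)
printf '%s\n' 'alice ok 120' 'bob fail 300' 'carol ok 80' > "$data"

report_pipeline "$data"
report_awk "$data"
rm -f "$data"
```

On this sample both functions print the same two lines. The awk version starts one process instead of three, which is the overhead the tip is about.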

ghostdog74 answered Sep 20 '22 19:09


This might fall dangerously close to armchair optimization, but here are some ideas that might explain your results:

  • Fork/exec: almost anything useful that is done by a shell script is done via a shell-out, that is, starting a new process to run a command such as sed, awk, cat, etc. More often than not, more than one process is executed, and data is moved via pipes.

  • Data structures: Perl's data structures are more sophisticated than Bash's or Csh's. This typically forces the shell programmer to get creative with data storage. This can take the form of:

    • using non-optimal data structures (arrays instead of hashes)
    • storing data in textual form (for example integers as strings) that needs to be reinterpreted every time
    • saving data in a file and re-parsing it again and again
    • etc.
  • Non-optimized implementation: some shell constructs might not be designed with optimization in mind, but with user convenience. For example, I have reason to believe that the bash implementation of parameter expansion, in particular ${foo//search/replace}, is sub-optimal relative to the same operation in sed. This is typically not a problem for day-to-day tasks.
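
The fork/exec point in the first bullet can be illustrated with a small sketch: the same result obtained once by running an external program and once by an expansion handled inside the shell. The function names base_external and base_builtin are hypothetical.

```shell
# Strip the directory part of a path two ways.
# base_external and base_builtin are hypothetical names for this sketch.

base_external() {
    # Runs the external basename program: one new process per call.
    basename "$1"
}

base_builtin() {
    # POSIX parameter expansion: removes everything up to the last slash
    # inside the shell itself, without starting another program.
    printf '%s\n' "${1##*/}"
}

base_external /var/log/syslog
base_builtin /var/log/syslog
```

The two agree for ordinary paths like the one shown; they differ for paths with a trailing slash, so this is not a drop-in replacement. Calling each function a few thousand times in a loop under time, with output sent to /dev/null, is one way to see what the extra process per call costs.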

Chen Levy answered Sep 21 '22 19:09


Okay, I know I'm asking for it by opening up a can of worms closed two years ago, but I'm not 100% happy with any of the answers.

The right answer is YES. But most new coders will still go to Perl and Python and write code that struggles mightily to WRAP CALLS TO EXTERNAL EXECUTABLES because they lack the mentoring or experience required to know when to use which tools.

The Korn Shell (ksh) has fast builtin math, and a fully capable and speedy regex engine that, gasp, can handle Perl type regex. It also has associative arrays. It can even load external .so libraries. And it was a finished and mature product 10 years ago. It's even already installed on your Mac.
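
As a rough sketch of builtin math, the loop below does integer arithmetic entirely inside the shell, with no call to expr or bc. The function name sum_squares is an assumption for illustration. The syntax is plain POSIX, so it should run under ksh as well as bash; ksh-specific features such as associative arrays (typeset -A) are left out to keep the sketch portable.

```shell
# Sum of squares from 1 to n using only shell arithmetic (no expr, no bc).
# sum_squares is a hypothetical name chosen for this sketch.

sum_squares() {
    i=1
    total=0
    while [ "$i" -le "$1" ]; do
        total=$((total + i * i))
        i=$((i + 1))
    done
    echo "$total"
}

sum_squares 10
```

No external executable is started anywhere in the loop, which is the kind of usage this answer argues for.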

Gabbar Singh answered Sep 23 '22 19:09