
How to emulate wc -l in Raku

Tags:

perl

raku

In Perl 5, you can emulate wc -l with a one-liner:

perl -lnE 'END {say $.}' test.txt

How can I implement the same functionality in Raku?

If you try to implement it like this:

raku -e 'say "test.txt".IO.open.lines.elems'

it turns out to be slow and to use a lot of memory.

Steps to reproduce:

$ wget http://eforexcel.com/wp/wp-content/uploads/2017/07/1500000%20Sales%20Records.zip
$ unzip "1500000 Sales Records.zip"
$ mv "1500000 Sales Records.csv" part.txt
$ for i in `seq 1 10`; do cat part.txt >> test.txt ; done
$ du -sh test.txt
1.8G    test.txt

$ time wc -l test.txt
15000000 test.txt

real    0m0,350s
user    0m0,143s
sys     0m0,205s

$ time perl -lnE 'END { say $. }' test.txt
15000001

real    0m1,981s
user    0m1,719s
sys     0m0,256s

$ time raku -e 'say "test.txt".IO.open.lines.elems'
15000001

real    2m51,852s
user    0m25,129s
sys     0m6,378s

# This run goes into swap (peak swap usage: 2.2G):
# Before `raku -e ''`

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15009        1695       12604         107         708       12917
Swap:          7583           0        7583

# After `raku -e ''`

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15009         752       13923          72         332       13899
Swap:          7583         779        6804

# This variant does not use swap
$ time raku -ne '++$ andthen END .say' test.txt
15000001

real    1m44,906s
user    2m14,165s
sys     0m0,653s

$ raku -v
This is Rakudo version 2019.11 built on MoarVM version 2019.11
implementing Perl 6.d.
TheAthlete asked Feb 28 '20 08:02


1 Answer

One option that's still likely to be pretty slow compared to perl but worth comparing:

raku -ne '++$ andthen END .say' test.txt

Perl's -l command-line option is not needed here: raku chomps lines by default.

$ is an anonymous state scalar.

andthen tests that its lhs is defined, and if so, sets that value as the topic ($_) and then evaluates its rhs.

END is similar to perl's END. Note that it returns Nil to the andthen, but that doesn't matter here because we're using END's statement only for its side effect.
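
Spelled out without -n, the anonymous state variable, and andthen, the one-liner does roughly the following. This is only a sketch of the equivalent logic, not what Rakudo generates internally; the function name count_lines_explicit and the variable $count are made up for illustration.

```shell
# Hypothetical wrapper around a longhand equivalent of
#   raku -ne '++$ andthen END .say' FILE
count_lines_explicit() {
    # $*ARGFILES reads the files named on the command line, which is
    # roughly what -n loops over; $count plays the role of the anonymous `$`.
    raku -e 'my $count = 0; for $*ARGFILES.lines { $count++ }; say $count' "$1"
}
```

Calling count_lines_explicit test.txt should print the same count as the one-liner. The explicit form is easier to read, but there is no reason to expect it to be faster.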

Several things will impact this code's speed. Some things I can think of:

  • Compiler startup overhead. Ignoring any modules being used, the raku compiler Rakudo has a startup overhead of about a tenth of a second on typical hardware compared to a fairly negligible one for perl.

  • The notion of a "line". In perl, the default notion of line processing is reading a series of bytes, some of which represent a line end. In raku, the default notion of line processing is reading a UTF-8 string, some of which represents line ends. Thus perl only incurs the reading overhead of an ASCII (or Extended ASCII) decoder whereas raku incurs the reading overhead of a UTF-8 decoder.

  • Compiler optimizations. perl is generally optimized to the max. It wouldn't surprise me if perl -lnE 'END {say $.}' test.txt takes advantage of some clever optimizations. In contrast, work on Rakudo optimization is still in its early days relatively speaking.
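
To get a feel for how much the second point matters, one could make perl pay for UTF-8 decoding as well and time that. The following is a sketch, assuming perl's standard :encoding(UTF-8) layer is available; the function name count_lines_perl_utf8 is made up, and no timings are claimed here.

```shell
# Hypothetical helper: count lines in perl while decoding the input as UTF-8,
# so that perl pays a decoding cost as well.
count_lines_perl_utf8() {
    # ":encoding(UTF-8)" is perl's standard decoding layer; $. holds the
    # number of the last line read from the filehandle.
    perl -E 'open my $fh, "<:encoding(UTF-8)", $ARGV[0] or die "open: $!";
             1 while <$fh>;
             say $.' "$1"
}
```

Running time count_lines_perl_utf8 test.txt next to the original perl one-liner would show the cost of decoding in perl alone. It says nothing definite about raku, whose string handling differs.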

The only things I think anyone can do about the first and last of the three points I've mentioned above are to wait N years and/or contribute to the compiler's improvement.

There should be a way to work around raku's UTF-8-by-default. Perhaps something like the following is already doable and significantly faster than raku's default, at least ignoring the overhead of using a module called foo:

raku -Mfoo -ne '++$ andthen END .say' test.txt

where module foo switches the default encoding for file I/O to ASCII or whatever from the available encodings.

I haven't checked that this is actually doable in current Rakudo but would be surprised if it were not.
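
One way this might look without any module: IO::Path.lines accepts an :enc argument, so the file can be decoded as ISO-8859-1 (one byte per character) instead of UTF-8. The sketch below has not been timed against the 1.8G file, so whether it actually helps is an open question; the function name count_lines_latin1 and the variable $n are made up. It counts in a loop rather than calling .elems, which may cache the whole sequence.

```shell
# Hypothetical helper: count lines in raku with a single-byte encoding.
count_lines_latin1() {
    # :enc<iso-8859-1> replaces the default UTF-8 decoder; the for loop
    # walks the lines lazily and only keeps a running count.
    raku -e 'my $n = 0; for @*ARGS[0].IO.lines(:enc<iso-8859-1>) { $n++ }; say $n' "$1"
}
```

Running time count_lines_latin1 test.txt would allow a direct comparison with the timings in the question.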

raiph answered Nov 14 '22 10:11