In Perl 5, you can emulate `wc -l` using a one-liner:
perl -lnE 'END {say $.}' test.txt
How can this be implemented in Raku?
If you try it like this:
raku -e 'say "test.txt".IO.open.lines.elems'
it turns out to be slow and to use a lot of memory.
Steps to reproduce:
$ wget http://eforexcel.com/wp/wp-content/uploads/2017/07/1500000%20Sales%20Records.zip
$ unzip "1500000 Sales Records.zip"
$ mv "1500000 Sales Records.csv" part.txt
$ for i in `seq 1 10`; do cat part.txt >> test.txt ; done
$ du -sh test.txt
1.8G test.txt
$ time wc -l test.txt
15000000 test.txt
real 0m0,350s
user 0m0,143s
sys 0m0,205s
$ time perl -lnE 'END { say $. }' test.txt
15000001
real 0m1,981s
user 0m1,719s
sys 0m0,256s
$ time raku -e 'say "test.txt".IO.open.lines.elems'
15000001
real 2m51,852s
user 0m25,129s
sys 0m6,378s
# Uses swap (peak swap usage: 2.2G):
# Before `raku -e ''`
$ free -m
total used free shared buff/cache available
Mem: 15009 1695 12604 107 708 12917
Swap: 7583 0 7583
# After `raku -e ''`
$ free -m
total used free shared buff/cache available
Mem: 15009 752 13923 72 332 13899
Swap: 7583 779 6804
# Swap not used
$ time raku -ne '++$ andthen END .say' test.txt
15000001
real 1m44,906s
user 2m14,165s
sys 0m0,653s
$ raku -v
This is Rakudo version 2019.11 built on MoarVM version 2019.11
implementing Perl 6.d.
One option that's still likely to be pretty slow compared to perl, but worth comparing:
raku -ne '++$ andthen END .say' test.txt
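The `l` command line option is redundant: the lines that `-n` loops over are already chomped, and `say` appends a newline itself.
`$` is an anonymous state scalar, so `++$` increments a counter that persists from one line to the next.
`andthen` tests that its lhs is defined and, if so, sets that value as the topic (`$_`) and then evaluates its rhs.
`END` is similar to perl's `END`. Note that it returns `Nil` to the `andthen`, but that doesn't matter here because we're using the `END`'s statement for its side-effect.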
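Spelled out as an ordinary sub, the one-liner's counting logic looks roughly like the sketch below. The name `count-lines` is hypothetical, and unlike `-n`, which reads every file named on the command line through `$*ARGFILES`, it takes a single path. What it illustrates is that a `for` loop over the lazy `Seq` returned by `.lines` counts each line without holding on to it.

```raku
# Script form of the one-liner:  raku -ne '++$ andthen END .say' FILE
# count-lines is a hypothetical helper name; it reads one file only.
sub count-lines(Str $path --> Int) {
    my Int $count = 0;
    # IO::Path.lines returns a lazy Seq: each line is decoded (UTF-8 by
    # default), chomped, counted, and then dropped; only $count is kept.
    for $path.IO.lines {
        $count++;
    }
    $count
}
```

After the definition, `say count-lines('test.txt');` should print the same count as the one-liner.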
Several things affect this code's speed. Some I can think of:
Compiler startup overhead. Ignoring any modules being used, the raku compiler, Rakudo, has a startup overhead of about a tenth of a second on typical hardware, compared to a fairly negligible one for perl.
The notion of a "line". In perl, the default notion of line processing is reading a series of bytes, some of which represent a line end. In raku, the default notion of line processing is reading a UTF-8 string, parts of which represent line ends. Thus perl only incurs the overhead of scanning bytes (effectively an ASCII or Extended ASCII view of the data), whereas raku incurs the overhead of a UTF-8 decoder.
Compiler optimizations. perl is generally optimized to the max. It wouldn't surprise me if perl -lnE 'END {say $.}' test.txt takes advantage of some clever optimizations. In contrast, work on Rakudo optimization is still in its early days, relatively speaking.
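To make the second point concrete, here is a sketch of perl's byte-oriented notion of a line written in Raku. The sub `count-newline-bytes` is a hypothetical name; it opens the file in binary mode, so nothing is decoded at all, and counts LF (0x0A) bytes the way `wc -l` does. Because it counts line terminators rather than lines, it reports one less than `$.` or `.lines` when the last line has no trailing newline, which is why `wc -l` prints 15000000 while the perl and raku commands print 15000001. The per-byte `grep` keeps the code short rather than fast, so treat it as an illustration, not a benchmark entry.

```raku
# Count newline bytes like wc -l: open the file in binary mode so no
# text decoding (UTF-8 or otherwise) happens.
sub count-newline-bytes(Str $path --> Int) {
    my $fh = open $path, :bin;
    my Int $count = 0;
    # read() returns an empty Buf at end of file, and an empty Buf is False
    while $fh.read(1 +< 16) -> $buf {
        # Compare each byte with 0x0A (LF); simple, not tuned for speed
        $count += $buf.grep(0x0A).elems;
    }
    $fh.close;
    $count
}
```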
The only things I think anyone can do about the first and last of the three points I've mentioned above are to wait N years and/or contribute to the compiler's improvement.
There will be a way to work around raku's UTF-8-by-default. Perhaps something like the following is already doable and significantly faster than raku's default, at least ignoring the overhead of using a module called foo:
raku -Mfoo -ne '++$ andthen END .say' test.txt
where module foo switches the default encoding for file I/O to ASCII or one of the other available encodings.
I haven't checked that this is actually doable in current Rakudo, but I would be surprised if it were not.
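Short of writing module foo, the encoding can already be chosen per file handle with `open`'s `:enc` argument. The sketch below (the sub name `count-lines-latin1` is hypothetical) reads the file as ISO-8859-1, a single-byte encoding in which every byte decodes to exactly one character, so it gives the same count as the UTF-8 default for valid UTF-8 input and cannot fail on invalid bytes. Whether this is actually faster than the UTF-8 default on a given Rakudo release is an assumption to check with `time`, as in the question.

```raku
# Count lines using a single-byte decoder instead of the UTF-8 default.
sub count-lines-latin1(Str $path --> Int) {
    # ISO-8859-1 maps each byte to one character, so any byte sequence
    # decodes without error and LF bytes still end lines.
    my $fh = open $path, :enc<iso-8859-1>;
    my Int $count = 0;
    $count++ for $fh.lines;   # lazy: one chomped line at a time
    $fh.close;
    $count
}
```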