 

Using Perl 6 to process a large text file, and it's too slow (2014-09)

The code is hosted at https://github.com/yeahnoob/perl6-perf, as follows:

use v6;

my $file=open "wordpairs.txt", :r;

my %dict;
my $line;

repeat {
    $line=$file.get;
    my ($p1,$p2)=$line.split(' ');
    if ?%dict{$p1} {
        %dict{$p1} = "{%dict{$p1}} {$p2}".words;
    } else {
        %dict{$p1} = $p2;
    }
} while !$file.eof;

It runs well when wordpairs.txt is small.

But when wordpairs.txt is about 140,000 lines long (two words per line), it runs very, very slowly, and it still hasn't finished after 20 seconds.

What's the problem? Is there anything wrong with the code? Thanks to anyone who can help!

The following was added on 2014-09-04. Many THANKS for the suggestions from the SE answers and from #perl6 on freenode IRC.

The code (as of 2014-09-04):

my %dict;
grammar WordPairs {
token word-pair { (\S*) ' ' (\S*) "\n" }
token TOP { <word-pair>* }
}
class WordPairsActions {
method word-pair($/) { %dict{$0}.push($1) }
}
my $match = WordPairs.parse(slurp, :actions(WordPairsActions));
say ?$match;

Running time (as of now):

$ time perl6 countpairs.pl wordpairs.txt
True
The pairs count of the key word "her" in wordpairs.txt is 1036

real    0m24.043s
user    0m23.854s
sys     0m0.181s

$ perl6 --version
This is perl6 version 2014.08 built on MoarVM version 2014.08

This test's performance is still not reasonable (the equivalent Perl 5 code takes only about 160 ms), but it is much better than my original Perl 6 code. :)

PS. Everything, including the original test code, the patch, and the sample text, is on GitHub.

asked Sep 03 '14 07:09 by yeahnoob


2 Answers

I've tested this with code very similar to Christoph's, using a file containing 10,000 lines. It takes around 15 seconds which, as you say, is significantly slower than Perl 5. I suspect the code is slow because something it uses hasn't received as much optimisation effort as other parts of Rakudo and MoarVM have recently. I'm sure the performance of this code will improve dramatically over the next few months as whatever is slow gets more attention.

When trying to determine why some Perl 6 code is slow, I suggest running perl6 on MoarVM with --profile to see whether it helps you find the bottleneck. Unfortunately, for this code it'll point to Rakudo internals rather than anything you can improve.

It's certainly worth talking to #perl6 on irc.freenode.net, as they'll have the knowledge to offer an alternative solution and will be able to improve its performance in the future.

answered Nov 16 '22 05:11 by tgt


Rakudo isn't exactly known for its stellar performance.

Using more idiomatic code might or might not help:

my %dict;
for open('wordpairs.txt', :r).lines {
    my ($key, @words) = .words;
    push %dict{$key}, @words;
}
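One structural difference from the question's original loop is how values are accumulated. Each time a key repeats, the original loop interpolates every value already stored for that key into a string and splits it again, so adding the n-th value for a key re-processes the n-1 values before it; appending to an array avoids that. This likely contributes to the slowdown, though it may not fully explain it. Below is a minimal sketch contrasting the two patterns, written for a current Rakudo release rather than the 2014 one in the question, and using a small hypothetical in-memory sample instead of wordpairs.txt. It uses `append` because in current Raku `push %dict{$key}, @words` adds `@words` as a single element instead of flattening it.

```raku
# Hypothetical sample standing in for the lines of wordpairs.txt.
my @sample = 'her hand', 'her eyes', 'the cat', 'her voice';

# Pattern from the question: on every repeated key, interpolate all the
# values stored so far into a string and split it again, so appending the
# n-th value for a key re-processes the n-1 values already there.
my %by-string;
for @sample -> $line {
    my ($p1, $p2) = $line.words;
    if %by-string{$p1} {
        %by-string{$p1} = "{%by-string{$p1}} $p2".words.list;
    }
    else {
        %by-string{$p1} = $p2;
    }
}

# Idiomatic pattern: `append` autovivifies an Array on first use and then
# adds the new words to the end without re-reading the existing ones.
my %by-append;
for @sample -> $line {
    my ($key, @words) = $line.words;
    %by-append{$key}.append(@words);
}

say %by-append<her>.elems;       # 3
say %by-append<her>.join(' ');   # hand eyes voice
```

Both hashes end up holding the same words per key; only the cost of adding each new word differs.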

You could also check the other backends (Rakudo runs on MoarVM, Parrot and JVM) to see if it is equally slow everywhere.


It would be interesting to know whether it's the IO or the processing that's slow, e.g. via

my %dict;

say 'start IO';
my @lines = eager open('wordpairs.txt', :r).lines;
say 'done IO';

say 'start processing';
for @lines {
    my ($key, @words) = .words;
    push %dict{$key}, @words;
}
say 'done processing';
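The IO-versus-processing check can also report elapsed times directly. A minimal sketch for a current Rakudo release: `now` returns an Instant, and subtracting two Instants gives a Duration in seconds. Assumption: so that it runs anywhere, the sketch first writes a synthetic sample file (the hypothetical name `wordpairs-sample.txt` under `$*TMPDIR`); point `$path` at the real wordpairs.txt to measure that file instead.

```raku
# Hypothetical synthetic input: 10,000 lines of "keyN wordM", 100 distinct keys.
my $path = $*TMPDIR.add('wordpairs-sample.txt');
$path.spurt: (^10_000).map({ "key{$_ % 100} word$_" }).join("\n") ~ "\n";

my %dict;

my $t0 = now;                          # Instant before IO
my @lines = eager $path.lines;         # read every line up front
my $t1 = now;                          # Instant after IO

for @lines -> $line {
    my ($key, @words) = $line.words;
    %dict{$key}.append(@words);        # autovivifies an Array per key
}
my $t2 = now;                          # Instant after processing

# Instant - Instant is a Duration (seconds); round it for display.
say "IO:         {($t1 - $t0).round(0.001)} s for {@lines.elems} lines";
say "processing: {($t2 - $t1).round(0.001)} s for {%dict.elems} keys";
```

Keeping the two phases separate means a slow result points at either file reading or hash building, rather than at the program as a whole.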

I believe there's also a profiler available, if you want to dig into the issue yourself.

answered Nov 16 '22 06:11 by Christoph