Raku: is there a SUPER fast way to turn an array into a string without the spaces separating the elements?

I need to convert thousands of binary byte strings, each about a megabyte long, into ASCII strings. This is what I have been doing, and it seems too slow:

sub fileToCorrectUTF8Str ($fileName) { # binary file
    my $finalString = "";
    my $fileBuf = slurp($fileName, :bin);    
    for @$fileBuf { $finalString = $finalString ~ $_.chr; };    
    return $finalString;
}

~@b turns @b into a string with the elements separated by spaces, but that is not what I want. If @b = < a b c d >, then ~@b is "a b c d", but I just want "abcd", and I want to do this REALLY fast.
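For the literal question in the title, Raku's `join` (whose separator defaults to the empty string) and the `[~]` reduction both concatenate the elements without spaces. A minimal sketch using the `@b` from the question:

```raku
my @b = <a b c d>;

say ~@b;        # a b c d  (Str coercion of a list separates elements with spaces)
say @b.join;    # abcd     (join's separator defaults to the empty string)
say [~] @b;     # abcd     (reduce the list with the ~ concatenation operator)
```

Note that for a Buf, either form still needs one `.chr` per byte (for example `$buf.list.map(*.chr).join`), so it does not remove the per-element iteration that makes the original loop slow; the answer below addresses that.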

So, what is the best way? I can't really use hyper for parallelism because the final string is constructed sequentially. Or can I?
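The concatenation inside one file is sequential, but the thousands of files are independent of each other, so one option is to parallelize across files rather than within a single file. `hyper` returns results in the original order (unlike `race`). A sketch under that assumption, reusing the `.decode` approach from the answer below; the demo file names under `$*TMPDIR` are hypothetical and exist only so the example runs on its own:

```raku
# Create a few small demo files (hypothetical names) so the sketch is self-contained.
my @files = (1..4).map: -> $i {
    my $path = $*TMPDIR.add("hyper-demo-$i.bin");
    $path.spurt(("line $i\n" x 3).encode);   # write raw bytes
    $path
};

# Each file is read and decoded independently. With :batch(1) every file is
# its own work item, so several files may be processed on different threads;
# hyper still hands back the results in the original order.
my @strings = @files.hyper(:batch(1)).map({ .slurp(:bin).decode });

say @strings[3].lines.head;   # line 4
.unlink for @files;           # remove the demo files
```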

asked Feb 21 '20 07:02 by lisprogtor



1 Answer

TL;DR On an old rakudo, .decode is about 100X as fast.

In longer form to match your code:

sub fileToCorrectUTF8Str ($fileName) { # binary file
  slurp($fileName, :bin).decode
}
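One behavioral difference is worth checking if the input may contain bytes above 0x7F: the question's loop maps each byte to the code point with the same value, which is exactly ISO-8859-1 (Latin-1) decoding, whereas `.decode` with no argument decodes as UTF-8. For pure ASCII input, as in the question, the two agree. If the files really are UTF-8 (as the function name suggests), plain `.decode` is what you want; if you need the loop's exact byte-to-character mapping, passing `'iso-8859-1'` should reproduce it. A small sketch:

```raku
# 'c', 'a', 'f' are ASCII; 0xE9 is 'é' in Latin-1 but is not valid UTF-8 on its own.
my $buf = Buf.new(0x63, 0x61, 0x66, 0xE9);

# What the question's loop computes: byte value N becomes code point N.
my $per-byte = $buf.list.map(*.chr).join;

# A single call that performs the same byte-to-code-point mapping.
my $latin1 = $buf.decode('iso-8859-1');

say $per-byte eq $latin1;   # True
say $latin1;                # café
```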

Performance notes

First, here's what I wrote for testing:

# Create a file one million and one bytes long:
spurt 'foo', "1234\n6789\n" x 1e5 ~ 'Z', :bin;

# (`say` the last character to check work is done)
say .decode.substr(1e6) with slurp 'foo', :bin;

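# To time the question's version instead, define its fileToCorrectUTF8Str and use:
# say .substr(1e6) with fileToCorrectUTF8Str 'foo';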

say now - INIT now;

On TIO.run's 2018.12 rakudo, the .decode version above takes about 0.05 seconds per million-byte file, versus about 5 seconds for your solution.

You should of course test on your own system and/or with later versions of rakudo. I would expect the difference to remain of the same order of magnitude, but the absolute times to improve markedly as the years roll by.[1]
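To repeat the comparison on your own system without file I/O, here is a rough sketch that times both approaches on the same in-memory buffer. The 100 KB size is an arbitrary choice (smaller than the 1 MB test file so the slow path finishes quickly), and single runs like this are noisy, so treat the numbers as indicative only:

```raku
# ~100 KB of pure ASCII bytes (a utf8 Blob).
my $buf = ("1234\n6789\n" x 10_000).encode;

my $t0 = now;
my $slow = '';
$slow ~= .chr for $buf.list;   # question's approach: one .chr and one concatenation per byte
my $t1 = now;
my $fast = $buf.decode;        # answer's approach: a single call for the whole buffer
my $t2 = now;

say $slow eq $fast;            # True (the input is pure ASCII)
say "per-byte loop: {$t1 - $t0} s";
say "decode:        {$t2 - $t1} s";
```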

Why is it 100X as fast?

First, @ on a Buf/Blob explicitly forces raku to treat what was a single item (a buffer) as a plural thing (a list of elements, aka multiple items). That means high-level iteration, which for a million-element buffer is a million high-level iterations/operations instead of just one high-level operation.

Second, using .decode not only avoids iteration but also incurs the relatively slow method-call overhead only once per file, whereas iterating means potentially a million .chr calls per file. Method calls are (at least semantically) late-bound, which is in principle relatively costly compared to, for example, calling a sub (subs are generally early-bound).

That all said:

  • Remember Caveat Empty[1]. For example, rakudo's standard classes generate method caches, and it's plausible the compiler just inlines the method anyway, so the method-call aspect may well add negligible overhead.

  • See also the doc's Performance page, especially Use existing high performance code.

Is the Buf.Str error message LTA (Less Than Awesome)?

Update: See Liz++'s comment.

If you try to use .Str on a Buf or Blob (or something equivalent, such as applying the ~ prefix to it), you'll get an exception. Currently the message is:

Cannot use a Buf as a string, but you called the Str method on it

The doc for .Str on a Buf/Blob currently says:

In order to convert to a Str you need to use .decode.

It's arguably LTA that the error message doesn't suggest the same thing.
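A minimal sketch of both the exception and the fix the doc suggests. The exact wording of the message may differ between rakudo versions, so the sketch prints it rather than assuming it:

```raku
my $buf = Buf.new(0x68, 0x69);   # the bytes of "hi"

{
    $buf.Str;                     # coercing a Buf to Str throws
    CATCH { default { say 'Caught: ', .message } }
}

say $buf.decode;                  # hi
```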

Then again, before deciding what, if anything, to do about this, we need to consider what folk could learn, and how, from anything that goes wrong (including signals about it, such as error messages), and also what and how they in fact currently learn, and then bias our reactions toward building the right culture and infrastructure.

In particular, if folk can easily connect an error message they see with online discussion that elaborates on it, that needs to be taken into account, and perhaps encouraged and/or made easier.

For example, there's now this SO question covering the issue, with the error message in it, so a Google search is likely to bring someone here. Leaning on that might well be a more appropriate path forward than changing the error message. Or it might not. The change would be easy...

Please consider commenting below and/or searching existing rakudo issues to see whether improving the Buf.Str error message is already being considered, and/or whether you wish to open an issue proposing a change. Every rock moved is at least great exercise, and, as our collective effort becomes increasingly wise, improves (our view of) the mountain.

Footnotes

[1] As the well-known Latin saying Caveat Empty goes, both the absolute and relative performance of any particular raku feature, and more generally of any particular code, is always subject to variation due to factors including your system's capabilities, its load while it is running the code, and any optimizations done by the compiler. Thus, for example, if your system is "empty", your code may run faster. Or, as another example, if you wait a year or three, the compiler may well have gotten faster; advances in rakudo's performance continue to look promising.

answered Oct 02 '22 15:10 by raiph
