I'm trying to convert the word wall into its ascii code list (119, 97, 108, 108)
like this:
my @ascii="abcdefghijklmnopqrstuvwxyz";
my @tmp;
map { push @tmp, $_.ord if $_.ord == @ascii.comb.any.ord }, "wall".comb;
say @tmp;
Is there a way to use the @tmp variable without declaring it on a separate line?
Is there a way to produce the ascii code list in one line instead of 3 lines? If so, how to do it?
Note that I have to use the @ascii variable, i.e. I can't make use of the consecutively increasing ascii sequence (97, 98, 99 ... 122), because I plan to use this code for non-ascii languages too.
There are a couple of things we can do here to make it work.
First, let's tackle the @ascii variable. The @ sigil indicates a positional variable, but you assigned a single string to it. This creates a 1-element array ['abc...'], which will cause problems down the road. Depending on how general you need this to be, I'd recommend either creating the array directly:
my @ascii = <a b c d e f g h i j k l m n o p q r s t u v w x y z>;
my @ascii = 'a' .. 'z';
my @ascii = 'abcdefghijklmnopqrstuvwxyz'.comb;
or going ahead and handling the any part:
my $ascii-char = any <a b c d e f g h i j k l m n o p q r s t u v w x y z>;
my $ascii-char = any 'a' .. 'z';
my $ascii-char = 'abcdefghijklmnopqrstuvwxyz'.comb.any;
Here I've used the $ sigil, because any really specifies any single value, and so will function as such (which also makes our life easier). I'd personally use $ascii, but I'm using a separate name to make later examples more distinguishable.
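To see why the original assignment is a problem, here is a minimal sketch comparing the two forms. The names @one and @many are mine, used only for illustration:

```raku
# Assigning a single string to an array gives a one-element array
my @one = "abcdefghijklmnopqrstuvwxyz";
say @one.elems;    # 1

# .comb splits the string into one element per character
my @many = "abcdefghijklmnopqrstuvwxyz".comb;
say @many.elems;   # 26
```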
Now we can handle the map function. Based on the above two versions of ascii, we can rewrite your map function to either of the following:
{ push @tmp, $_.ord if $_ eq @ascii.any }
{ push @tmp, $_.ord if $_ eq $ascii-char }
Note that if you prefer to use ==, you can create the numeric values when you first build the ascii list, and then compare them against $_.ord. Also, I personally like to name the mapped variable, e.g.:
{ push @tmp, $^char.ord if $^char eq @ascii.any }
{ push @tmp, $^char.ord if $^char eq $ascii-char }
where $^foo replaces $_ (if you use more than one, they map in alphabetical order to @_[0], @_[1], etc).
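As a small illustration of that ordering rule, the sketch below uses two blocks of my own invention (&forward and &backward are hypothetical names). The placeholders are bound by name order, not by where they appear in the block:

```raku
# Both blocks take two positional arguments: $^a is always the first
# and $^b the second, no matter which one appears first in the body.
my &forward  = { $^a ~ '-' ~ $^b };
my &backward = { $^b ~ '-' ~ $^a };

say forward('x', 'y');    # x-y
say backward('x', 'y');   # y-x
```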
But let's get to the more interesting question here. How can we do all of this without needing to predeclare @tmp? Obviously, that just requires creating the array in the map loop. You might think that might be tricky for when we don't have an ASCII value, but the fact that an if statement returns Empty (or ()) if it's not run makes life really easy:
my @tmp = map { $^char.ord if $^char eq $ascii-char }, "wall".comb;
my @tmp = map { $^char.ord if $^char eq @ascii.any }, "wall".comb;
If we used "wáll", the list collected by map would be 119, Empty, 108, 108, which is automagically returned as 119, 108, 108. Consequently, @tmp is set to just 119, 108, 108.
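The "wáll" case can be checked directly. This is just a runnable version of the one-liner above, with the junction built inline so the snippet stands on its own:

```raku
my $ascii-char = 'abcdefghijklmnopqrstuvwxyz'.comb.any;

# A statement-modifier `if` that does not fire yields Empty,
# and Empty vanishes when map builds its result list.
my @tmp = map { $^char.ord if $^char eq $ascii-char }, "wáll".comb;
say @tmp;   # [119 108 108]
```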
Yes, there is a much simpler way.
"wall".ords.grep('az'.ords.minmax);
Of course this relies on a to z being an unbroken sequence. This is because minmax creates a Range object based on the minimum and maximum value in the list.
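A quick sketch of what minmax produces here, and of how grep uses the resulting Range as a filter. The "w-a-l-l" input is my own example, added to show a codepoint outside the range being dropped:

```raku
# minmax turns the codepoints of 'a' and 'z' into a Range
say 'az'.ords.minmax;                         # 97..122

# grep keeps only the codepoints that fall inside that Range
say "wall".ords.grep('az'.ords.minmax);       # (119 97 108 108)
say "w-a-l-l".ords.grep('az'.ords.minmax);    # (119 97 108 108)
```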
If they weren't in an unbroken sequence you could use a junction.
"wall".ords.grep( 'az'.ords.minmax | 'AZ'.ords.minmax );
But you said that you want to match other languages. Which to me screams regex.
"wall".comb.grep( /^ <:Ll> & <:ascii> $/ ).map( *.ord )
This matches Lowercase Letters that are also in ASCII.
Actually we can make it even simpler. comb can take a regex which determines which characters it takes from the input.
"wall".comb( / <:Ll> & <:ascii> / ).map( *.ord )
# (119, 97, 108, 108)
"ΓΔαβγδε".comb( / <:Ll> & <:Greek> / ).map( *.ord )
# (945, 946, 947, 948, 949)
# Does not include Γ or Δ, as they are not lowercase
Note that the above only works with ASCII if you don't have a combining accent.
"de\c[COMBINING ACUTE ACCENT]f".comb( / <:Ll> & <:ascii> / )
# ("d", "f")
The Combining Acute Accent combines with the e, which composes to Latin Small Letter E With Acute.
That composed character is not in ASCII so it is skipped.
It gets even weirder if there isn't a composed value for the character.
"f\c[COMBINING ACUTE ACCENT]".comb( / <:Ll> & <:ascii> / )
# ("f́",)
That is because the f is lowercase and in ASCII. The composing codepoint gets brought along for the ride though.
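The difference between the two cases shows up when you look at the codepoints behind each string. A minimal sketch, with $e and $f as illustrative names of my own:

```raku
# e + combining acute has a precomposed form, so the string
# is stored as the single codepoint 233 (é)
my $e = "e\c[COMBINING ACUTE ACCENT]";
say $e.chars;   # 1
say $e.ords;    # (233)

# f + combining acute has no precomposed form, so both codepoints
# remain, but they still count as one character
my $f = "f\c[COMBINING ACUTE ACCENT]";
say $f.chars;   # 1
say $f.ords;    # (102 769)
```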
Basically if your data has, or can have combining accents and if it could break things, then you are better off dealing with it while it is still in binary form.
$buf.grep: {
.uniprop() eq 'Ll' # lowercase letter
&& .uniprop('Block') eq 'Basic Latin' # ASCII
}
The above would also work for single character strings because .uniprop works on either integers representing a codepoint, or on the actual character.
"wall".comb.grep: {
.uniprop() eq 'Ll' # lowercase letter
&& .uniprop('Block') eq 'Basic Latin' # ASCII
}
Note again that this would have the same issues with composing codepoints since it works with strings.
You may also want to use .uniprop('Script') instead of .uniprop('Block') depending on what you want to do.
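To get a feel for how the two properties differ, here is a small sketch of the values .uniprop reports for a few characters. The choice of 'w', 'α' and 'é' as samples is mine:

```raku
say 'w'.uniprop;             # Ll
say 'w'.uniprop('Block');    # Basic Latin
say 'w'.uniprop('Script');   # Latin
say 'α'.uniprop('Script');   # Greek
say 945.uniprop;             # Ll  (945 is the codepoint of α)

# é belongs to the Latin script but sits outside the Basic Latin block
say 'é'.uniprop('Script') eq 'Latin';        # True
say 'é'.uniprop('Block')  eq 'Basic Latin';  # False
```

So a Block test is the narrower of the two for this purpose, while a Script test also accepts accented letters of the same script.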
Here's a working approach using Raku's trans method (code snippet performed in the Raku REPL):
> my @a = "wall".comb;
[w a l l]
> @a.trans('abcdefghijklmnopqrstuvwxyz' => ords('abcdefghijklmnopqrstuvwxyz') ).put;
119 97 108 108
Above, we handle an ascii string. Below I add the "é" character, and show a 2-step solution:
> my @a = "wallé".comb;
[w a l l é]
> my @b = @a.trans('abcdefghijklmnopqrstuvwxyz' => ords('abcdefghijklmnopqrstuvwxyz') );
[119 97 108 108 é]
> @b.trans("é" => ords("é")).put
119 97 108 108 233
Nota bene #1: Although all the code above works fine, when I tried shortening the alphabet to 'a'..'z' I ended up seeing erroneous return values...hence the use of the full 'abcdefghijklmnopqrstuvwxyz'.
Nota bene #2: One open question for me is how to suppress output when trans fails to recognize a character (e.g. how to suppress assignment of "é" as the last element of @b in the second-example code above). I've tried adding the :delete argument to trans, but no luck.
EDITED: To remove unwanted characters, here's code using grep (à la @Brad Gilbert), followed by trans:
> my @a = "wallé".comb;
[w a l l é]
> @a.grep('a'..'z'.comb.any).trans('abcdefghijklmnopqrstuvwxyz' => ords('abcdefghijklmnopqrstuvwxyz') ).put
119 97 108 108