
Do Perl 6 POSIX character classes respect the LOCALE?

Do Perl 6 POSIX character classes respect the LOCALE? I was playing with a program that prints all the characters matching a POSIX char class, and it seems to always print the same set no matter what I set my locale to. Even if my locale is en_US.US-ASCII, I still get 520ish digits. It's also annoying that doing this on a Mac means I don't have the cool locale exploration tools found elsewhere (or they are there under different names).

This is all under this command which I need to make into a shell alias:

$ perl6 -e 'say join " ", map *.gist, $*VM, $*PERL, $*DISTRO, $*KERNEL'
moar (2016.10) Perl 6 (6.c) macosx (10.10.5) darwin (14.5.0)

And, the program:

my $properties = set( <
    alnum alpha ascii blank cntrl digit graph lower print punct
    space upper word xdigit
    > );

sub MAIN ( Str $property where * ∈ $properties = 'digit' ) {
    say "NAME is " ~ %*ENV<NAME>;
    say "LC_CTYPE is " ~ ( %*ENV<LC_CTYPE> // %*ENV<LC_ALL> );
    say "property is $property";
    use MONKEY-SEE-NO-EVAL;


    my $pattern = EVAL "rx/ <$property> /";
    say "regex is " ~ $pattern.gist;

    show_chars( $pattern );
    }

sub show_chars ( Regex $pattern ) {
    for 0 .. 0x10FFFF -> $codepoint {
        state $count = 0;
        LAST { say "\nThere were $count characters" }
        my $char = chr( $codepoint );
        next unless $char ~~ $pattern;
        $count++;

        print "$char ";
        print "\n" if $count %% 50;
        }
    }

Notice I do a stupid EVAL thing in the program. I was looking for a replacement for variable interpolation in rx. S05 indicates it was a thing, but there are no docs for it so I guess it isn't. I started to explore my own tokens, but had to move on. I've now asked a separate question about the interpolation.

brian d foy asked Feb 06 '23 15:02



2 Answers

To the best of my knowledge, Perl 6 regexes do not support POSIX character classes. The built-in methods you mentioned map to Unicode properties or blocks (or similar constructs, pardon my Unicode ignorance), and none of them are locale-specific.
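
One way to see that these classes follow Unicode rather than the locale is to test a non-ASCII digit. This is only a minimal sketch: it assumes <digit> behaves like the Unicode Nd (decimal number) property on your Rakudo, and the sample characters are chosen here for illustration, not taken from the question.

```perl6
# Sample characters chosen for illustration (not from the question).
my $ascii-seven  = '7';          # DIGIT SEVEN, U+0037
my $arabic-three = "\x[0663]";   # ARABIC-INDIC DIGIT THREE, U+0663

say so $ascii-seven  ~~ / ^ <digit> $ /;   # True
say so $arabic-three ~~ / ^ <digit> $ /;   # True: matched even in an ASCII locale
say so $arabic-three ~~ / ^ <:Nd> $ /;     # True: the Unicode property itself
say so 'x'           ~~ / ^ <digit> $ /;   # False
```

Running it with different LC_ALL or LC_CTYPE settings should give the same four lines, which is consistent with the count in the question not changing.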

As far as the EVAL is concerned, you can get rid of it like this:

my $re_string = '<alpha>';
say 'a' ~~ rx / <$re_string> /;
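
Applied to the program in the question, the same idea might look roughly like this. It is a sketch, not a tested drop-in replacement; make-class-regex is a hypothetical helper name introduced only for this example.

```perl6
# Build a regex for a named built-in character class without EVAL.
# make-class-regex is a hypothetical helper, not part of the original program.
sub make-class-regex ( Str $property ) {
    my $re_string = '<' ~ $property ~ '>';   # e.g. '<digit>'
    return rx/ <$re_string> /;               # string is interpolated as regex source
}

my $pattern = make-class-regex( 'alpha' );
say so 'a' ~~ $pattern;
say so '1' ~~ $pattern;
```

The returned Regex can then be passed to show_chars in place of the EVAL result.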
moritz answered Feb 08 '23 14:02



2019 update: Many links don't work. Some are gone for good, some temporarily, some have replacements. (See footnote 1.)

About this answer:

Thanks for the work! FWIW, I didn't care about locales at all. I just saw "POSIX" and was surprised that Perl 6 would care about that. – brian d foy

Thanks. I saw that Moritz had already answered your SO before I started. But I decided I wanted to spelunk the repos etc. looking for matches of the string 'locale'. ... Then I decided it would be useful to publish these results and that they would be more useful here, where someone searching for perl6 and locale might find them, than as just a personal gist. – raiph


Mentions of 'locale' in relevant Perl 6 and/or Rakudo documents

I'm not directly addressing your narrow question about POSIX and regexes. Moritz has answered that. This post is just me documenting my broad search for answers to the more general question of "What support is there for any locale specific processing in Perl 6 and/or Rakudo?" by searching for matches of 'locale' in various repos and the like.

This "answer" combines definitive sources with wild speculation. If it's linked, it's definitive. If it's my prose, it's wild speculation.

A search for 'locale' in the existing public module list yields 3 modules. Afaict, none affect Perl 6 behavior.

A Google search of docs.perl6.org for 'locale' yields "Your search - site:docs.perl6.org locale - did not match any documents."

An in-page search for 'locale' at perl6.fail yields a single bug report.

A google search of the design/speculation docs yields three results of interest:

  1. A locale method. It looks like it was a long ago specified way to find out what the current OS locale is or somesuch. A search of Rakudo's source for 'locale' yields zero matches.

  2. Mention of deliberately not handling time locale processing in core.

  3. Mention of a built-in rule <blank> which matches "a single 'blank' character -- in most locales, this corresponds to space and tab.". This rule is implemented as the blank method on the Cursor class in NQP. The code say so " \t" ~~ / <blank>+ / prints True on my system.

A search of NQP's source for 'locale' yields zero matches.

A search of MoarVM's repo for 'locale' yields matches in the third party GCC libatomic library (a library of portable atomic operations; I've no idea why such code should care about locale).

A google search for 'locale' in #perl6 yields a bunch of mentions including:

  • 2007 TimToady: "at the standard unicode level ... locales are completely ignored". And "but if you ask for language dependent character processing, you can ask it to pay attention to a locale". (I don't think anyone has yet written the code necessary for the latter.)

  • 2008 No one answers Moritz (who wrote the other answer to this question above) when he asks "any idea how locales will be handled in Perl 6?".

  • 2012 TimToady says "we tend to dislike locales intensely".

  • 2016 "some standard locale stuff for dates, numbers and stuff would be useful"

  • 2016 "i don't think we have locale-aware formatting of numbers".

  • 2016 "Perl 6 doesn't handle anything locale-specific such as those Turkish special cases AFAIK."
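
The Turkish case mentioned in the last quote can be checked directly. This is a small sketch under the assumption that .uc and .lc apply only the default Unicode case mappings; neither method takes a locale argument as far as I know.

```perl6
# In Turkish, the uppercase of 'i' is U+0130 (dotted capital I) and the
# lowercase of 'I' is U+0131 (dotless small i). The default Unicode
# mappings used here ignore that, whatever the OS locale is set to.
say 'i'.uc;                    # I
say 'I'.lc;                    # i
say 'i'.uc eq "\x[0130]";      # False
say 'I'.lc eq "\x[0131]";      # False
```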

Footnotes

1 Many links in this answer are broken in 2019:

  • perl6.fail is gone for good. Use the rt bug tracker, perl6 queue, and gh issue queues for perl6 and rakudo instead. The rt tracker is going away and is currently read only.

  • design.perl6.org is down at the moment, perhaps for good. The best substitute I know is archive.org. But afaik that's not searchable across pages.

  • irclog.perlgeek.de links are down for good. The best substitute I know is colabti's irclog which goes back to about 2009 or so. (Moritz's perlgeek data went back to 2005.) Use the date in the URL to map over to colabti's log.

raiph answered Feb 08 '23 16:02
