
What are all the Unicode properties a Perl 6 character will match?

Tags: unicode, raku

The .uniprop method, called without arguments, returns a single property:

put join ', ', 'A'.uniprop;

I get back one property (the general category):

Lu

Looking around I didn't see a way to get all the other properties (including derived ones such as ID_Start and so on). What am I missing? I know I can go look at the data files, but I'd rather have a single method that returns a list.

I am mostly interested in this because regexes understand properties and match the right properties. I'd like to take any character and show which properties it will match.

brian d foy asked Mar 12 '18 22:03



2 Answers

"A".uniprop("Alphabetic") will get the Alphabetic property. Are you asking for what other properties are possible?
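As a minimal sketch of that call applied to several names at once (the helper name props-of is hypothetical; the property names are taken from the list further down in this answer):

```raku
# Hypothetical helper: look up several named properties for one character.
sub props-of(Str:D $char, *@names) {
    @names.map: -> $name { $name => $char.uniprop($name) }
}

# Prints one "name => value" pair per line.
.say for props-of('A', <General_Category Alphabetic Script ID_Start>);
```

This only covers the names you pass in; it does not discover which properties exist.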

All the properties that have a checkmark next to them in this issue will likely work. The issue just tracks the status of roast testing for each property: https://github.com/perl6/roast/issues/195

This may be more useful for you: https://github.com/rakudo/rakudo/blob/master/src/core/Cool.pm6#L396-L483 The first hash just maps aliases for the property names to the full names. The second hash specifies whether the property is B for boolean, S for string, I for integer, nv for numeric value, na for Unicode Name, plus a few other specials.
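To see how those type codes show up in practice, here is a small sketch (the helper name describe is hypothetical) that reports the value and the type that .uniprop returns for one boolean property and two string properties:

```raku
# Hypothetical helper: report a property's value together with its type.
sub describe(Str:D $char, Str:D $prop) {
    my $value = $char.uniprop($prop);
    "$prop = $value ({$value.^name})"
}

# Alphabetic is a 'B' property; Script and General_Category are 'S' properties.
say describe('A', $_) for <Alphabetic Script General_Category>;
```

A boolean property comes back as a Bool and a string property as a Str, which is why the two kinds need different regex syntax when you turn them into patterns.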

If I didn't understand your question, please let me know and I will revise this answer.

Update: It seems you want to find out all the properties that will match. What you will want to do is iterate over all of https://github.com/rakudo/rakudo/blob/master/src/core/Cool.pm6#L396-L483 and look only at the string, integer and boolean properties. Here is the full thing: https://gist.github.com/samcv/ae09060a781bb4c36ae6cac80ea9325f

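use MONKEY-SEE-NO-EVAL;  # EVAL on an interpolated string is refused without this pragma
use nqp;
sub MAIN {
    use Test;
    my $char = 'a';
    my @result = what-matches($char);
    # Each entry is a regex assertion such as <:Alphabetic>; check that it really matches.
    for @result {
        ok EVAL("'$char' ~~ /$_/"), "$char ~~ /$_/";
    }
    done-testing;
}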
sub what-matches (Str:D $chr) {
    my @result;
    my %prefs = prefs();
    for %prefs.keys -> $key {
        given %prefs{$key} {
            when 'S' {
                my $propval = $chr.uniprop($key);
                if $key eq 'Block' {
                    @result.push: "<:In" ~ $propval.trans(' ' => '') ~ ">";
                }
                elsif $propval {
                    @result.push: "<:" ~ $key ~ "<" ~ $chr.uniprop($key) ~ ">>";
                }
            }
            when 'I' {
                @result.push: "<:" ~ $key ~ "<" ~ $chr.uniprop($key) ~ ">>";
            }
            when 'B' {
                @result.push: ($chr.uniprop($key) ?? "<:$key>" !! "<:!$key>");
            }

        }
    }
    @result;

}
# Property name => type code (B boolean, S string, I integer; other codes are special cases).
sub prefs {
    my %prefs = (
          'Other_Grapheme_Extend','B','Titlecase_Mapping','tc','Dash','B',
          'Emoji_Modifier_Base','B','Emoji_Modifier','B','Pattern_Syntax','B',
          'IDS_Trinary_Operator','B','ID_Continue','B','Diacritic','B','Cased','B',
          'Hangul_Syllable_Type','S','Quotation_Mark','B','Radical','B',
          'NFD_Quick_Check','S','Joining_Type','S','Case_Folding','S','Script','S',
          'Soft_Dotted','B','Changes_When_Casemapped','B','Simple_Case_Folding','S',
          'ISO_Comment','S','Lowercase','B','Join_Control','B','Bidi_Class','S',
          'Joining_Group','S','Decomposition_Mapping','S','Lowercase_Mapping','lc',
          'NFKC_Casefold','S','Simple_Lowercase_Mapping','S',
          'Indic_Syllabic_Category','S','Expands_On_NFC','B','Expands_On_NFD','B',
          'Uppercase','B','White_Space','B','Sentence_Terminal','B',
          'NFKD_Quick_Check','S','Changes_When_Titlecased','B','Math','B',
          'Uppercase_Mapping','uc','NFKC_Quick_Check','S','Sentence_Break','S',
          'Simple_Titlecase_Mapping','S','Alphabetic','B','Composition_Exclusion','B',
          'Noncharacter_Code_Point','B','Other_Alphabetic','B','XID_Continue','B',
          'Age','S','Other_ID_Start','B','Unified_Ideograph','B','FC_NFKC_Closure','S',
          'Case_Ignorable','B','Hyphen','B','Numeric_Value','nv',
          'Changes_When_NFKC_Casefolded','B','Expands_On_NFKD','B',
          'Indic_Positional_Category','S','Decomposition_Type','S','Bidi_Mirrored','B',
          'Changes_When_Uppercased','B','ID_Start','B','Grapheme_Extend','B',
          'XID_Start','B','Expands_On_NFKC','B','Other_Uppercase','B','Other_Math','B',
          'Grapheme_Link','B','Bidi_Control','B','Default_Ignorable_Code_Point','B',
          'Changes_When_Casefolded','B','Word_Break','S','NFC_Quick_Check','S',
          'Other_Default_Ignorable_Code_Point','B','Logical_Order_Exception','B',
          'Prepended_Concatenation_Mark','B','Other_Lowercase','B',
          'Other_ID_Continue','B','Variation_Selector','B','Extender','B',
          'Full_Composition_Exclusion','B','IDS_Binary_Operator','B','Numeric_Type','S',
          'kCompatibilityVariant','S','Simple_Uppercase_Mapping','S',
          'Terminal_Punctuation','B','Line_Break','S','East_Asian_Width','S',
          'ASCII_Hex_Digit','B','Pattern_White_Space','B','Hex_Digit','B',
          'Bidi_Paired_Bracket_Type','S','General_Category','S',
          'Grapheme_Cluster_Break','S','Grapheme_Base','B','Name','na','Ideographic','B',
          'Block','S','Emoji_Presentation','B','Emoji','B','Deprecated','B',
          'Changes_When_Lowercased','B','Bidi_Mirroring_Glyph','bmg',
          'Canonical_Combining_Class','S',
    );
}
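The strings that what-matches builds are ordinary regex property assertions. A minimal sketch of the main shapes, using 'A' and properties whose values are shown elsewhere on this page or are easy to verify:

```raku
say so 'A' ~~ /<:Lu>/;             # general category value used directly
say so 'A' ~~ /<:Alphabetic>/;     # boolean property that is true
say so 'A' ~~ /<:!Dash>/;          # boolean property that is false, negated
say so 'A' ~~ /<:Script<Latin>>/;  # string property with its value
```

Each line should print True. The script above also emits the block as "<:In" followed by the block name without spaces, which is not repeated here.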
Samantha M. answered Nov 18 '22 03:11



OK, so here's another take on answering this question, but the solution is not perfect. Bring the downvotes!

If you join the #perl6 channel on freenode, there's a bot called unicodable6 with functionality that you may find useful. You can ask it to do this (e.g. for the characters A and π simultaneously):

<AlexDaniel> propdump: Aπ
<unicodable6> AlexDaniel, https://gist.github.com/b48e6062f3b0d5721a5988f067259727

Not only does it show the value of each property, but if you give it more than one character it will also highlight the differences!

Yes, it seems like you're looking for a way to do that within Perl 6, and this answer is not it. But in the meantime it's very useful. Internally, Unicodable just iterates through this list of properties, so it is essentially the same approach as the other answer in this thread.

I think someone can make a module out of this (hint-hint), and then the answer to your question will be “just use module Unicode::Propdump”.
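Until such a module exists, the comparison can be roughly approximated in a few lines. This is only a sketch, not how the bot is implemented: the sub name prop-diff is hypothetical, and @props is a small hand-picked subset of the property list from the other answer.

```raku
# A small, hand-picked subset of property names (assumption: extend as needed).
my @props = <General_Category Script Block Uppercase Lowercase Alphabetic>;

# Hypothetical helper: names of the properties on which two characters differ.
sub prop-diff(Str:D $x, Str:D $y, @names) {
    @names.grep: -> $name { $x.uniprop($name) ne $y.uniprop($name) }
}

for prop-diff('A', 'π', @props) -> $name {
    say "$name: ", 'A'.uniprop($name), ' vs ', 'π'.uniprop($name);
}
```

Values are compared as strings, so boolean properties compare as "True" and "False".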

Aleks-Daniel Jakimenko-A. answered Nov 18 '22 02:11
