I have a string where Chinese characters and English characters are next to each other:
我Love Perl 6哈哈
I want to insert a space between the Chinese and English characters:
我 Love Perl 6 哈哈
I read that the range \u4e00-\u9fa5 represents Chinese characters:
'哈' ~~ /<[\u4e00..\u9fa5]>/
but this results in:
Potential difficulties:
Repeated character (0) unexpectedly found in character class
at line 2
------> '哈' ~~ /<[\u4e00..\⏏u9fa5]>/
So how do I match a Chinese character?
The main problem is that \u is not a valid escape.
> "\u4e00"
===SORRY!=== Error while compiling:
Unrecognized backslash sequence: '\u'
------> "\⏏u4e00"
\x is, though.
> "\x4e00"
一
At any rate, the character class you are trying to use doesn't cover all Chinese characters.
> '㒠' ~~ /<[\x4e00..\x9fa5]>/
Nil
What you probably want is to match on a script.
> '㒠' ~~ /<:Han>/
「㒠」
This has the benefit that you don't have to keep changing your character class every time a new set of characters gets added to Unicode.
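To see which script a given character belongs to, you can ask for its Unicode Script property. A small sketch, assuming a reasonably recent Rakudo; the sample characters are just the ones used in this thread:

```raku
# Print the Unicode Script property of a few sample characters.
# Note that a digit such as '6' belongs to the Common script, not Latin.
for '哈', '㒠', 'L', '6' -> $ch {
    say $ch, ' => ', $ch.uniprop('Script');
}
```

This matters for the substitutions that follow: since 6 is not in the Latin script, a pattern that requires <:Latin> will not see a boundary between 6 and 哈.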
Given that, you could do any of the following (each one replaces only the first match):
# store in $0 and $1
say S/(<:Han>)(<:Latin>)/$0 $1/ given '我Love Perl 6哈哈'
say S{(<:Han>)(<:Latin>)} = "$0 $1" given '我Love Perl 6哈哈'
# same with subst
say '我Love Perl 6哈哈'.subst: /(<:Han>)(<:Latin>)/, {"$0 $1"}
# only match between the two
say S/<:Han> <( )> <:Latin>/ / given '我Love Perl 6哈哈'
say S{<:Han> <( )> <:Latin>} = ' ' given '我Love Perl 6哈哈'
To change the value in a variable, use s/// or .=subst:
my $v = '我Love Perl 6哈哈';
$v ~~ s/(<:Han>)(<:Latin>)/$0 $1/;
$v ~~ s{(<:Han>)(<:Latin>)} = "$0 $1";
$v ~~ s/<:Han> <()> <:Latin>/ /;
$v .= subst: /(<:Han>)(<:Latin>)/, {"$0 $1"};
$v .= subst: /<:Han> <()> <:Latin>/,' ';
Note that <( excludes everything before it from the match, and )> does the same for everything after it. (They can be used individually.)
You may want to use a negated match for the following character instead:
S/<:Han> <( )> [ <!:Han> & <!space> ]/ /
(Match a character that is at the same time not Han and not a space.)
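The substitutions above replace only the first match, and only where a Han character comes first. To get the spacing asked for in the question, including the boundary between 6 and 哈, one possible approach is a global substitution in each direction. This is a sketch rather than a definitive solution: the sub name space-han is made up for illustration, and it uses lookaheads so that no space is added at the end of the string.

```raku
# Hypothetical helper: put a space at every boundary between a Han
# character and a non-Han, non-whitespace character.
sub space-han(Str $text --> Str) {
    # Han character followed by a non-space, non-Han character.
    my $step = $text.subst(
        / <:Han> <( )> <?before \S> <!before <:Han> > /, ' ', :g);

    # Non-space, non-Han character followed by a Han character.
    return $step.subst(
        / <!before <:Han> > \S <( )> <:Han> /, ' ', :g);
}

say space-han('我Love Perl 6哈哈');
```

Doing it in two passes keeps each regex short; the :g adverb makes each pass cover every boundary instead of just the first one.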