I want a regex that matches only strings consisting entirely of Chinese characters, with no English letters or any other characters. [\u4e00-\u9fa5] doesn't work at all, and [^\x00-\xff] also matches strings that contain punctuation or characters from other languages.
boost::wregex reg(L"\\w*");
bool b = boost::regex_match(L"我a", reg); // expected to be false
b = boost::regex_match(L"我,", reg); // expected to be false
b = boost::regex_match(L"我", reg); // expected to be true
How can I write a regex that matches only letters?
Use a character set: [a-zA-Z] matches one letter from A–Z in lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist entirely of one or more letters (^ and $ mark the beginning and end of the string, respectively).
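A minimal sketch of the same checks with the C++ standard library's <regex>. std::regex is used here instead of Boost only so the snippet is self-contained, and both function names are hypothetical. It also shows when the anchors matter: regex_match always has to match the whole string, while regex_search needs ^ and $ to get the same effect.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-string check: std::regex_match only succeeds if the pattern
// consumes the entire input, so "[a-zA-Z]+" needs no ^/$ anchors here.
bool letters_only_match(const std::string& s) {
    static const std::regex letters("[a-zA-Z]+");
    return std::regex_match(s, letters);
}

// Equivalent check with std::regex_search, which looks for the pattern
// anywhere in the input; ^ and $ pin the match to the whole string.
bool letters_only_search(const std::string& s) {
    static const std::regex anchored("^[a-zA-Z]+$");
    return std::regex_search(s, anchored);
}
```

Both functions reject the empty string, because + requires at least one letter.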
In regex, the period "." matches any character. To match only a given set of characters, use a character class instead. The "." matches a single character regardless of what it is: a letter, a digit, or any special character.
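A small sketch contrasting "." with a character class, using std::regex (the function names are hypothetical):

```cpp
#include <cassert>
#include <regex>
#include <string>

// "." matches exactly one character of any kind (letter, digit, symbol).
// Note: in std::regex's default ECMAScript grammar, "." does not match
// line terminators such as '\n'.
bool is_any_single_char(const std::string& s) {
    static const std::regex any_char(".");
    return std::regex_match(s, any_char);
}

// A character class limits the match to the listed set, here the digits.
bool is_single_digit(const std::string& s) {
    static const std::regex digit("[0-9]");
    return std::regex_match(s, digit);
}
```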
To match patterns containing Chinese characters and other Unicode code points with a Flex-compatible lexical analyzer, you could use RE/flex, a lexical analyzer generator for C++ that is backward compatible with Flex. RE/flex supports Unicode and works with Bison to build lexers and parsers.
Boost with ICU can use Unicode character classes. I think you're looking for the \p{Han} script. Alternatively, the block U+4E00..U+9FFF is \p{InCJK_Unified_Ideographs}.
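If Boost isn't built with ICU, the block-based option can be approximated without regex by checking code points directly. This is a swapped-in technique (a plain range check, not \p{...} support), and the helper names are hypothetical. It covers only U+4E00..U+9FFF, so it is narrower than \p{Han}, which also includes other blocks such as CJK Extension A (U+3400..U+4DBF).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: is cp inside the CJK Unified Ideographs block
// U+4E00..U+9FFF (the block named by \p{InCJK_Unified_Ideographs})?
constexpr bool in_cjk_unified_block(char32_t cp) {
    return cp >= 0x4E00 && cp <= 0x9FFF;
}

// Hypothetical helper: true only for a non-empty string whose code points
// all lie in that block. UTF-32 is used so each element is one code point.
bool all_cjk_unified(const std::u32string& s) {
    if (s.empty()) return false;
    for (char32_t cp : s) {
        if (!in_cjk_unified_block(cp)) return false;
    }
    return true;
}
```

Working on std::u32string sidesteps the fact that wchar_t is 16 bits (UTF-16) on some platforms, where characters outside the Basic Multilingual Plane take two units.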
The following regex works fine.
boost::wregex reg(L"^[\u4e00-\u9fa5]+");
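Combining this answer with the checks from the question gives the sketch below. It swaps in std::wregex and std::regex_match from <regex> for boost::wregex and boost::regex_match (an assumption made so the snippet needs only the standard library; the Boost calls are used the same way), and the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch using std::wregex as a stand-in for boost::wregex. The \u escapes
// are expanded by the compiler inside the wide literal, so the regex engine
// sees the two ideographs U+4E00 and U+9FA5 as the range endpoints.
// regex_match must consume the whole string, so no ^/$ anchors are needed.
bool is_chinese_only(const std::wstring& s) {
    static const std::wregex han(L"[\u4e00-\u9fa5]+");
    return std::regex_match(s, han);
}
```

With this definition, the three inputs from the question give false, false and true, and a fullwidth comma (U+FF0C) is rejected as well because it lies outside the range. Note that \u9fa5 stops short of the end of the CJK Unified Ideographs block (U+9FFF), so ideographs added at the end of that block in later Unicode versions are not matched.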