If I want to use C++11's regular expressions with Unicode strings, will they work with char* as UTF-8, or do I have to convert them to wchar_t* strings?
Most C string library routines still work with UTF-8, since they only scan for terminating NUL characters.
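As a rough illustration of that point, here is a minimal sketch using two of those routines on a UTF-8 byte string. The helper names utf8_byte_length and utf8_contains are made up for this example. Note that strlen reports bytes, not characters, so "é" counts as two.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: strlen only looks for the terminating NUL,
// so it returns the number of bytes, not the number of code points.
std::size_t utf8_byte_length(const char* s) {
    return std::strlen(s);
}

// Hypothetical helper: a byte-wise substring search. For valid UTF-8
// input, a match of a complete encoded character cannot start in the
// middle of another character.
bool utf8_contains(const char* haystack, const char* needle) {
    return std::strstr(haystack, needle) != nullptr;
}
```

For "abcd\xC3\xA9""fg" (that is, "abcdéfg" with the é written as explicit bytes), utf8_byte_length returns 8 even though the string has 7 characters.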
This will make your regular expressions work with all Unicode regex engines. In addition to the standard notation, \p{L}, Java, Perl, PCRE, the JGsoft engine, and XRegExp 3 allow you to use the shorthand \pL. The shorthand only works with single-letter Unicode properties.
UTF-8 actually works quite well in std::string. Most operations work out of the box because the UTF-8 encoding is self-synchronizing and backward compatible with ASCII.
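To make the "self-synchronizing" part concrete, here is a small sketch (not a full UTF-8 validator) that counts code points in a std::string by skipping continuation bytes, which always have the bit pattern 10xxxxxx. The function name count_code_points is an assumption for this example, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string that is assumed
// to hold valid UTF-8. Every byte that is NOT a continuation byte
// (10xxxxxx) starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

The ASCII compatibility shows up in ordinary searches too: std::string::find("fg") on "abcdéfg" works as usual, although the position it returns is a byte offset, not a character index.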
So, yes, regular expressions really only apply to strings. If you want a more complicated FSM, then it's possible to write one, but not using your local regex engine.
You would need to test your compiler and the system you are using, but in theory, it will be supported if your system has a UTF-8 locale. The following test returned true for me on Clang/OS X.
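#include <locale>
#include <regex>
#include <string>

bool test_unicode()
{
    // Remember the current global locale so it can be restored afterwards.
    std::locale old;
    std::locale::global(std::locale("en_US.UTF-8"));

    // The regex picks up the global locale when it is constructed.
    std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
    bool result = std::regex_match(std::string("abcdéfg"), pattern);

    std::locale::global(old);
    return result;
}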
NOTE: This was compiled in a file that was UTF-8 encoded.
Just to be safe, I also used a string with the bytes written as explicit hex escapes. That worked as well.
bool test_unicode2()
{
    std::locale old;
    std::locale::global(std::locale("en_US.UTF-8"));

    std::regex pattern("[[:alpha:]]+", std::regex_constants::extended);
    bool result = std::regex_match(std::string("abcd\xC3\xA9""fg"), pattern);

    std::locale::global(old);
    return result;
}
Update: test_unicode() still works for me.
$ file regex-test.cpp
regex-test.cpp: UTF-8 Unicode c program text
$ g++ --version
Configured with: --prefix=/Applications/Xcode-8.2.1.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode-8.2.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
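One thing worth checking on your own system when relying on char-based UTF-8: std::regex instantiated for char walks the input one char at a time, so a multi-byte character such as "é" is seen as two elements. The sketch below shows this with the default ECMAScript grammar and without changing the locale. The helper name full_match is made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole input matches the pattern,
// using std::regex over char (i.e. byte by byte).
bool full_match(const std::string& input, const std::string& pattern) {
    return std::regex_match(input, std::regex(pattern));
}
```

With this helper, full_match("\xC3\xA9", ".") is false, while full_match("\xC3\xA9", "..") is true, because "." consumes a single byte rather than a whole code point. If you need "." or character classes to work per character, converting to a wide string and using std::wregex is one option to test.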