I'm using Visual Studio 2010 to do some regular expression matching via PCRE.
Let's say I have a pattern and a subject, both given as std::wstring, like this:
std::wstring subject = L"サービス内容";
std::wstring pattern = L"ス内";
As you can see, I'm trying to locate Japanese strings, so I need the Unicode variant of PCRE, for example pcre16 or pcre32, with functions such as pcre16_exec or pcre32_exec.
Unfortunately, it does not work. My problem seems to be the conversion from wstring to unsigned short or unsigned int (depending on whether pcre16 or pcre32 is used). I tried a lot of functions (wcstombs_s, string conversions with QString, etc.) but without success. The result of the exec function never holds the values I expect. I'm not really sure what went wrong; pattern matching with ANSI strings using the plain pcre functions works fine. Here's a snippet:
pcre16 *re;
const char *error;
int erroffset;
int ovector[30]; // The result of the matching
int subject_length;
int rc;
std::wstring subjectstr = L"サービス内容";
std::wstring patternstr = L"ス内";
subject_length = 6;
const unsigned short pattern = ....// string conversion from patternstr
const unsigned short subject = ....// string conversion from subjectstr
re = pcre16_compile(&pattern, PCRE_UTF16, &error, &erroffset, NULL);
rc = pcre16_exec(re, NULL, &subject, subject_length, 0, 0, ovector, 30);
Can somebody please give me a working example of how to match Unicode patterns with PCRE, or explain what went wrong? I'm getting exasperated with myself.
I found the solution here.
The key was a very simple cast from const wchar_t* to const unsigned short* (PCRE_SPTR16). This works because wchar_t is 16 bits wide under Visual Studio, so a std::wstring already holds UTF-16 code units. I had kept trying more complicated conversions. In a nutshell, here's a working example for anybody who might be interested. The results of the pattern matching can be found in subStrVec:
pcre16 *reCompiled;
int pcreExecRet;
int subStrVec[30]; // match offsets, counted in 16-bit code units
const char *pcreErrorStr;
int pcreErrorOffset;
std::wstring pattern = L"容内容";
std::wstring subject = L"容容容内容容容";
const wchar_t* aStrRegex = pattern.c_str();
const wchar_t* line = subject.c_str();
// PCRE_UTF16 is the UTF option documented for the 16-bit library.
reCompiled = pcre16_compile((PCRE_SPTR16)aStrRegex, PCRE_UTF16, &pcreErrorStr, &pcreErrorOffset, NULL);
// The subject length is given in 16-bit code units, not bytes.
pcreExecRet = pcre16_exec(reCompiled, NULL, (PCRE_SPTR16)line, (int)wcslen(line), 0, 0, subStrVec, 30);
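The cast above relies on wchar_t being 16 bits wide, which holds for Visual Studio but not for platforms where wchar_t is 32 bits. As a rough sketch of a more portable route, the two helpers below convert a std::wstring to UTF-16 code units and read a match back out of the offset vector. Both to_utf16 and match_text are hypothetical names made up for this illustration, not part of PCRE, and the code assumes a C++11 compiler (newer than Visual Studio 2010) and well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of PCRE): converts a std::wstring to UTF-16
// code units. Where wchar_t is 16 bits wide the units are copied unchanged;
// where it is 32 bits wide, code points above U+FFFF are split into a
// surrogate pair. Input is assumed to be well-formed.
std::u16string to_utf16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000UL) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000UL;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: returns the text of capture group `group`, given an
// offset vector laid out the way pcre16_exec fills it (start offset at
// index 2*group, end offset at index 2*group + 1, both in 16-bit code units).
std::u16string match_text(const std::u16string& subject,
                          const int* ovector, int group) {
    int start = ovector[2 * group];
    int end = ovector[2 * group + 1];
    // Unset groups are reported as -1, so reject anything out of range.
    assert(start >= 0 && end >= start &&
           static_cast<std::size_t>(end) <= subject.size());
    return subject.substr(static_cast<std::size_t>(start),
                          static_cast<std::size_t>(end - start));
}
```

With these, the converted strings would be stored in std::u16string variables first, and their c_str() pointers cast to PCRE_SPTR16 in the same way as the wchar_t pointers above. After a successful pcre16_exec call, match_text(subject16, subStrVec, 0) would then return the whole match, where subject16 stands for the converted subject.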