The Problem: There is a method with a corresponding test-case that works on one machine and fails on the other (details below). I assume there's something wrong with the code, causing it to work by chance on the one machine. Unfortunately I cannot find the problem.
Please note that the usage of std::string and utf-8 encoding are requirements I have no real influence on. Using C++ methods would be totally fine, but unfortunately I failed to find anything. Hence the use of C-functions.
The method:
std::string firstCharToUpperUtf8(const string& orig) {
std::string retVal;
retVal.reserve(orig.size());
std::mbstate_t state = std::mbstate_t();
char buf[MB_CUR_MAX + 1];
size_t i = 0;
if (orig.size() > 0) {
if (orig[i] > 0) {
retVal += toupper(orig[i]);
++i;
} else {
wchar_t wChar;
int len = mbrtowc(&wChar, &orig[i], MB_CUR_MAX, &state);
// If this assertion fails, there is an invalid multi-byte character.
// However, this usually means that the locale is not utf8.
// Note that the default locale is always C. Main classes need to set them
// To utf8, even if the system's default is utf8 already.
assert(len > 0 && len <= static_cast<int>(MB_CUR_MAX));
i += len;
int ret = wcrtomb(buf, towupper(wChar), &state);
assert(ret > 0 && ret <= static_cast<int>(MB_CUR_MAX));
buf[ret] = 0;
retVal += buf;
}
}
for (; i < orig.size(); ++i) {
retVal += orig[i];
}
return retVal;
}
The test:
TEST(StringUtilsTest, firstCharToUpperUtf8) {
setlocale(LC_CTYPE, "en_US.utf8");
ASSERT_EQ("Foo", firstCharToUpperUtf8("foo"));
ASSERT_EQ("Foo", firstCharToUpperUtf8("Foo"));
ASSERT_EQ("#foo", firstCharToUpperUtf8("#foo"));
ASSERT_EQ("ßfoo", firstCharToUpperUtf8("ßfoo"));
ASSERT_EQ("Éfoo", firstCharToUpperUtf8("éfoo"));
ASSERT_EQ("Éfoo", firstCharToUpperUtf8("Éfoo"));
}
The failed test (only happens on one of two machines):
Failure
Value of: firstCharToUpperUtf8("ßfoo")
Actual: "\xE1\xBA\x9E" "foo"
Expected: "ßfoo"
Both machine have the locale en_US.utf8 installed. They however use different versions of libc. It works on the machine with GLIBC_2.14 independent of where it was compiled and doesn't work on the other machine, while it can only be compiled there, because otherwise it lacks the proper libc version.
Either way, there is a machine that compiles this code and runs it while it fails. There has to be something wrong with the code and I wonder what. Pointing to C++ methods (STL in particular), would also be great. Boost and other libraries should be avoided due to other outside requirements.
Most C string library routines still work with UTF-8, since they only scan for terminating NUL characters.
To use a keyboard shortcut to change between lowercase, UPPERCASE, and Capitalize Each Word, select the text and press SHIFT + F3 until the case you want is applied.
Changing the case on a Google ChromebookPress and hold either the left or right Shift and while continuing to hold the Shift key press the letter you want caps. Using the Shift key is the most common method of creating a capital letter on a computer.
Maybe someone would use it (maybe for tests)
With this you could make simple converter :) No additional libs :)
http://pastebin.com/fuw4Uizk
1482 letters
Example
Ь <> ь
Э <> э
Ю <> ю
Я <> я
Ѡ <> ѡ
Ѣ <> ѣ
Ѥ <> ѥ
Ѧ <> ѧ
Ѩ <> ѩ
Ѫ <> ѫ
Ѭ <> ѭ
Ѯ <> ѯ
Ѱ <> ѱ
Ѳ <> ѳ
Ѵ <> ѵ
Ѷ <> ѷ
Ѹ <> ѹ
Ѻ <> ѻ
Ѽ <> ѽ
Ѿ <> ѿ
Ҁ <> ҁ
Ҋ <> ҋ
Ҍ <> ҍ
Ҏ <> ҏ
Ґ <> ґ
Ғ <> ғ
Ҕ <> ҕ
Җ <> җ
Ҙ <> ҙ
Қ <> қ
Ҝ <> ҝ
Ҟ <> ҟ
Ҡ <> ҡ
Ң <> ң
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With