Before you get started; yes I know this is a duplicate question and yes I have looked at the posted solutions. My problem is I could not get them to work.
bool invalidChar (char c)
{
return !isprint((unsigned)c);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
I tested this method on "Prusæus, Ægyptians," and it did nothing
I also attempted to substitute isprint
for isalnum
The real problem occurs when, in another section of my program I convert string->wstring->string. the conversion balks if there are unicode chars in the string->wstring conversion.
Ref:
How can you strip non-ASCII characters from a string? (in C#)
How to strip all non alphanumeric characters from a string in c++?
Edit:
I still would like to remove all non-ASCII chars regardless yet if it helps, here is where I am crashing:
// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH
Error Dialog
MSVC++ Debug Library
Debug Assertion Failed!
Program: //myproject
File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
Line: //Above
Expression:(unsigned)(c+1)<=256
Edit:
Further compounding the matter: the .txt file I am reading in from is ANSI encoded. Everything within should be valid.
Solution:
bool invalidChar (char c)
{
return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
If someone else would like to copy/paste this, I can check this question off.
EDIT:
For future reference: try using the __isascii, iswascii commands
Solution:
bool invalidChar (char c)
{
return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
EDIT:
For future reference: try using the __isascii, iswascii commands
At least one problem is in your invalidChar
function. It should be:
return !isprint( static_cast<unsigned char>( c ) );
Casting a char
to an unsigned
is likely to give some very, very big
values if the char
is negative (UNIT_MAX+1 + c). Passing such a
value to
isprint` is undefined behavior.
Another solution that doesn't require defining two functions but uses anonymous functions available in C++17 above:
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), [](char c){return !(c>=0 && c <128);}), str.end());
}
I think it looks cleaner
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With