Detect if there is any non-ASCII character in a file path
I have a UTF-8 encoded string that stores a file path, for instance C:\Users\myUser\Downloads\ü.pdf. I have already checked that the string holds a valid file path in the local file system, but since I'm sending this string to a different process that supports only ASCII, I need to find out whether it contains any non-ASCII character.
How can I do that?
With grep -Pv '[\0-\x7f]', you're asking for lines that don't (-v) contain an ASCII character. That's not the same thing as lines that contain a non-ASCII character. Just ask for that. Instead of a code point range, you could ask for non-printable characters in an ASCII locale.
You can download Notepad++ and open the file there. Then, go to the menu and select View->Show Symbol->Show All Characters. All characters will become visible, but you will have to scroll through the whole file to see which character needs to be removed.
An ASCII character uses only the lower 7 bits of a char (values 0-127). A non-ASCII Unicode character encoded in UTF-8 uses char elements that all have the upper bit set. So, you can simply iterate over the char elements and check whether any of them has a value above 127, e.g.:
    #include <string>

    bool containsOnlyASCII(const std::string& filePath) {
        for (auto c : filePath) {
            // Compare as unsigned so bytes 0x80-0xFF are seen as 128-255.
            if (static_cast<unsigned char>(c) > 127) {
                return false;
            }
        }
        return true;
    }
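To make the "upper bit set" claim concrete, here is a minimal sketch that exposes the raw byte values of a string. The helper names byteValues and hasHighBit are made up for illustration, and the test string is written with explicit hex escapes (\xC3\xBC, the UTF-8 encoding of ü) so the result does not depend on the source file's encoding.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns every byte of `s` as an unsigned value (0-255).
std::vector<unsigned int> byteValues(const std::string& s) {
    std::vector<unsigned int> values;
    values.reserve(s.size());
    for (char c : s) {
        values.push_back(static_cast<unsigned char>(c));
    }
    return values;
}

// Hypothetical helper: true if the top bit (0x80) of the byte is set,
// which is equivalent to the unsigned value being above 127.
bool hasHighBit(char c) {
    return (static_cast<unsigned char>(c) & 0x80u) != 0;
}
```

Under these assumptions, byteValues("\xC3\xBC") gives 195 and 188, both above 127, while byteValues("A") gives 65.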
A note on the cast: std::string contains char elements. The standard leaves it implementation-defined whether char is signed or unsigned. If it is signed, casting it to unsigned char is still well-defined: the standard specifies exactly how the conversion is done (the value is reduced modulo 2^N, where N is the number of bits in unsigned char).
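A small sketch of why the cast matters, using signed char explicitly so the behaviour is the same on every platform. The function names are hypothetical. The value -61 is what the byte 0xC3 looks like when read as a signed 8-bit char.

```cpp
// Hypothetical check WITHOUT the cast: a signed char holds at most 127,
// so this comparison can never be true.
bool naiveIsNonASCII(signed char c) {
    int value = c;  // promoted as-is, e.g. -61 stays -61
    return value > 127;
}

// Hypothetical check WITH the cast: -61 converts to 256 - 61 = 195,
// so bytes with the upper bit set are detected.
bool castedIsNonASCII(signed char c) {
    return static_cast<unsigned char>(c) > 127;
}
```

With these definitions, naiveIsNonASCII(-61) returns false and misses the byte, while castedIsNonASCII(-61) returns true.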
As suggested by several comments and highlighted in @CrisLuengo's answer, we can scan the characters for any that has the upper bit set (live example):
    #include <algorithm>
    #include <iostream>
    #include <string>

    bool isASCII(const std::string& s)
    {
        // Any byte above 127 (upper bit set) means a non-ASCII character.
        return !std::any_of(s.begin(), s.end(), [](char c) {
            return static_cast<unsigned char>(c) > 127;
        });
    }

    int main()
    {
        std::string s1 { "C:\\Users\\myUser\\Downloads\\Hello my friend.pdf" };
        std::string s2 { "C:\\Users\\myUser\\Downloads\\ü.pdf" };
        std::cout << std::boolalpha << isASCII(s1) << "\n";
        std::cout << std::boolalpha << isASCII(s2) << "\n";
    }
true
false
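If it also helps to know where the offending character is (for an error message, say), the same predicate can be used with std::find_if. This is only a sketch; firstNonASCII is a hypothetical name, and it reports a byte offset, not a character index.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset of the first non-ASCII byte in `s`,
// or std::string::npos if every byte is in the range 0-127.
std::size_t firstNonASCII(const std::string& s) {
    auto it = std::find_if(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) > 127;
    });
    if (it == s.end()) {
        return std::string::npos;
    }
    return static_cast<std::size_t>(it - s.begin());
}
```

For a string such as "ab\xC3\xBC.pdf" (ü written as explicit UTF-8 bytes), this returns 2; for a pure ASCII string it returns std::string::npos.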