I have a string and I want to check whether its content is in English or Hindi (my local language). I found that the Unicode range for Hindi (Devanagari) characters is U+0900 to U+097F.
What is the simplest way to find out whether the string contains any characters in this range?
I can use either std::string or Glib::ustring, whichever is more convenient.
UTF-8 can encode every Unicode character (all 1,112,064 valid code points in a code space of 1,114,112). Most C code that processes strings byte by byte still works, because UTF-8 is fully backward compatible with 7-bit ASCII.
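To make the byte counts concrete, here is a minimal hand-written UTF-8 encoder. The function name encode_utf8 is hypothetical, and the sketch assumes its input is a valid Unicode scalar value (it does not reject surrogates or values above U+10FFFF). It shows that ASCII code points stay one byte while Devanagari code points take three.

```cpp
#include <cstdint>
#include <string>

// Encode one code point as UTF-8.
// Sketch only: assumes cp is a valid scalar value (no surrogates, <= U+10FFFF).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx (identical to ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: the Devanagari block is here
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: U+10000..U+10FFFF
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encode_utf8(0x43) returns the single byte "C", while encode_utf8(0x0938) (स) returns the three bytes E0 A4 B8.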
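In C and C++, char is one byte (CHAR_BIT bits: at least 8, and in practice exactly 8). It is not inherently ASCII, though: a std::string holding UTF-8 text stores each Hindi character as several chars.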
Compare the length of the string in characters with its size in bytes. If both are equal, the string is pure ASCII. If the size in bytes is larger, the string contains non-ASCII characters (which may or may not be Hindi).
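With std::string you only get the byte count, so the character count has to be computed. A minimal sketch, assuming the string holds valid UTF-8: count every byte that is not a continuation byte (continuation bytes have the form 10xxxxxx). The names utf8_length and is_pure_ascii are hypothetical. Glib::ustring exposes both counts directly (length() in characters, bytes() in bytes).

```cpp
#include <cstddef>
#include <string>

// Number of code points in a UTF-8 string: every byte except the
// continuation bytes (10xxxxxx) starts a new character.
// Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// Characters == bytes exactly when every character is a 1-byte ASCII one.
bool is_pure_ascii(const std::string& s) {
    return utf8_length(s) == s.size();
}
```

Note that this only detects non-ASCII text in general; an accented Latin letter such as "é" also makes is_pure_ascii return false.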
For example, the character “C” (U+0043) is plain ASCII and occupies a single byte.
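Here is how to do it with Glib::ustring, whose iterators step over whole characters rather than bytes:

```cpp
#include <glibmm/ustring.h>
using Glib::ustring;

ustring x("सहस"); // Hindi string (save the source file as UTF-8)
bool is_hindi = false;
for (ustring::iterator i = x.begin(); i != x.end(); ++i) {
    // *i yields a whole code point (gunichar), not a single byte
    if (*i >= 0x0900 && *i <= 0x097F) { is_hindi = true; break; }
}
```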
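If you would rather stick with std::string and avoid the glibmm dependency, you can decode the UTF-8 yourself. The sketch below is an illustration, not a library API: the function name contains_devanagari is hypothetical, and it assumes valid UTF-8 input with no error handling for malformed sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 and report whether any code point falls in the
// Devanagari block U+0900..U+097F. Assumes valid UTF-8 input.
bool contains_devanagari(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t len;
        if (c < 0x80)      { cp = c;        len = 1; }  // 0xxxxxxx
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }  // 110xxxxx
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }  // 1110xxxx
        else               { cp = c & 0x07; len = 4; }  // 11110xxx
        // Append the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        if (cp >= 0x0900 && cp <= 0x097F)
            return true;
        i += len;
    }
    return false;
}
```

As a design note, every code point in U+0900..U+097F encodes to three bytes starting with E0 followed by A4 or A5, so a byte-pattern search would also work. Full decoding is easier to adapt if you later need other ranges.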