Character classification

A simple question: given a std::string, determine which of its characters are digits, symbols, whitespace, etc. with respect to the user's language and regional settings (locale).

I managed to split the string into a sequence of characters using the Boost.Locale boundary analysis tool:

#include <boost/locale.hpp>
#include <string>

// UTF-8 encoded text (before C++20, a u8"" literal is an array of const char)
std::string text = u8"生きるか死ぬか";

boost::locale::boundary::segment_index<std::string::const_iterator> characters(
    boost::locale::boundary::character,
    text.begin(), text.end(),
    boost::locale::generator()("ja_JP.UTF-8"));

for (const auto& ch : characters) {
    // each 'ch' is a segment spanning a single character of the Japanese text
}

However, I don't see any way to determine whether ch is a digit, a symbol, or anything else. Boost has string classification algorithms, but they don't seem to work with whatever *segment_index::iterator is.

Nor can I apply std::isalpha(std::locale), because I'm not sure whether a Boost segment can be converted into a char or wchar_t.

Is there any neat way to classify symbols?

asked Jun 30 '14 07:06

Ixanezis

1 Answer

There are a number of functions and objects supporting this in <locale>, but the example text you give looks like UTF-8, which is a multibyte encoding, and the functions in <locale> don't work with multibyte encodings: they classify a single char or wchar_t at a time.
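To illustrate that limitation, here is a minimal sketch using only the standard library. The helper name count_alpha_bytes is made up for this example, and the classic "C" locale is used so the result does not depend on which locales are installed. The point is that std::isalpha from <locale> looks at one char at a time, so the bytes of a multibyte UTF-8 character are never examined as a whole character.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: counts the char units of `text` that the locale's
// ctype<char> facet classifies as alphabetic. Each byte is tested on its
// own; nothing here reassembles a multibyte UTF-8 sequence.
std::size_t count_alpha_bytes(const std::string& text,
                              const std::locale& loc = std::locale::classic()) {
    std::size_t n = 0;
    for (char c : text) {
        if (std::isalpha(c, loc)) {
            ++n;
        }
    }
    return n;
}
```

With ASCII input such as "abc1" the three letters are recognized. For the three UTF-8 bytes of 生 ("\xE7\x94\x9F"), none of the bytes is alphabetic in the classic locale, even though the character they encode is a letter in Unicode terms.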

I'd suggest you get the ICU library and use it. Among other things, it allows testing for all of the properties defined in the Unicode Character Database. It also has macros or functions for iterating over a string (or at least an array of char), extracting one UTF-32 code point at a time (which is what you'd want to test).
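The "one code point at a time" step can be sketched without ICU, purely to show what is being extracted. This is a simplified stand-in, not ICU's implementation: decode_utf8 and in_hiragana_block are names invented for this example, the decoder does not reject overlong forms or surrogates, and the single block-range check is only a placeholder for the real property lookups ICU provides.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Simplified: a malformed or truncated sequence yields U+FFFD, and
// overlong encodings and surrogates are not rejected.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= s.size() ||
                (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
                ok = false;  // sequence cut short; resume at this byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : char32_t{0xFFFD});
    }
    return out;
}

// Placeholder classification: true if cp lies in the Unicode Hiragana
// block (U+3040..U+309F). A real program would query ICU instead.
bool in_hiragana_block(char32_t cp) {
    return cp >= 0x3040 && cp <= 0x309F;
}
```

With ICU installed, the U8_NEXT macro from unicode/utf8.h would take the place of the hand-written decoder, and functions such as u_isdigit, u_isalpha, or u_charType from unicode/uchar.h would replace the range check.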

answered Oct 11 '22 02:10

James Kanze

Tags: c++, string, locale, boost
