Boost regex: [:alpha:] and accented characters

Tags: c++, regex, boost, internationalization

I am trying to replace every non-alphabetic character in a string with " " using Boost:

#include <iostream>
#include <locale>
#include <string>
#include <boost/regex.hpp>

std::string sanitize(std::string &str)
{
    boost::regex re;
    re.imbue(std::locale("fr_FR.UTF-8"));
    re.assign("[^[:alpha:]]");
    str = boost::regex_replace(str, re, " ");
    return str;
}


int main ()
{
    std::string test = "(ça) /.2424,@ va très bien ?";
    std::cout << sanitize(test) << std::endl;
    return 0;
}

The result is "a va tr s bien", but I would like to get "ça va très bien".

What am I missing?

262

asked Feb 24 '14 13:02

Nicolas

1 Answer

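boost::regex::imbue doesn't do what you are hoping for here. In particular, it won't make boost::regex work with UTF-8: boost::regex matches one char at a time, so the two bytes that encode ç or è in UTF-8 are each tested against [:alpha:] separately, and neither is treated as a letter. (You could probably make it work this way with ISO 8859-1 or a similar single-byte character encoding, but that doesn't seem to be what you want here.)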

For UTF-8 support, you will need to use the Unicode-aware Boost.Regex types such as boost::u32regex, which require Boost.Regex to be built with ICU support - see http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/unicode.html.

Here is some code which I think does what you want:

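#include <iostream>
#include <string>
#include <boost/regex/icu.hpp>  // needs Boost.Regex built with ICU support

// Returns a copy of str (treated as UTF-8) in which every non-alphabetic
// code point is replaced by a single space. The argument is left untouched,
// so printing it alongside the result in one expression is safe.
std::string sanitize(const std::string &str)
{
    boost::u32regex re = boost::make_u32regex("[^[:alpha:]]");
    return boost::u32regex_replace(str, re, " ");
}


int main ()
{
    std::string test = "(ça) /.2424,@ va très bien ?";
    std::cout << test << "\n" << sanitize(test) << std::endl;
    return 0;
}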

See http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/ref/non_std_strings/icu/unicode_algo.html for more examples.

187

answered Sep 30 '22 16:09

richvdh
