Filtering string using regex in utf8 format

Tags: c++, regex, unicode, utf-8, c++14

I am trying to filter strings by stripping special characters and converting the result to lowercase. For example, "Good morning!" becomes good morning.
I pass one string at a time to my function.
This works for strings in English, but it fails for strings in my native language.
What regex should I use so that all UTF-8 encoded characters are matched?

#include <string>
#include <iostream>
#include <regex>
#include <algorithm>

std::string process(std::string s) {
    std::string st;
    std::regex r(R"([^\W_]+(?:['_-][^\W_]+)*)");
    std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
    std::smatch m = *i;
    st = m.str();
    std::transform(st.begin(), st.end(), st.begin(), ::tolower);
    return st;
}

int main() {
    std::string st = "ąžuolas!";
    std::cout << process(st) << std::endl; // <- gives: uolas
    return 0;
}


asked May 21 '19 07:05

dqmis

1 Answer

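You can match any Unicode letter, together with any combining marks that follow it, using the regex \p{L}\p{M}*. Note that std::regex does not support \p{...} property classes, so this pattern needs a Unicode-aware engine such as Boost.Regex built with ICU, or PCRE2.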

Therefore, the complete regex will be:

((?:\p{L}\p{M}*)+(?:['_-](?:\p{L}\p{M}*)+)*)
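If pulling in a Unicode-aware regex library is not an option, the shape of the pattern above can be roughly approximated with the standard library alone. The sketch below is not the answer's method: it assumes the input is valid UTF-8 and treats every byte >= 0x80 as part of a letter, which is looser than \p{L}\p{M}* (it would also accept symbols such as "€"). The names process_utf8, is_word_byte and is_joiner are hypothetical, and only ASCII letters are lowercased.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true for ASCII letters and for any byte of a
// multi-byte UTF-8 sequence (>= 0x80). Rough stand-in for \p{L}\p{M}*.
static bool is_word_byte(unsigned char c) {
    return c >= 0x80 || std::isalpha(c) != 0;
}

// Hypothetical helper: the joiner characters from the pattern ['_-].
static bool is_joiner(unsigned char c) {
    return c == '\'' || c == '_' || c == '-';
}

// Returns the first match of W+(?:['_-]W+)* in s, where W is a "word byte"
// as defined above. ASCII letters are lowercased; non-ASCII bytes are
// copied unchanged (assumption: no Unicode case mapping is needed).
std::string process_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;

    // Skip everything before the first word byte.
    while (i < n && !is_word_byte(static_cast<unsigned char>(s[i]))) ++i;
    const std::size_t start = i;

    // Consume the first run of word bytes.
    while (i < n && is_word_byte(static_cast<unsigned char>(s[i]))) ++i;

    // Consume any number of "joiner + word run" groups.
    while (i + 1 < n &&
           is_joiner(static_cast<unsigned char>(s[i])) &&
           is_word_byte(static_cast<unsigned char>(s[i + 1]))) {
        ++i;  // the joiner itself
        while (i < n && is_word_byte(static_cast<unsigned char>(s[i]))) ++i;
    }

    std::string out = s.substr(start, i - start);
    for (char& ch : out) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) ch = static_cast<char>(std::tolower(c));
    }
    return out;
}
```

With this sketch, process_utf8("ąžuolas!") returns ąžuolas, where the original std::regex version returned uolas. Proper lowercasing of non-ASCII letters (for example Ą to ą) would still need a Unicode library such as ICU.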



answered Oct 17 '22 16:10

Anmol Singh Jaggi
