How to remove the last character of a UTF-8 string in C++?

The text is stored in a std::string.

If the text is in a single-byte encoding such as ASCII, then it is really easy:

text.pop_back();

But what if it is UTF-8 text?
As far as I know, there are no UTF-8 related functions in the standard library which I could use.
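
To make the problem concrete, here is a minimal sketch (the helper names are made up for this illustration). pop_back removes one byte, so a two-byte character such as κ ends up half-deleted:

```cpp
#include <cassert>
#include <string>

// Returns a copy of `text` without its final byte. This is all that
// std::string::pop_back does; it knows nothing about the encoding.
std::string without_last_byte(std::string text) {
    assert(!text.empty());  // pop_back on an empty string is undefined behaviour
    text.pop_back();
    return text;
}

// Usage sketch: "\xCE\xBA" is the UTF-8 encoding of κ (U+03BA).
void example_pop_back() {
    const std::string kappa = "\xCE\xBA";
    assert(kappa.size() == 2);                   // size() counts bytes
    assert(without_last_byte(kappa) == "\xCE");  // a lone lead byte remains
    assert(without_last_byte("ab") == "a");      // ASCII is unaffected
}
```

The leftover "\xCE" is not valid UTF-8 on its own, which is why a byte-wise pop is not enough here.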

You really need a UTF-8 library if you are going to work with UTF-8. However, for this task I think something like this may suffice:

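#include <iostream>
#include <string>

// Removes the last code point from a UTF-8 encoded string.
void pop_back_utf8(std::string& utf8)
{
    if(utf8.empty())
        return;

    // Step back from the last byte while it is a continuation byte (10xxxxxx).
    auto cp = utf8.data() + utf8.size() - 1;
    while(cp > utf8.data() && (*cp & 0xC0) == 0x80)
        --cp;

    // For valid UTF-8, cp now points at the lead byte; cut the string off there.
    utf8.resize(cp - utf8.data());
}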

int main()
{
    std::string s = "κόσμε";

    while(!s.empty())
    {
        std::cout << s << '\n';
        pop_back_utf8(s);
    }
}

Output:

κόσμε
κόσμ
κόσ
κό
κ

It relies on the fact that UTF-8 encodes each code point as one start byte followed by zero to three continuation bytes. Continuation bytes always have the form 10xxxxxx, which is what the bitwise test in the loop detects.
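
The same byte layout can be sketched in code. This is only an illustration, not part of the answer above: the function names are invented here, the checks look at bit patterns only, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for bytes of the form 10xxxxxx, which never start a code point.
bool is_continuation_byte(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Length in bytes of the sequence that `lead` announces, judged by its bit
// pattern alone, or 0 if `lead` cannot start a sequence. Bytes such as 0xC0
// or 0xF5, which never occur in valid UTF-8, are not rejected here.
std::size_t sequence_length(char lead) {
    const unsigned char b = static_cast<unsigned char>(lead);
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                          // 10xxxxxx or 11111xxx
}

// Counts code points by counting the bytes that are not continuation bytes.
std::size_t count_code_points(const std::string& text) {
    std::size_t n = 0;
    for (char c : text)
        if (!is_continuation_byte(c)) ++n;
    return n;
}

// Usage sketch: κ (2 bytes), € (3 bytes) and a (1 byte).
void example_count() {
    const std::string text = "\xCE\xBA\xE2\x82\xAC" "a";
    assert(text.size() == 6);
    assert(count_code_points(text) == 3);
}
```

Because every code point has exactly one non-continuation byte, scanning backwards for that byte finds where the last code point starts.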

What you can do is pop off bytes until you reach the leading byte of a code point. The leading byte of a code point in UTF-8 has either the pattern 0xxxxxxx or 11xxxxxx, and all non-leading bytes are of the form 10xxxxxx. This means you can check the two most significant bits to determine whether you have a leading byte.

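#include <string>

bool is_leading_utf8_byte(char c) {
    // Lead bytes are 0xxxxxxx or 11xxxxxx; continuation bytes are 10xxxxxx.
    auto first_bit_set = (c & 0x80) != 0;
    auto second_bit_set = (c & 0x40) != 0;
    return !first_bit_set || second_bit_set;
}

void pop_utf8(std::string& x) {
    // Drop trailing continuation bytes, then the lead byte itself.
    while (!x.empty() && !is_leading_utf8_byte(x.back()))
        x.pop_back();
    if (!x.empty())
        x.pop_back();
}

Apart from guarding against an empty string, this does no error checking and assumes that your string is valid UTF-8.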
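
If the input cannot be trusted, a checked variant is one option. The sketch below is an assumption-laden illustration rather than a replacement for a real UTF-8 library: the name try_pop_utf8 is made up, and it only compares the number of trailing continuation bytes with what the lead byte announces (overlong forms and surrogates are not detected).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes the last code point from `text` and returns true, or leaves `text`
// unchanged and returns false if it is empty or does not end in a sequence
// whose length matches its lead byte.
bool try_pop_utf8(std::string& text) {
    std::size_t pos = text.size();
    std::size_t continuation = 0;
    // Step back over at most three continuation bytes (10xxxxxx).
    while (pos > 0 && continuation < 3 &&
           (static_cast<unsigned char>(text[pos - 1]) & 0xC0) == 0x80) {
        --pos;
        ++continuation;
    }
    if (pos == 0) return false;  // empty, or nothing but continuation bytes

    const unsigned char lead = static_cast<unsigned char>(text[pos - 1]);
    std::size_t expected;  // continuation bytes the lead byte announces
    if (lead < 0x80) expected = 0;
    else if ((lead & 0xE0) == 0xC0) expected = 1;
    else if ((lead & 0xF0) == 0xE0) expected = 2;
    else if ((lead & 0xF8) == 0xF0) expected = 3;
    else return false;  // a fourth continuation byte or an invalid lead

    if (expected != continuation) return false;
    text.resize(pos - 1);
    return true;
}

// Usage sketch: "a" followed by κ, then a string ending in a stray byte.
void example_try_pop() {
    std::string s = "a\xCE\xBA";
    assert(try_pop_utf8(s) && s == "a");
    assert(try_pop_utf8(s) && s.empty());
    assert(!try_pop_utf8(s));  // nothing left to remove

    std::string bad = "a\xBA";  // continuation byte without a lead byte
    assert(!try_pop_utf8(bad) && bad == "a\xBA");
}
```

Returning false instead of modifying the string leaves the decision about malformed input to the caller.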

Tags: c++, string, c++11, unicode, utf-8

Asked by Iter Ator. 2 answers, by Galik and Benjamin Lindley.
