The text is stored in a std::string.
If the text is plain ASCII, with one byte per character, then it is really easy:
text.pop_back();
But what if it is UTF-8 text?
As far as I know, there are no UTF-8 related functions in the standard library which I could use.
Every string in C ends with '\0', so to remove the last character you overwrite it with the terminator:
size_t size = strlen(my_str); // total length of the string
if (size > 0)
    my_str[size - 1] = '\0';
This way, you remove the last char.
In C++, use pop_back() to remove the last character from a string. pop_back() is a member function of std::string that deletes the last element and adjusts the length of the string accordingly.
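One caveat worth keeping in mind: calling pop_back() on an empty std::string is undefined behavior, so a guard is useful. The following is a minimal sketch; pop_last_byte is a hypothetical helper name chosen for illustration, not part of the standard library. Note that it removes one char, which is one byte, not necessarily one UTF-8 character.

```cpp
#include <string>

// Hypothetical helper: removes the last byte (char) if there is one.
// Returns true if something was removed, false if the string was empty.
bool pop_last_byte(std::string& s)
{
    if (s.empty())
        return false;   // pop_back() on an empty string is undefined behavior
    s.pop_back();       // drops one char and shrinks size() by one
    return true;
}
```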
To get the last character of a string, use bracket notation to access the string at the last index, e.g. str[str.length - 1]. Indexes are zero-based, so the index of the last character is one less than the length of the string.
You really need a UTF-8 library if you are going to work with UTF-8. However, for this task I think something like this may suffice:
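#include <cstddef>
#include <iostream>
#include <string>

// Removes the last UTF-8 encoded code point from the string.
void pop_back_utf8(std::string& utf8)
{
    if(utf8.empty())
        return;

    // Step back over continuation bytes (10xxxxxx) to the start byte.
    std::size_t pos = utf8.size() - 1;
    while(pos > 0 && (utf8[pos] & 0b11000000) == 0b10000000)
        --pos;

    // Cut the string at the start byte, dropping the whole code point.
    utf8.resize(pos);
}

int main()
{
    std::string s = "κόσμε";

    while(!s.empty())
    {
        std::cout << s << '\n';
        pop_back_utf8(s);
    }
}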
Output:
κόσμε
κόσμ
κόσ
κό
κ
It relies on the fact that a UTF-8 encoded code point consists of one start byte followed by zero or more continuation bytes. The continuation bytes can be detected with the bitwise tests shown above.
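To make the byte layout concrete, here is a small sketch of how the top bits of a byte can be classified. The function names is_continuation_byte and sequence_length are made up for this illustration. The sketch only looks at bit patterns, so it does not reject the few start byte values that never occur in valid UTF-8, such as 0xC0.

```cpp
#include <cstddef>

// True for continuation bytes, which have the bit pattern 10xxxxxx.
bool is_continuation_byte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Number of bytes in the sequence introduced by start byte `b`,
// or 0 if `b` cannot start a sequence.
std::size_t sequence_length(unsigned char b)
{
    if ((b & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                          // continuation byte or invalid pattern
}
```

For example, κ is encoded as the two bytes 0xCE 0xBA: the first is a start byte announcing a two-byte sequence, the second is a continuation byte.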
What you can do is pop off bytes until you reach the leading byte of a code point. The leading byte of a code point in UTF-8 matches either the pattern 0xxxxxxx or 11xxxxxx, and all non-leading bytes are of the form 10xxxxxx. This means you can check the first and second bits to determine whether you have a leading byte.
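#include <string>

// A leading byte is either 0xxxxxxx (ASCII) or 11xxxxxx (start of a sequence).
bool is_leading_utf8_byte(char c) {
    auto first_bit_set = (c & 0x80) != 0;
    auto second_bit_set = (c & 0x40) != 0;
    return !first_bit_set || second_bit_set;
}

// Pops continuation bytes, then the leading byte itself.
void pop_utf8(std::string& x) {
    while (!x.empty() && !is_leading_utf8_byte(x.back()))
        x.pop_back();
    if (!x.empty())
        x.pop_back();
}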
This of course does no validation and assumes that your string is valid UTF-8.
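If some checking is wanted, one possible direction is to verify that the trailing bytes really form a start byte followed by the number of continuation bytes it announces. The sketch below is only an illustration under that assumption; try_pop_utf8 is a hypothetical name, and the check is structural only, so it does not detect overlong encodings, surrogates, or errors earlier in the string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical checked variant: removes the last code point only if the
// trailing bytes are a start byte followed by the matching number of
// continuation bytes. Otherwise returns false and leaves the string untouched.
bool try_pop_utf8(std::string& s)
{
    std::size_t n = 0;                       // continuation bytes seen so far
    std::size_t pos = s.size();
    while (pos > 0) {
        unsigned char b = static_cast<unsigned char>(s[--pos]);
        if ((b & 0xC0) == 0x80) {            // 10xxxxxx: continuation byte
            if (++n > 3) return false;       // no sequence has more than 3
            continue;
        }
        // b is not a continuation byte; work out how many it should be followed by.
        std::size_t expected;
        if      ((b & 0x80) == 0x00) expected = 0;   // 0xxxxxxx
        else if ((b & 0xE0) == 0xC0) expected = 1;   // 110xxxxx
        else if ((b & 0xF0) == 0xE0) expected = 2;   // 1110xxxx
        else if ((b & 0xF8) == 0xF0) expected = 3;   // 11110xxx
        else return false;                           // 11111xxx is never valid
        if (expected != n) return false;     // too few or too many continuation bytes
        s.resize(pos);                       // cut at the start byte
        return true;
    }
    return false;                            // empty, or continuation bytes only
}
```

Returning a bool rather than throwing keeps the call site simple: a loop such as while (try_pop_utf8(s)) { ... } stops on its own when the string is empty or its tail is malformed.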