
UTF-8 string size in bytes

Tags:

c

utf-8

I need to determine the length of a UTF-8 string in bytes in C. How do I do this correctly? As far as I know, the terminating character in UTF-8 is 1 byte in size. Can I use the strlen function for this?

asked Apr 22 '26 14:04 by Ze..


1 Answer

Can I use strlen function for this?

Yes, strlen gives you the number of bytes before the first '\0' character, so

strlen(utf8) + 1

is the number of bytes in utf8 including the 0-terminator, since no character other than '\0' contains a 0 byte in UTF-8.

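Of course, that only works if utf8 is actually UTF-8 encoded; otherwise you need to convert it to UTF-8 first. Encodings such as UTF-16 can contain 0 bytes inside ordinary characters, so strlen would stop too early on them.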
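As a minimal sketch of the point above, assuming the input is a valid, NUL-terminated UTF-8 string: the helper names utf8_size_with_terminator and utf8_code_points are made up for this illustration and are not part of any library. The second helper is only there to show that the byte count and the character (code point) count differ for non-ASCII text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes occupied by a NUL-terminated UTF-8 string,
 * including the terminating '\0'. */
size_t utf8_size_with_terminator(const char *utf8)
{
    assert(utf8 != NULL);
    return strlen(utf8) + 1;
}

/* Hypothetical helper: number of code points, for contrast with the byte
 * count. Assumes valid UTF-8: every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_code_points(const char *utf8)
{
    size_t count = 0;
    assert(utf8 != NULL);
    for (const unsigned char *p = (const unsigned char *)utf8; *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, "h\xC3\xA9llo" (the word "héllo", where é is encoded as the two bytes C3 A9) has strlen 6, so it needs 7 bytes of storage including the terminator, while it contains only 5 code points.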

answered Apr 24 '26 05:04 by Daniel Fischer


