Test if char* string contains multibyte characters

Tags: c++, unicode, multibyte

I receive a byte-stream buffer from a TCP server that may contain multibyte sequences forming Unicode characters. Is checking for a BOM always a reliable way to detect these characters? If not, how would you do it?


asked Feb 16 '11 05:02

cpx

1 Answer

If you know that the data is UTF-8, you only need to check the high bit of each byte:

0xxxxxxx = single-byte ASCII character
1xxxxxxx = part of multi-byte character
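A minimal C++ sketch of that high-bit test, assuming the buffer holds UTF-8 and that its length is passed explicitly (a buffer read from a socket is not necessarily NUL-terminated). The function name `contains_multibyte` is hypothetical. It only reports whether any non-ASCII byte is present; it does not validate the encoding.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if any byte in buf has its high bit
// set, i.e. the buffer contains at least one byte of a multi-byte UTF-8
// character. Assumes UTF-8 input; does not check that it is well formed.
bool contains_multibyte(const char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        // Cast to unsigned char so the bit test does not depend on
        // whether plain char is signed on this platform.
        if (static_cast<unsigned char>(buf[i]) & 0x80) {
            return true;
        }
    }
    return false;
}
```

For example, `contains_multibyte("h\xC3\xA9llo", 6)` returns true because "é" is encoded as the two bytes 0xC3 0xA9, while `contains_multibyte("hello", 5)` returns false.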

Or, if you need to distinguish lead bytes from trail bytes:

10xxxxxx = 2nd, 3rd, or 4th byte of multi-byte character
110xxxxx = 1st byte of 2-byte character
1110xxxx = 1st byte of 3-byte character
11110xxx = 1st byte of 4-byte character
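The lead/trail patterns above map directly onto bit masks. The following is a sketch assuming well-formed UTF-8 input; `utf8_sequence_length` and `count_multibyte_chars` are hypothetical names, and the code does not reject overlong encodings or truncated sequences.

```cpp
#include <cstddef>

// Hypothetical helper: returns the total length (1-4) of the UTF-8
// sequence that starts with byte b, or 0 if b is a trail byte
// (10xxxxxx) or a byte that cannot start a sequence (11111xxx).
int utf8_sequence_length(unsigned char b) {
    if ((b & 0x80) == 0x00) return 1;  // 0xxxxxxx: single-byte ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: lead of 2-byte char
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: lead of 3-byte char
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: lead of 4-byte char
    return 0;                          // 10xxxxxx (trail) or invalid lead
}

// Counts multi-byte characters by counting lead bytes of length 2-4;
// trail bytes and ASCII bytes are skipped. Assumes well-formed UTF-8.
std::size_t count_multibyte_chars(const char* buf, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        int n = utf8_sequence_length(static_cast<unsigned char>(buf[i]));
        if (n >= 2) {
            ++count;
        }
    }
    return count;
}
```

For instance, the 12-byte string `"\xE2\x82\xAC and \xF0\x9F\x98\x80"` (a 3-byte "€", five ASCII bytes, and a 4-byte emoji) yields a count of 2.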

answered Sep 24 '22 11:09

dan04

Related questions
I can't add a new line to c++ string
Problem with Boost::Test
Should names representing template types be a single character?
How to use valgrind effectively
What library do you use for matrix calculations on CUDA? [closed]
C++ struct data member
Specializing template for enum
What is the set-like data structure in c++
Functional programming in Python and C++ [closed]
How many character can a STL string class can hold?
Cannot open source file: 'WIN32': No such file or directory
C++ Change Output From "cout"
Lazy evaluation and problems with const correctness
are programs coded separately for different operating systems?
Why setting null in the middle of std string doesn't have any effect
std::vector hex values
(C++) Linking with namespaces causes duplicate symbol error
Casting from string to void* and back
#pragma deprecate a function based on signature?
Generate random number with non-uniform density
