C: char vs. unsigned char for non-ASCII text data

Tags: c, string, unsigned-char

This question, "What is an unsigned char?", does a great job of discussing char vs. unsigned char vs. signed char in C.

However, it doesn't directly address what should be used for non-ASCII text. So if I have an array of bytes that represents text in some arbitrary character set like UTF-8 or Big5 (or sometimes ASCII), should I use an array of char or unsigned char?

I'm leaning towards using char because otherwise gcc gives me warnings about signedness of pointers when the array is ASCII and I use strlen. But I would like to know what is correct.


asked Oct 24 '14 03:10

Craig S. Anderson

2 Answers

Use plain char to represent characters. Use signed char when you want a signed integer type that covers at least the values -127 to +127 (in practice usually -128 to +127). Use unsigned char when you want an unsigned integer type that covers at least the values 0 to 255.

answered Sep 20 '22 06:09

Dr. Debasish Jana

The question you are asking is probably much broader than you expect.

To answer it directly, most implementations use a "byte" as the underlying buffer type. In those terms, the standard uint8_t typedef is your best bet (where it exists, it is in practice a typedef for unsigned char). That is primarily because many encodings, such as UTF-8, Big5 and Shift-JIS, use a variable number of bytes per character, so byte-by-byte processing is essential when encoding and decoding. It also simplifies conversion between different byte orders ("endianness"), for example in UTF-16.

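In general, strlen returns the number of bytes before the first NUL byte, not the number of characters. That equals the character count only for ASCII or other single-byte code pages (values 0-255). For multi-byte encodings like Big5, UTF-8 or Shift-JIS it gives the length in bytes rather than in characters, and for UTF-16 it is simply wrong, because UTF-16 text routinely contains zero bytes.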

answered Sep 18 '22 06:09

Petr Abdulin
