
C characters over 128

I have a question regarding the saving of characters in C char arrays.

I must read text from a file into an array of type "char" (I cannot use unsigned char). Characters with a value over 127 (e.g. €, ä, ö, ...) are stored as negative values, and they often take more than one array element (e.g. € takes 3 negative values).

How can I convert these negative values back into unsigned characters? Could someone link me to a tutorial or a guide on this issue?

Marco asked Jun 04 '26 18:06



2 Answers

I think you should read this: http://www.joelonsoftware.com/articles/Unicode.html

duedl0r answered Jun 06 '26 11:06



This depends on the encoding you use.

Conventional 1-byte encodings cause no problems. Yes, some characters are treated as negative values, but they remain the same characters they were when read. If you write them back as is, they will be what they were.
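To make this concrete, here is a minimal sketch of getting the 0..255 byte value back out of a plain char. The helper names `byte_value` and `dump_bytes` are made up for illustration. The cast to unsigned char does not change the stored bits; it only changes how the value is interpreted.

```c
#include <stdio.h>

/* Hypothetical helper: map a char (which may be negative on platforms
   where plain char is signed) to its byte value in the range 0..255. */
unsigned int byte_value(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: print every byte of a NUL-terminated string
   as an unsigned number, separated by spaces. */
void dump_bytes(const char *s)
{
    for (; *s != '\0'; s++) {
        printf("%u ", byte_value(*s));
    }
    printf("\n");
}
```

The array itself can stay of type char; the cast is only applied at the point where the numeric value is needed.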

Since you are sure you have 3 chars per euro symbol, you are dealing with a Unicode encoding such as UTF-8, in which € is encoded as the three bytes 0xE2 0x82 0xAC.
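If the file really is UTF-8 (an assumption, since the question does not say so), the number of characters differs from the number of char elements. A rough sketch of counting characters by skipping continuation bytes, which always have the bit pattern 10xxxxxx, could look like this. The name `utf8_count` is hypothetical, and the function assumes well-formed input.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated string
   that is assumed to be well-formed UTF-8. Every byte that is not a
   continuation byte (10xxxxxx) starts a new code point. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;  /* avoid negative values */
        if ((b & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For the three-byte euro sign this counts one character, even though it occupies three elements of the char array.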

This means that one character can span several char elements. To hold one character per element you would need a wide type like wchar_t, but this contradicts your requirement of storing the data in char.

I suggest you convert your file to a 1-byte encoding first, for example Win1252 (Windows-1252). This encoding uses 1 byte for the euro symbol.

If you wish to work with Unicode, I am afraid negative char values are awkward to deal with. Unicode values are traditionally represented as non-negative integers.
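As an illustration of getting from char bytes to a positive Unicode value, here is a sketch of decoding one UTF-8 sequence into a code point. It assumes the input is UTF-8. The name `utf8_decode` is hypothetical, and the function is deliberately simplified: it checks the sequence structure but does not reject overlong encodings or surrogate values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the UTF-8 sequence starting at s, where
   len bytes are available. On success, store the code point in *cp and
   return the number of bytes consumed (1..4). Return 0 if the sequence
   is truncated or structurally malformed. */
size_t utf8_decode(const char *s, size_t len, uint32_t *cp)
{
    if (len == 0) {
        return 0;
    }

    unsigned char b0 = (unsigned char)s[0];  /* work with 0..255 */
    size_t need;
    uint32_t value;

    if (b0 < 0x80) {                  /* 0xxxxxxx: ASCII */
        need = 1; value = b0;
    } else if ((b0 & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
        need = 2; value = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
        need = 3; value = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
        need = 4; value = b0 & 0x07;
    } else {
        return 0;                     /* stray continuation or invalid byte */
    }

    if (need > len) {
        return 0;                     /* truncated sequence */
    }

    for (size_t i = 1; i < need; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) != 0x80) {
            return 0;                 /* expected a continuation byte */
        }
        value = (value << 6) | (b & 0x3F);
    }

    *cp = value;
    return need;
}
```

Called on the three bytes 0xE2 0x82 0xAC, this yields the code point U+20AC (the euro sign) and reports three bytes consumed, so the data can stay in a char array while the decoded values are held in an unsigned integer type.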

Dims answered Jun 06 '26 11:06
