
validate the entry of an ASCII character

Tags: c, char, ascii

I have a homework problem. I have to validate the entry of uppercase characters, but I am having a problem with the range A to Z.

I just put a while (c<65 || c>90) and it works fine. But in my country we use Ñ too, and that is my problem. I tried to use the ASCII code 165 to validate the entry, but it didn't work.

The char range is from -128 to 127, so for the extended ASCII table I need an unsigned char, right?

I tried this:

#include <stdio.h>

int main(void) {
    unsigned char n;

    scanf("%c", &n);   /* read one byte from the console */
    printf("%d", n);   /* print its numeric code */
    return 0;
}

This prints 165 if it scans a 'Ñ'.

The next one:

unsigned char n;
n='Ñ';
printf("%d",n);

Prints 209.

So I tried to validate with 165 and 209, and neither works.

Why does this happen? What can I do to validate the entry of this character?


It works when I use unsigned char and validate with 165. But when I used cmd to try it by reading a txt file, it didn't work...

exsnake asked Sep 02 '14 23:09



1 Answer

Prints 165 if I scan a 'Ñ'.

  • This means that on your system the console encodes the character 'Ñ' as 165, as in the OEM code pages 437 and 850 (both are extensions of ASCII).

    printf("%d",'Ñ');
    

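Prints 209.

  • This reveals a different encoding for the characters you type manually in your IDE.
    Mark Tolonen has suggested that OEM cp437 is involved: that is a code page in which Ñ is 165, while 209 is the code of Ñ in ISO 8859-1 (Latin 1) and Windows-1252.
    (I originally associated 209 with UTF-8, but in UTF-8 the character Ñ takes two bytes, 0xC3 0x91, so a single value of 209 does not fit.)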

In C you have to take into account the existence of two character sets, which may be different:

  1. The source character set.
  2. The execution character set.

The source character set refers to the encoding used by your editing environment, that is, the place where you normally type your .c files. Your system and/or editor and/or IDE works with a specific encoding scheme. In this case, it seems that the encoding is Latin 1 (ISO-8859-1) or the closely related Windows-1252.

Thus, if you write 'Ñ' in your editor, the character Ñ has the encoding of your editor, not the encoding of the target system. In this case Ñ is encoded as 209, so an unsigned char holding 'Ñ' compares equal to 209.

The execution character set refers to the encoding used by the operating system and/or the console in which you run your executable (that is, compiled) programs. It seems that this encoding is an OEM code page such as 437 or 850.

In particular, when you type Ñ in the console of your system, it is encoded as 165, which gives you the value 165 when you print it.

Since this mismatch can always occur, you must be wary of it and make some adjustments to avoid potential problems.
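As a stopgap, the check itself can tolerate both codes. The following is only a minimal sketch, assuming single-byte input; the name is_valid_upper and the two macros are hypothetical, not part of the question's code. The codes are written as numbers rather than as 'Ñ' so that the result does not depend on how the .c file is saved.

```c
#include <stdbool.h>

/* Code of Ñ in the OEM code pages 437 and 850 (typical for Windows CMD). */
#define ENYE_OEM 165
/* Code of Ñ in ISO 8859-1 (Latin 1) and Windows-1252. */
#define ENYE_LATIN1 209

/* Hypothetical helper: true for 'A'..'Z' or for Ñ in either encoding. */
bool is_valid_upper(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return true;
    return c == ENYE_OEM || c == ENYE_LATIN1;
}
```

With this, the loop from the question could become while (!is_valid_upper(c)). The price of accepting both values is that each one means a different character in the other encoding (165 is ¥ in Latin 1, for example), so this is a workaround, not a real fix.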

It works when I use unsigned char and validate with 165. But when I used cmd to try it by reading a txt file, it didn't work...

  • This means that your .txt file was written with a text editor (perhaps your own IDE, I guess) that uses an encoding different from the one used by your console.

Let me guess: you are writing your C code and your text files with the same IDE, but you are executing programs from the Windows CMD.
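One way to check this guess is to look at the raw bytes of the .txt file. Here is a small sketch; dump_bytes is a hypothetical helper name, and the file is assumed to have been opened in binary mode ("rb") so that no translation takes place.

```c
#include <stdio.h>

/* Hypothetical helper: writes the decimal value of every byte of `in`
   to `out`, separated by spaces, and returns the number of bytes read. */
long dump_bytes(FILE *in, FILE *out)
{
    long count = 0;
    int ch;                       /* int, so that EOF is distinguishable */
    while ((ch = fgetc(in)) != EOF) {
        fprintf(out, "%d ", ch);  /* fgetc yields the byte as 0..255 */
        count++;
    }
    fputc('\n', out);
    return count;
}
```

Calling dump_bytes(fp, stdout) on a file that contains a single Ñ should show 209 if it was saved as Latin 1 or Windows-1252, 165 if it was saved in OEM 437 or 850, and the pair 195 145 if it was saved as UTF-8.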

There are two possible solutions here.

The complicated solution is to study encoding schemes, locale issues, and wide characters. There is no quick solution here, because several delicate details need careful handling.

The easy solution is to make adjustments in all the tools you are using.

  1. Go to the options of your IDE and find out which encoding is used to save text files (from the value 209 I would guess Latin 1 (ISO-8859-1) or Windows-1252, but there are other possibilities, such as UTF-8, UTF-16 and many more).
  2. Execute in your CMD the command CHCP to obtain the codepage number that your system is using. The meaning of this number is explained by Microsoft here:

    a. OEM codepages
    b. Windows codepages
    c. ISO codepages
    d. LIST OF ALL WINDOWS CODEPAGES

    I guess you have codepage 437 or 850 (in both of them Ñ is 165). Codepage 28591 would correspond to Latin 1.

  3. Change one of these configurations to fit with the other one.

    a. In the configuration of your IDE, in the "editor options" part, you could change the encoding to match the codepage of your console (for example OEM 850 or OEM 437, if your editor offers them).

    b. Or change the codepage in your CMD, by means of the CHCP command, to match the encoding of your editor. For Windows-1252, where Ñ is 209:

    CHCP 1252

The solution that changes the codepage in CMD probably does not always work as expected.
Solution (a), modifying the configuration of your editor, is safer.
However, it would be preferable to keep UTF-8 in your editor (if this is your editor's choice), because nowadays most modern software is moving to UTF encodings (Unicode).
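If you do keep UTF-8, a single unsigned char can no longer hold Ñ, because UTF-8 stores it as the two bytes 0xC3 0x91. A rough sketch of a validator for that case follows; is_upper_utf8 is a hypothetical name, and the sketch assumes the input really is UTF-8 and that only A to Z and Ñ should be accepted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the `len` bytes at `s` are UTF-8 text made
   only of 'A'..'Z' and Ñ (U+00D1, stored as the two bytes 0xC3 0x91). */
bool is_upper_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        if (s[i] >= 'A' && s[i] <= 'Z') {
            i += 1;               /* plain ASCII uppercase letter */
        } else if (i + 1 < len && s[i] == 0xC3 && s[i + 1] == 0x91) {
            i += 2;               /* the two-byte sequence for Ñ */
        } else {
            return false;         /* anything else is rejected */
        }
    }
    return true;                  /* note: an empty input is accepted */
}
```

The function works on a buffer instead of a single char because the validation has to look at two consecutive bytes to recognize Ñ.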

New info: The UTF-8 encoding sometimes uses more than 1 byte to represent 1 character. For example, Ñ (U+00D1) is stored as the two bytes 0xC3 0x91. The following table shows the UTF-8 encoding for the first 256 code points:

  • UTF-8 for U+0000 to U+00FF
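The entries of that table can be reproduced with a few bit operations. This is only a sketch limited to code points below U+0800 (one or two bytes), which covers the whole table; utf8_encode_small is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: encodes a code point below U+0800 as UTF-8 into
   `out` and returns the number of bytes written, or 0 if the code point
   is outside the range this sketch handles. */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {               /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {              /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;
}
```

For Ñ, utf8_encode_small(0xD1, buf) returns 2 and fills buf with 0xC3 0x91, which is why neither 165 nor 209 alone matches a Ñ stored as UTF-8.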

Note: After a little discussion in the comments, I realized that I had some mistaken beliefs about the UTF-8 encoding. At least this illustrates my point: encoding is not a trivial matter.

So, I have to repeat here my advice to the OP: take the simplest path and try to reach an agreement with your teacher about how to handle the encoding of special characters.

pablo1977 answered Sep 28 '22 08:09
