
Accented/umlauted characters in C?

I'm just learning C and got an assignment where we have to translate plain text into Morse code and back. (I am mostly familiar with Java, so bear with me on the terms I use.)

To do this, I have an array with the strings for all letters.

char *letters[] = {
".- ", "-... ", "-.-. ", "-.. ", ".", "..-." etc

I wrote a function for returning the position of the desired letter.

int letter_nr(unsigned char c)
{
    return c-97;
}

This works, but the assignment specification requires handling the Swedish letters å, ä, and ö. The Swedish alphabet is the same as the English one, with these three letters added at the end. I tried checking for them like so:

int letter_nr(unsigned char c)
{
    if (c == 'å')
        return 26;
    if (c == 'ä')
        return 27;
    if (c == 'ö')
        return 28;
    return c-97;
}

Unfortunately, when I tested this function, I got the same value for all three of these letters: 98. Here is my main function, which I used for testing:

int main()
{   
    unsigned char letter;

    while(1)
    {
        printf("Type a letter to get its position: ");
        scanf("%c", &letter);
        printf("%d\n", letter_nr(letter));
    }
    return 0;
}

What can I do to resolve this?

pg-robban asked Dec 18 '22 04:12


1 Answer

The encoding of character constants actually depends on your locale settings.
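One likely explanation for the 98 in the question, assuming the terminal sends UTF-8: å, ä, and ö are each encoded as two bytes, and all three start with the byte 0xC3 (195). scanf("%c") reads only that first byte, so the function returns 195 - 97 = 98 for each of them. A minimal sketch of this effect follows; the helper names first_byte_index and show_first_bytes are hypothetical and only serve the illustration.

```c
#include <stdio.h>

/* Hypothetical helper: mimics the question's letter_nr() when it is
   handed only the first byte of a multibyte UTF-8 sequence. */
int first_byte_index(const char *utf8)
{
    unsigned char first = (unsigned char)utf8[0];
    return first - 97;
}

/* Prints the index computed from the first byte of each letter.
   The bytes are written as hex escapes so the result does not depend
   on the encoding of the source file (assumption: input is UTF-8). */
void show_first_bytes(void)
{
    const char *letters[] = { "\xC3\xA5", "\xC3\xA4", "\xC3\xB6" }; /* å, ä, ö in UTF-8 */
    for (int i = 0; i < 3; i++)
        printf("%d\n", first_byte_index(letters[i]));
}
```

In a single-byte locale such as ISO 8859-1 the letters would arrive as one byte each, so the result would differ; that is the sense in which the behavior depends on the locale.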

The safest bet is to use wide characters and the corresponding functions. You declare the alphabet as const wchar_t* alphabet = L"abcdefghijklmnopqrstuvwxyzåäö" (with å, ä, ö in the order your indices 26, 27, 28 expect), and the individual characters as wide character constants such as L'ö'.
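With a wide alphabet string, the position lookup from the question could be sketched roughly as below. This is an illustration under assumptions, not a definitive implementation: the alphabet is ordered a-z followed by å, ä, ö to match the indices 26, 27, 28 used in the question, the Swedish letters are written as universal character names so the literal does not depend on the source file's encoding, and returning -1 for unknown characters is a choice made here.

```c
#include <stddef.h>
#include <wchar.h>

/* Alphabet in the order the question uses: a-z, then å, ä, ö. */
static const wchar_t *alphabet = L"abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6";

/* Returns the 0-based position of c in the alphabet, or -1 if c is
   not part of it. */
int letter_nr(wchar_t c)
{
    const wchar_t *p;

    if (c == L'\0')
        return -1; /* wcschr would otherwise match the string terminator */

    p = wcschr(alphabet, c);
    return p ? (int)(p - alphabet) : -1;
}
```

The returned index can then be used directly to index the array of Morse strings, provided that array has entries for å, ä, and ö in the same order. Uppercase input is not handled in this sketch; converting with towlower from <wctype.h> before the lookup would be one option.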

This small example program reads wide characters and prints the numeric value of each one. It works for me (also on a UNIX console with UTF-8), so try it.

#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(int argc, char** argv)
{
    wint_t letter = L'\0';
    setlocale(LC_ALL, ""); /* Initialize locale, to get the correct conversion to/from wchars */
    while(1)
    {
        if(!letter)
            printf("Type a letter to get its position: ");

        letter = fgetwc(stdin);
        if(letter == WEOF) {
            putchar('\n');
            return 0;
        } else if(letter == L'\n' || letter == L'\r') {
            letter = L'\0'; /* skip newlines - and print the instruction again */
        } else {
            /* wint_t may be unsigned, so cast before printing the character value;
               the instruction is not printed again */
            printf("%ld\n", (long)letter);
        }
    }
    return 0;
}

Example session:

Type a letter to get its position: a
97
Type a letter to get its position: A
65
Type a letter to get its position: Ö
214
Type a letter to get its position: ö
246
Type a letter to get its position: Å
197
Type a letter to get its position: <^D>

I understand that on Windows, where wchar_t is only 16 bits wide, this does not work with characters outside the Unicode BMP, but that's not an issue here.

gnud answered Jan 02 '23 21:01