
How to use strxfrm in C?


I would like to understand how to use "strxfrm" in C.

I know the function transforms a string according to the current locale configuration.

But I don't know what "transform" means here, or how the function performs the transformation.

For example, I tried the following code on macOS:

#include <stdio.h>
#include <string.h>
#include <locale.h>

int main(int argc, char * argv[])
{
    char str1[512] = { 0x68, 0x6c, 0x61, 0x76, 0x61, 0x00 }; //"hlava";
    char str2[512] = { 0xc4, 0x8d, 0xc3, 0xad, 0xc5, 0xa1, 0x6e, 0xc3, 0xad, 0x6b, 0x00 }; //"číšník";
    char xfm1[512] = { '\0', };
    char xfm2[512] = { '\0', };
    char * result = NULL;
    size_t lxfm1 = 0;
    size_t lxfm2 = 0;

    result = setlocale(LC_ALL, "en_US.UTF-8");
    lxfm1 = strxfrm(xfm1, str1, sizeof xfm1);
    lxfm2 = strxfrm(xfm2, str2, sizeof xfm2);
    printf("<en-US>\n");
    printf("setlocale = \"%s\"\n", (result == NULL) ? "NULL" : result);
    printf("str1: \"%s\" --> \"%s\"\n", str1, xfm1);
    printf("str2: \"%s\" --> \"%s\"\n", str2, xfm2);
    printf("strcmp(str1, str2) = %d\n", strcmp(str1, str2));
    printf("strcmp(xfm1, xfm2) = %d\n", strcmp(xfm1, xfm2));
    /* Note: the label says xfm1/xfm2, but strcoll is applied to the original strings. */
    printf("strcoll(xfm1, xfm2) = %d\n", strcoll(str1, str2));
    printf("returns of strxfrm: %zu / %zu\n", lxfm1, lxfm2);

    result = setlocale(LC_ALL, "cs_CZ.UTF-8");
    lxfm1 = strxfrm(xfm1, str1, sizeof xfm1);
    lxfm2 = strxfrm(xfm2, str2, sizeof xfm2);
    printf("<cs-CZ>\n");
    printf("setlocale = \"%s\"\n", (result == NULL) ? "NULL" : result);
    printf("str1: \"%s\" --> \"%s\"\n", str1, xfm1);
    printf("str2: \"%s\" --> \"%s\"\n", str2, xfm2);
    printf("strcmp(str1, str2) = %d\n", strcmp(str1, str2));
    printf("strcmp(xfm1, xfm2) = %d\n", strcmp(xfm1, xfm2));
    /* Note: the label says xfm1/xfm2, but strcoll is applied to the original strings. */
    printf("strcoll(xfm1, xfm2) = %d\n", strcoll(str1, str2));
    printf("returns of strxfrm: %zu / %zu\n", lxfm1, lxfm2);

    return 0;
}

I expected "strcmp(xfm1, xfm2)" to return a positive integer in the cs_CZ locale, because the character 'č' precedes 'h' in Czech.

However, the result is:

<en-US>
setlocale = "en_US.UTF-8"
str1: "hlava" --> "001Z001^001S001h001S0000001Z001^001S001h001S"
str2: "číšník" --> "0042003_0042001`003_001]0000008?003_009S001`003_001]"
strcmp(str1, str2) = -92
strcmp(xfm1, xfm2) = -3
strcoll(xfm1, xfm2) = -152
returns of strxfrm: 44 / 52
<cs-CZ>
setlocale = "cs_CZ.UTF-8"
str1: "hlava" --> "001Z001^001S001h001S0000001Z001^001S001h001S"
str2: "číšník" --> "0042003_0042001`003_001]0000008?003_009S001`003_001]"
strcmp(str1, str2) = -92
strcmp(xfm1, xfm2) = -3
strcoll(xfm1, xfm2) = -152
returns of strxfrm: 44 / 52

Am I misunderstanding 'strxfrm'? I still don't clearly understand what 'transform' means.

Please let me know the correct usage and purpose of this function.

Luciano Jeong asked Aug 21 '18 06:08


People also ask

What is strxfrm in C?

strxfrm() is a C standard library function, declared in the <string.h> header. It transforms the source string into a collation key for the current locale (category LC_COLLATE) and stores it in the destination buffer, so that comparing two transformed strings with strcmp() gives the same ordering as comparing the original strings with strcoll().

What is strcoll in C?

strcoll() compares the string pointed to by str1 with the one pointed to by str2, using the rules of the current locale's LC_COLLATE category. Syntax: int strcoll(const char *str1, const char *str2)


1 Answer

Your usage of strxfrm is correct. The problem lies in the locale implementation of Mac OS X (and FreeBSD). It simply doesn't work properly with UTF-8. It's apparently a long-standing bug/defect/inconsistency/quirk/whatever in the version of libc these operating systems use.

n. 1.8e9-where's-my-share m. answered Oct 04 '22 18:10