I have a question about how to use "strxfrm" in C.
I know the function transforms a string according to the current locale configuration,
but I don't understand what "transform" means here, or how the function performs the transformation.
For example, I tried the following code on macOS:
#include <stdio.h>
#include <string.h>
#include <locale.h>
int main(int argc, char * argv[])
{
    char str1[512] = { 0x68, 0x6c, 0x61, 0x76, 0x61, 0x00 }; //"hlava";
    char str2[512] = { 0xc4, 0x8d, 0xc3, 0xad, 0xc5, 0xa1, 0x6e, 0xc3, 0xad, 0x6b, 0x00 }; //"číšník";
    char xfm1[512] = { '\0', };
    char xfm2[512] = { '\0', };
    char * result = NULL;
    size_t lxfm1 = 0;
    size_t lxfm2 = 0;

    result = setlocale(LC_ALL, "en_US.UTF-8");
    lxfm1 = strxfrm(xfm1, str1, sizeof xfm1);
    lxfm2 = strxfrm(xfm2, str2, sizeof xfm2);
    printf("<en-US>\n");
    printf("setlocale = \"%s\"\n", (result == NULL) ? "NULL" : result);
    printf("str1: \"%s\" --> \"%s\"\n", str1, xfm1);
    printf("str2: \"%s\" --> \"%s\"\n", str2, xfm2);
    printf("strcmp(str1, str2) = %d\n", strcmp(str1, str2));
    printf("strcmp(xfm1, xfm2) = %d\n", strcmp(xfm1, xfm2));
    // strcoll() is applied to the original strings, not the transformed ones
    printf("strcoll(xfm1, xfm2) = %d\n", strcoll(str1, str2));
    printf("returns of strxfrm: %zu / %zu\n", lxfm1, lxfm2);

    result = setlocale(LC_ALL, "cs_CZ.UTF-8");
    lxfm1 = strxfrm(xfm1, str1, sizeof xfm1);
    lxfm2 = strxfrm(xfm2, str2, sizeof xfm2);
    printf("<cs-CZ>\n");
    // setlocale() returns NULL on failure; passing NULL to %s is undefined behavior
    printf("setlocale = \"%s\"\n", (result == NULL) ? "NULL" : result);
    printf("str1: \"%s\" --> \"%s\"\n", str1, xfm1);
    printf("str2: \"%s\" --> \"%s\"\n", str2, xfm2);
    printf("strcmp(str1, str2) = %d\n", strcmp(str1, str2));
    printf("strcmp(xfm1, xfm2) = %d\n", strcmp(xfm1, xfm2));
    // strcoll() is applied to the original strings, not the transformed ones
    printf("strcoll(xfm1, xfm2) = %d\n", strcoll(str1, str2));
    printf("returns of strxfrm: %zu / %zu\n", lxfm1, lxfm2);
    return 0;
}
I expected the result of "strcmp(xfm1, xfm2)" to be a positive integer under the cs_CZ locale, because the letter 'č' precedes 'h' in the Czech alphabet.
However, the actual output is:
<en-US>
setlocale = "en_US.UTF-8"
str1: "hlava" --> "001Z001^001S001h001S0000001Z001^001S001h001S"
str2: "číšník" --> "0042003_0042001`003_001]0000008?003_009S001`003_001]"
strcmp(str1, str2) = -92
strcmp(xfm1, xfm2) = -3
strcoll(xfm1, xfm2) = -152
returns of strxfrm: 44 / 52
<cs-CZ>
setlocale = "cs_CZ.UTF-8"
str1: "hlava" --> "001Z001^001S001h001S0000001Z001^001S001h001S"
str2: "číšník" --> "0042003_0042001`003_001]0000008?003_009S001`003_001]"
strcmp(str1, str2) = -92
strcmp(xfm1, xfm2) = -3
strcoll(xfm1, xfm2) = -152
returns of strxfrm: 44 / 52
Am I misunderstanding the function 'strxfrm'? I still don't clearly understand what 'transform' means.
Please let me know the correct usage and purpose of this function.
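strxfrm() is a C/C++ library function, declared in the <string.h> header file in C. It transforms the source string according to the LC_COLLATE category of the current locale and places the result in the destination string, so that comparing two transformed strings with strcmp() gives the same ordering as comparing the original strings with strcoll(). It returns the length of the transformed string, not counting the terminating null byte. Syntax: size_t strxfrm(char *dest, const char *src, size_t n)
strcoll() compares the string pointed to by str1 with the one pointed to by str2. The comparison is based on the rules of the current locale's LC_COLLATE category. Syntax: int strcoll(const char *str1, const char *str2)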
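As a rough illustration of how the two functions relate, here is a minimal sketch of building a sort key with strxfrm(). The helper names make_sort_key and sign_of are hypothetical and not part of the C library. The sketch calls strxfrm() twice: first with a size of 0 to learn how long the key will be, then again to fill a buffer of that size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd sort key for s under the current
   LC_COLLATE locale, or NULL if allocation fails. The caller frees it. */
char *make_sort_key(const char *s)
{
    /* With a size of 0 the destination may be a null pointer; the return
       value is the key length, not counting the terminating null byte. */
    size_t len = strxfrm(NULL, s, 0);
    char *key = malloc(len + 1);
    if (key != NULL)
        strxfrm(key, s, len + 1);
    return key;
}

/* Hypothetical helper: reduces a comparison result to -1, 0 or 1, since
   only the sign of strcmp()/strcoll() results is meaningful. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}
```

For two strings a and b, sign_of(strcmp(make_sort_key(a), make_sort_key(b))) should match sign_of(strcoll(a, b)) on a conforming implementation. Keys are mainly worth building when the same strings are compared many times, for example while sorting a large list; for a single comparison, calling strcoll() directly is simpler.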
Your usage of strxfrm is correct. The problem lies in the locale implementation on Mac OS X (and FreeBSD): collation simply doesn't work properly with UTF-8 locales there. It's apparently a long-standing defect in the version of libc these operating systems use.
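One way to check whether a given platform actually applies locale-specific collation is to compare the sign of strcoll() with the sign of plain strcmp() for a pair of strings whose order should differ, such as "hlava" and "číšník" under cs_CZ.UTF-8. The sketch below assumes a hypothetical helper named collation_differs; the result depends entirely on the platform and on which locales are installed, so no particular output is claimed here.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switches LC_COLLATE to locale_name and reports
   whether strcoll() orders a and b differently from strcmp().
   Returns 1 if the orders differ, 0 if they agree, and -1 if the
   locale could not be set. Note that it changes the process locale. */
int collation_differs(const char *locale_name, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;

    int by_locale = strcoll(a, b);
    int by_bytes = strcmp(a, b);

    /* Only the sign of each result is meaningful, so compare signs. */
    int sign_locale = (by_locale > 0) - (by_locale < 0);
    int sign_bytes = (by_bytes > 0) - (by_bytes < 0);
    return sign_locale != sign_bytes;
}
```

If collation_differs("cs_CZ.UTF-8", "hlava", "číšník") returns 0, the locale is ordering the strings by byte value, which is the behavior shown in the question's output.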