So, I was trying to print fixtures from a competition as neatly formatted strings, but I found that whenever a team name contains a special character such as 'é', 'í' or 'á', the alignment is off by one for each such character, even though I specified the field width.
Here is the code:
printf("=> %-25s (%d) vs (%d) \t%-25s\n", f->home_team_name, f->goals_home_team, f->goals_away_team, f->away_team_name);
For teams with those characters the output looks like this:
=> Palmeiras                 (2) vs (0)         Botafogo
=> Atlético Mineiro         (4) vs (3)  Grémio
=> Atlético PR              (3) vs (0)  Palmeiras
=> Botafogo                  (2) vs (2)         Cruzeiro
But I want the output to look like this, even with special characters:
=> Tottenham Hotspur FC      (0) vs (0)         Leicester City FC
=> West Ham United FC        (0) vs (0)         Everton FC
=> Burnley FC                (0) vs (0)         AFC Bournemouth
I've tried looking for formatting flags but couldn't find a solution.
The format string in printf does not take multibyte characters into account: the field width counts bytes, not characters.
A possible solution is to count the wide characters of a string with the mbstowcs function. That count is then subtracted from the length of the string in bytes. This yields a (nonnegative) "compensation value" that can be added to printf's field width.
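To make the arithmetic concrete, here is a minimal sketch of the same "bytes minus characters" calculation. It is not the mbstowcs approach described here: it assumes the strings are valid UTF-8 and counts characters by skipping continuation bytes, so it does not depend on the locale. The helper names utf8_char_count and utf8_compensation are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count characters (code points) in a UTF-8 string: every byte that is
 * not a continuation byte (bit pattern 10xxxxxx) starts a new character.
 * Assumption: the input is valid UTF-8; this is not checked. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Extra field width needed so that printf, which pads by bytes,
 * effectively pads by characters. */
static size_t utf8_compensation(const char *s)
{
    return strlen(s) - utf8_char_count(s);
}
```

For "Atlético Mineiro" encoded as UTF-8, strlen gives 17 bytes and the character count is 16, so the compensation is 1 and a field width of 25 + 1 pads the name to 25 characters. A pure ASCII name such as "Botafogo" gets a compensation of 0.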
The mbstowcs function is described as:
"Converts a multibyte character string from the array whose first element is pointed to by src to its wide character representation. Converted characters are stored in the successive elements of the array pointed to by dst. No more than len wide characters are written to the destination array."
In your case, this means that UTF-8 encoded octets (stored in an array of char) are converted into a wide representation which guarantees that any multibyte character (up to the locale-specific MB_CUR_MAX bytes) can be encoded by no more than one wchar_t object.
The relevant quote from the C11 Standard is in 7.19/2, Common definitions <stddef.h>:
"wchar_t which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales;"
For instance, on Linux, wide characters are most likely represented in UCS-4 (also known as UTF-32).
Here is a proof of concept:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>

/* Number of characters (not bytes) in a multibyte string. */
static inline size_t widestrlen(const char *str)
{
    return mbstowcs(NULL, str, strlen(str));
}

/* Bytes minus characters: the extra field width that printf needs. */
static inline size_t compensation(const char *str)
{
    size_t wide = widestrlen(str);
    /* mbstowcs returns (size_t)-1 on an invalid sequence; add nothing then. */
    return wide == (size_t) -1 ? 0 : strlen(str) - wide;
}

int main(void)
{
    /* Pick up the locale from the environment so multibyte decoding works. */
    setlocale(LC_CTYPE, "");

    // Print some debugging information regarding selected locale
    printf("Current locale for LC_TYPE category: %s\n", setlocale(LC_CTYPE, NULL));
    printf("Maximum number of bytes in a multibyte character: %zu\n", MB_CUR_MAX);
    printf("Does current encoding support shift states? : %s\n\n", mblen(NULL, 0) ? "Yes" : "No");

    int goals_home_teams[] = { 4, 0 };
    int goals_away_teams[] = { 3, 0 };
    const char *home_team_names[] = { "Atlético Mineiro", "West Ham United FC" };
    const char *away_team_names[] = { "Grémio", "Everton FC" };

    for (int i = 0; i < 2; i++)
    {
        /* The '*' width is taken from the argument list: 25 columns plus compensation. */
        printf("=> %-*s (%d) vs (%d) \t%-*s\n",
               25 + (int) compensation(home_team_names[i]),
               home_team_names[i], goals_home_teams[i], goals_away_teams[i],
               25 + (int) compensation(away_team_names[i]),
               away_team_names[i]);
    }

    return 0;
}
Result:
Current locale for LC_TYPE category: en_US.UTF-8
Maximum number of bytes in a multibyte character: 6
Does current encoding support shift states? : No
=> Atlético Mineiro          (4) vs (3)         Grémio
=> West Ham United FC        (0) vs (0)         Everton FC
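If the same padding is needed in several places, the width arithmetic could be hidden behind a small helper. The following is only a sketch: pad_field is a hypothetical name, it relies on the locale having been set with setlocale as in the proof of concept, and it falls back to plain byte-based padding when mbstowcs cannot decode the string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write `s` left-aligned into `dst`, padded to `columns` characters
 * rather than `columns` bytes. Returns whatever snprintf returns,
 * i.e. the number of bytes the full field needs (excluding the '\0'). */
static int pad_field(char *dst, size_t dstsize, const char *s, int columns)
{
    size_t bytes = strlen(s);
    size_t chars = mbstowcs(NULL, s, bytes);

    /* mbstowcs returns (size_t)-1 on an invalid sequence for the
     * current locale; in that case pad by bytes, as plain printf does. */
    if (chars == (size_t)-1 || chars > bytes)
        chars = bytes;

    return snprintf(dst, dstsize, "%-*s", columns + (int)(bytes - chars), s);
}
```

A call such as pad_field(buf, sizeof buf, f->home_team_name, 25) would then fill buf with the name padded to 25 characters, and the result can be printed with a plain %s. Like the proof of concept, this counts characters, not terminal columns, so it assumes every character occupies exactly one column.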