Both Linux and the GNU userspace (glibc) seem to have a number of "WONTFIX" bugs, i.e. bugs which the responsible parties have declared their unwillingness to fix despite clearly violating the requirements of ISO C and/or POSIX, but I'm unaware of any resource for programmers which lists such bugs and suggestions for working around them.
Here are a few that come to mind:
- select bug: select (and related interfaces) flags a UDP socket file descriptor as ready for reading as soon as a packet has been received, without verifying the checksum. If the checksum turns out to be invalid, the subsequent recv/read/etc. call will block. Working around this requires always setting UDP sockets to non-blocking mode and handling the EWOULDBLOCK condition. If I remember correctly, MaraDNS was the first notable project affected by this bug and the first to complain (unsuccessfully) to have it fixed. Note: As pointed out by Martin v. Löwis, apparently this bug has since been fixed. Workarounds are probably only necessary if you need to support really outdated versions of Linux.
- The printf family in the GNU C library wrongly treats arguments to %s as multibyte character strings instead of byte strings when a field precision (as in %.3s) is specified, potentially causing truncated output. I know of no workaround except replacing the whole printf subsystem (or simply not using the printf family of functions with non-multibyte-character byte strings, but this can be problematic if you want to process legacy-codepage strings using snprintf while in a UTF-8 locale).
- Incorrect errno result codes for certain syscalls (can't remember which ones right off). Usually these are easy enough to check for if you just read the GNU/Linux man pages and compare them to the standard.
- ENOTSUP and EOPNOTSUPP having the same value; see PDTR 24715.

What are some more bugs and workarounds we can add to this list? My goals in asking this question are:
I can't reproduce the printf issue you describe. Running the program
    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        setlocale(LC_ALL, "");
        printf("%.4s\n", "Löwis");
        return 0;
    }
in a de_DE.UTF-8 locale prints "Löw", which looks right to me: I asked for four bytes and got four bytes (ö is two bytes). Had the library counted multibyte characters, the output would have been "Löwi". This is with glibc 2.11.2.
Edit: Changing the format string to "%.2s\n" prints just "L", i.e. only one byte. However, this conforms to the specification, which says
If the precision is specified, no more than that many *bytes* shall be written.
(emphasis mine), and then
In no case shall a partial character be written.
Since printing two bytes (i.e. the L and the lead byte of ö) would write a partial character, emitting that incomplete UTF-8 sequence would be non-conforming.