 

printf format for 1 byte signed number

Tags:

c

printf

Assuming the following:

sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8

The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?

asked Feb 07 '15 at 21:02 by CHRIS





1 Answer

what is the correct format for a 1 byte signed number?

Use the hh length modifier with the integer conversion specifier of your choice: %hhd for a signed decimal value, or, for example, %02hhX for two hex digits. See the C11 standard, §7.21.6.1p5:

hh

Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…

The parenthesized comment is important. Because of the integer promotions applied to the arguments of variadic functions (such as printf), the function never sees a char argument. Many programmers conclude from this that the h and hh modifiers are unnecessary. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.

However, char may well be signed, and the integer promotion preserves its value, so a negative char becomes a negative int. Printing that int with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display a signed char using an unsigned format, you need to tell printf the original, unpromoted width of the integer type, using hh.

In case that wasn't clear, here is a simple (but controversial) example:

/* Read the comments thread to this post; I'll remove
   this note when I edit the outcome of the discussion into
   the answer
 */

#include <stdio.h>
int main(void) {
  const char *s = "\u00d1"; /* Ñ: the bytes C3 91 when the execution character set is UTF-8 */
  for (const char *p = s; *p; ++p) printf("%02X (%02hhX)\n", *p, *p);
  return 0;
}

Output:

$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)

In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet has undefined behaviour, because the X conversion specifier requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."

However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char (or of a plain char, where char is unsigned) to an unsigned int will only happen on an implementation where sizeof(int) == 1.)

So on most architectures, the argument to a %hx or %hhx conversion will be a signed int, and that cannot be undefined behaviour without rendering these format codes meaningless.

Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above).

Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)

So there is a well-defined procedure for converting the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the Ñ example above is entirely well-defined.

For corroboration, the behaviour of fprintf given the conversion specifier %c is defined as follows (§7.21.6.1/p8):

the int argument is converted to an unsigned char, and the resulting character is written.

If one were to apply the proposed restrictive interpretation, which renders the Ñ example undefined, then I believe one would also be forced to argue that:

#include <stdio.h>

void f(char c) {
  printf("This is a '%c'.\n", c);
}

was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.

The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise: it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call", which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.

I don't think C is actually different, at least in intent. It uses wording like "the arguments… are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically: "the type of the actual next argument (as promoted according to the default argument promotions)".

I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)

answered Oct 14 '22 at 12:10 by rici
