Difference between int and char in getchar/fgetc and putchar/fputc?

I am trying to learn C on my own and I'm a bit confused about getchar and putchar:

1

    #include <stdio.h>

    int main(void)
    {
        char c;
        printf("Enter characters : ");
        while((c = getchar()) != EOF){
          putchar(c);
        }
        return 0;
    }

2

    #include <stdio.h>

    int main(void)
    {
        int c;
        printf("Enter characters : ");
        while((c = getchar()) != EOF){
          putchar(c);
        }
        return 0;
    }

The C library function int putchar(int c) writes a character (an unsigned char) specified by the argument char to stdout.

The C library function int getchar(void) gets a character (an unsigned char) from stdin. This is equivalent to getc with stdin as its argument.

Does this mean that putchar() accepts both int and char, or only one of them? And for getchar(), should we use an int or a char?

asked Feb 12 '16 06:02

Raghib Hasan

1 Answer

TL;DR:

char c; c = getchar(); is wrong, broken and buggy.
int c; c = getchar(); is correct.

This applies to getc and fgetc as well, if not even more so, because one would often read until the end of the file.

Always store the return value of getchar (and of fgetc, getc and putchar) in a variable of type int first.

The argument to putchar can be any of int, char, signed char or unsigned char; its type doesn't matter, and all of them work the same, even though one type might pass a positive integer and another a negative integer for characters at or above \200 (128).

The reason why you must use int to store the return value of both getchar and putchar is that both report failure by returning the value of the macro EOF, which is a negative integer constant (usually -1). getchar returns it when the end-of-file condition is reached or a read error occurs; putchar returns it when a write error occurs.

For getchar, if the return value is not EOF, it is the read unsigned char zero-extended to an int. That is, assuming 8-bit characters, the values returned can be 0...255 or the value of the macro EOF; again assuming 8-bit char, there is no way to squeeze these 257 distinct values into 256 so that each of them could be identified uniquely.
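As a minimal sketch of the point above, the copy loop below keeps every return value in an int, so all byte values and EOF stay distinct. The helper name copy_stream and its return convention are assumptions made for illustration; they are not part of the code in the question.

```c
#include <stdio.h>

/* Hypothetical helper: copies every byte from `in` to `out`.
 * Returns the number of bytes copied, or -1 if a write fails. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;  /* int, so the values 0..UCHAR_MAX and EOF stay distinct */

    while ((c = fgetc(in)) != EOF) {
        /* fputc reports a write error by returning EOF as well */
        if (fputc(c, out) == EOF) {
            return -1;
        }
        copied++;
    }
    return copied;
}
```

Calling copy_stream(stdin, stdout) should behave like program 2 in the question: a byte with value 255 is copied like any other, and only a real end-of-file (or read error) ends the loop.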

Now, if you stored it into char instead, the effect would depend on whether the character type is signed or unsigned by default! This varies from compiler to compiler, architecture to architecture. If char is signed and assuming EOF is defined as -1, then both EOF and character '\377' on input would compare equal to EOF; they'd be sign-extended to (int)-1.

On the other hand, if char is unsigned (as it is by default on many ARM platforms, including Raspberry Pi systems, and as seems to be true for AIX too), no value that can be stored in c compares equal to -1, and that includes EOF itself. Instead of breaking out of the loop on EOF, your code would output a \377 character and keep looping.
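The two cases can be checked in isolation with a small sketch like the following. The function names are made up for illustration, and the signed case relies on implementation-defined conversion (on common two's-complement systems 255 becomes -1).

```c
#include <stdio.h>

/* Returns 1 if `value`, squeezed into a signed char, compares equal to EOF.
 * Converting an out-of-range value to signed char is implementation-defined;
 * on common two's-complement systems 255 becomes -1. */
int equals_eof_as_signed_char(int value)
{
    signed char c = (signed char)value;
    return c == EOF;
}

/* Returns 1 if `value`, squeezed into an unsigned char, compares equal to EOF.
 * The promoted value is never negative, so this cannot be true as long as
 * unsigned char promotes to int (the usual case). */
int equals_eof_as_unsigned_char(int value)
{
    unsigned char c = (unsigned char)value;
    return c == EOF;
}
```

On a typical system where EOF is -1, the signed variant reports a match for both EOF and the byte 255, while the unsigned variant reports no match for either. A compiler may warn that the second comparison is always false, which is exactly the problem being illustrated.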

The danger here is that with signed chars the code seems to work correctly even though it is still horribly broken: one of the legal input values is interpreted as EOF. Furthermore, C89, C99 and C11 do not mandate a value for EOF; they only say that EOF is a negative integer constant. Instead of -1 it could just as well be, say, -224 on a particular implementation. Stored into an 8-bit char, that value would typically turn into 32, the ASCII code of a space, so end-of-file would be echoed as a space and the loop would not stop.

gcc has the switch -funsigned-char which can be used to make the char unsigned on those platforms where it defaults to signed:

    % cat test.c
    #include <stdio.h>

    int main(void)
    {
        char c;
        printf("Enter characters : ");
        while ((c = getchar()) != EOF){
          putchar(c);
        }
        return 0;
    }

Now we run it with signed char:

    % gcc test.c && ./a.out
    Enter characters : sfdasadfdsaf
    sfdasadfdsaf
    ^D
    %

Seems to be working right. But with unsigned char:

    % gcc test.c -funsigned-char && ./a.out
    Enter characters : Hello world
    Hello world
    ���������������������������^C
    %

That is, I tried to press Ctrl-D there many times but a � was printed for each EOF instead of breaking the loop.

Now, again, for the signed char case, it cannot distinguish between char 255 and EOF on Linux, breaking it for binary data and such:

    % gcc test.c && echo -e 'Hello world\0377And some more' | ./a.out
    Enter characters : Hello world
    %

Only the first part up to the \0377 escape was written to stdout.

Beware that comparisons between character constants and an int containing the unsigned character value might not work as expected: for example, where char is signed, the character constant 'ä' in ISO 8859-1 has the negative value -28. So if you write code that reads input until 'ä' in the ISO 8859-1 code page, you would write:

    int c;
    while ((c = getchar()) != EOF){
        if (c == (unsigned char)'ä') {
            /* ... */
        }
    }
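The same comparison can be wrapped in a small helper, sketched below. The name matches_char is hypothetical; the cast through unsigned char is the technique described above.

```c
#include <stdio.h>

/* Hypothetical helper: does the value returned by getchar() match `target`?
 * Casting through unsigned char puts both sides in the range 0..UCHAR_MAX,
 * so the result does not depend on whether plain char is signed. */
int matches_char(int c, char target)
{
    return c != EOF && c == (unsigned char)target;
}
```

With ISO 8859-1 input, matches_char(c, '\xE4') tests for 'ä' whether char is signed or unsigned, whereas a plain c == '\xE4' would compare 228 against -28 on a signed-char system.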

Due to integer promotion, all char values fit into an int, and are automatically promoted on function calls, thus you can give any of int, char, signed char or unsigned char to putchar as an argument (not to store its return value), and it would work as expected.

The actual value passed in the integer might be positive or even negative; for example, the character constant '\377' would be negative on an 8-bit-char system where char is signed. However, putchar (or actually fputc) will convert the value to an unsigned char. C11 7.21.7.3p2:

2 The fputc function writes the character specified by c (converted to an unsigned char) to the output stream pointed to by stream [...]

(emphasis mine)

That is, fputc is guaranteed to convert the given c as if by (unsigned char)c.
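One way to observe this conversion is to write a value with fputc and read the byte back, as in the sketch below. The helper name roundtrip_through_fputc is an assumption for illustration, and it relies on tmpfile() being able to create a temporary file.

```c
#include <stdio.h>

/* Hypothetical helper: writes `c` with fputc to a temporary file, reads the
 * byte back with fgetc, and returns it. Returns EOF if any step fails. */
int roundtrip_through_fputc(int c)
{
    FILE *tmp = tmpfile();
    int result;

    if (tmp == NULL) {
        return EOF;
    }
    if (fputc(c, tmp) == EOF) {
        fclose(tmp);
        return EOF;
    }
    rewind(tmp);            /* switch from writing to reading */
    result = fgetc(tmp);    /* the stored byte, as an unsigned char value */
    fclose(tmp);
    return result;
}
```

With 8-bit characters, roundtrip_through_fputc(-1) yields 255, the same byte that putchar('\377') writes when char is signed.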

answered Sep 28 '22 05:09

Antti Haapala -- Слава Україні
