Does '\0' appear naturally in text files?

Tags: c, arduino

I encountered a somewhat annoying bug today where a string (stored as a char[]) would be printed with junk at the end. The string that was supposed to be printed (using the Arduino print/write functions) was correct (it included \r and \n as expected). However, junk would be printed after it.

I then allocated an extra element to store a '\0' after '\r' and '\n' (which were the last 2 characters in the string to be printed). Then, print() printed the string correctly. It seems '\0' was used to indicate to the print() function that the string had terminated (I remember reading this in Kernighan's C).

This bug appeared in my code which reads from a text file. It occurred to me that I did not encounter '\0' at all when I designed my code. This leads me to believe that '\0' has no practical use in text editors and is merely used by print functions. Is this correct?

211

asked Jun 14 '15 02:06

Minh Tran

1 Answer

C strings are terminated by the NUL byte ('\0'). It is implicitly appended to every string literal in double quotes, and it is used as the terminator by all standard library functions operating on strings. It follows that a C string cannot contain '\0' between other characters, since there would be no way to tell whether it marks the actual end of the string.
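To make the terminator visible, here is a minimal sketch in standard C. The function names are made up for illustration. It compares the storage size of a literal with the length reported by strlen(), and shows how an embedded '\0' cuts a string short.

```c
#include <stddef.h>
#include <string.h>

/* Storage size of the literal "ok\r\n", including the implicit '\0'. */
size_t literal_size(void)
{
    return sizeof "ok\r\n";        /* 4 visible characters + 1 terminator */
}

/* Length as the standard library sees it: bytes before the first '\0'. */
size_t literal_length(void)
{
    return strlen("ok\r\n");
}

/* An embedded '\0' ends the string early as far as strlen() is concerned. */
size_t embedded_nul_length(void)
{
    const char data[] = "ab\0cd";  /* stored bytes: a b \0 c d \0 */
    return strlen(data);
}
```

A char[] that is filled byte by byte (as in the question) gets no such implicit terminator, so the program has to reserve one extra element and store '\0' there itself.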

(Of course you could handle strings in the C language other than as C strings - e.g., simply adding an integer to record the length of the string would make the terminator unnecessary, but such strings would not be fully interoperable with functions expecting C strings.)
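As a rough illustration of that alternative, the sketch below stores the length next to the data. The struct counted_str type and both helper functions are hypothetical names invented for this example, not part of any standard library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counted string: the length is stored explicitly. */
struct counted_str {
    size_t len;
    const char *data;   /* need not be '\0'-terminated */
};

/* Compares all len bytes, so embedded '\0' bytes take part in the result. */
bool counted_equal(struct counted_str a, struct counted_str b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}

/* Writes exactly s.len bytes to out and returns the number written. */
size_t counted_write(struct counted_str s, FILE *out)
{
    return fwrite(s.data, 1, s.len, out);
}
```

With this representation, the 5-byte strings "ab\0cd" and "ab\0ce" compare as different, whereas strcmp() would stop at the first '\0' and treat them as equal. The cost is that data cannot be passed to functions such as strlen() or printf("%s") unless a terminator is added.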

A "text file" in general is not governed by the C standard, and a user of a C program could conceivably give a file containing a NUL byte as input to a C program (which would be unable to handle it "correctly" for the above reasons if it read the file into C strings). However, the NUL byte has no valid reason for existing in a plain text file, and it may be considered at least a de facto standard for text files that they do not contain the NUL byte (or certain other control characters, which might break transmission of that text through some terminals or serial protocols).
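If you want to check whether some input actually follows that convention, one possible approach is to scan the raw bytes with memchr(), using the byte count returned by fread() rather than any string function. The helper names below are assumptions made for this sketch, and the stream is assumed to have been opened in binary mode ("rb").

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if any of the first n bytes of buf is a NUL byte. */
bool buffer_has_nul(const void *buf, size_t n)
{
    return n > 0 && memchr(buf, '\0', n) != NULL;
}

/* Reads fp to end of file in fixed-size chunks.
   Returns 1 if a NUL byte was found, 0 if none was found,
   and -1 if a read error occurred. */
int stream_has_nul(FILE *fp)
{
    unsigned char chunk[256];
    size_t got;

    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (buffer_has_nul(chunk, got))
            return 1;
    }
    return ferror(fp) ? -1 : 0;
}
```

The important detail is that the length comes from fread(), not from strlen(), so a NUL byte in the middle of a chunk cannot hide the bytes after it.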

I would argue that it is an acceptable (though not necessary!) limitation for a program working on plain text input to not guarantee correct output if there are NUL bytes in the input. However, the programmer should be aware of this possibility regardless of whether it will be treated correctly, and not allow it to cause undefined behaviour in their program. Like all user input, it should be considered "unsafe" in the sense that it can contain anything (e.g., it could be maliciously formed on purpose).
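One way to follow that advice is sketched below. It assumes the program already knows how many bytes it read. The function name sanitize_to_c_string and the policy of silently dropping NUL bytes are choices made for this example only; rejecting the input or replacing the bytes would be equally valid.

```c
#include <stddef.h>

/* Copies bytes from src (n bytes, possibly containing '\0') into dst,
   skipping any '\0' bytes, never writing more than cap bytes in total,
   and always terminating dst when cap > 0.
   Returns the number of bytes stored, not counting the terminator. */
size_t sanitize_to_c_string(char *dst, size_t cap, const char *src, size_t n)
{
    size_t out = 0;

    if (cap == 0)
        return 0;                     /* no room even for the terminator */

    for (size_t i = 0; i < n && out + 1 < cap; i++) {
        if (src[i] != '\0')
            dst[out++] = src[i];
    }
    dst[out] = '\0';                  /* the step missing in the question */
    return out;
}
```

Because the destination is always terminated and never overrun, passing it to a print function afterwards is well defined whatever bytes the file contained, even if the output is not "correct" for input with embedded NUL bytes.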

120

answered Nov 14 '22 05:11

Arkku
