 

Does '\0' appear naturally in text files?

Tags: c, arduino

I encountered a somewhat annoying bug today where a string (stored as a char[]) would be printed with junk at the end. The string that was supposed to be printed (using the Arduino print/write functions) was correct (it correctly included \r and \n), yet junk was printed after it.

I then allocated an extra element to store a '\0' after '\r' and '\n' (which were the last 2 characters in the string to be printed). Then, print() printed the string correctly. It seems '\0' was used to indicate to the print() function that the string had terminated (I remember reading this in Kernighan's C).
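A minimal sketch of the fix described above, assuming the bytes come from a file read and that the destination buffer has one spare element. The helper name `make_c_string` is hypothetical and is not part of the Arduino or C libraries.

```c
#include <stddef.h>
#include <string.h>

/* Copies `len` raw bytes (e.g. read from a file) into `dst` and appends
 * the '\0' terminator that C string functions rely on.
 * Returns 0 on success, -1 if `dst` cannot hold the bytes plus '\0'. */
int make_c_string(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (dst_size == 0 || len > dst_size - 1) {
        return -1;              /* no room for the terminator */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';            /* the extra element holding the terminator */
    return 0;
}
```

For four bytes such as 'o', 'k', '\r', '\n', the destination needs at least five elements. Without the terminator, a function that expects a C string keeps reading past the end of the array, which would explain the junk.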

This bug appeared in my code which reads from a text file. It occurred to me that I did not encounter '\0' at all when I designed my code. This leads me to believe that '\0' has no practical use in text editors and is merely used by print functions. Is this correct?

Minh Tran asked Jun 14 '15 02:06





1 Answer

C strings are terminated by the NUL byte ('\0'). It is implicitly appended to every string literal in double quotes, and it is used as the terminator by all standard library functions operating on strings. From this it follows that a C string cannot contain '\0' in between other characters, since there would be no way to tell whether it is the actual end of the string or not.
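A small sketch of both points, using only `sizeof` and `strlen`; the function names are made up for illustration.

```c
#include <string.h>

/* sizeof counts the implicit terminator of a string literal... */
size_t literal_storage_size(void)
{
    return sizeof "abc";            /* 'a' 'b' 'c' '\0' */
}

/* ...while strlen counts only the characters before it. */
size_t literal_length(void)
{
    return strlen("abc");
}

/* An embedded '\0' ends the string as far as strlen is concerned. */
size_t length_with_embedded_nul(void)
{
    const char data[] = "ab\0cd";   /* storage: a b \0 c d \0 */
    return strlen(data);            /* stops at the first '\0' */
}
```

Here the literal occupies 4 bytes but has length 3, and the array with the embedded NUL has length 2 as far as `strlen` can tell, even though 6 bytes are stored.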

(Of course you could handle strings in the C language other than as C strings - e.g., simply adding an integer to record the length of the string would make the terminator unnecessary, but such strings would not be fully interoperable with functions expecting C strings.)
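As an illustration of the length-recording approach, here is a minimal sketch. The type `counted_str` and the function `count_nul_bytes` are hypothetical names, not a standard API.

```c
#include <stddef.h>

/* Hypothetical counted string: the length is stored explicitly,
 * so the bytes may legitimately include '\0'. */
struct counted_str {
    size_t len;
    const char *bytes;   /* not necessarily NUL-terminated */
};

/* Counts the NUL bytes inside a counted string. This is only possible
 * because the end is known from `len` rather than from a terminator. */
size_t count_nul_bytes(struct counted_str s)
{
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++) {
        if (s.bytes[i] == '\0') {
            n++;
        }
    }
    return n;
}
```

The interoperability cost shows up as soon as such a value is passed to a C string function: `strlen(s.bytes)` stops at the first '\0' (and reads out of bounds if there is none), whereas a length-based call such as `fwrite(s.bytes, 1, s.len, stdout)` writes every byte.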

A "text file" in general is not governed by the C standard, and a user of a C program could conceivably give a file containing a NUL byte as input to a C program (which would be unable to handle it "correctly" for the above reasons if it read the file into C strings). However, the NUL byte has no valid reason for existing in a plain text file, and it may be considered at least a de facto standard for text files that they do not contain the NUL byte (or certain other control characters, which might break transmission of that text through some terminals or serial protocols).

I would argue that it is an acceptable (though not necessary!) limitation for a program working on plain text input to not guarantee correct output if there are NUL bytes in the input. However, the programmer should be aware of this possibility regardless of whether it will be treated correctly, and not allow it to cause undefined behaviour in their program. Like all user input, it should be considered "unsafe" in the sense that it can contain anything (e.g., it could be maliciously formed on purpose).
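One way to stay aware of this possibility is to check the bytes actually read before treating them as a C string. The following is a sketch only; `contains_nul` is a hypothetical helper built on the standard `memchr`.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first `len` bytes of `buf` contain a NUL byte, else 0.
 * `len` should be the byte count reported by the read call (for example
 * the return value of fread), not the result of strlen, which would stop
 * at the very byte being looked for. */
int contains_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

After something like `size_t n = fread(buf, 1, sizeof buf - 1, fp);`, a program could call `contains_nul(buf, n)` and then reject the input, strip the NUL bytes, or accept truncated output, and only afterwards set `buf[n] = '\0'`. Which policy is appropriate depends on the program.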

Arkku answered Nov 14 '22 05:11
