How is it possible that files can contain null bytes in operating systems written in a language with null-terminated strings (namely, C)?
For example, if I run this shell code:
    $ printf "Hello\00, World!" > test.txt
    $ xxd test.txt
    0000000: 4865 6c6c 6f00 2c20 576f 726c 6421       Hello., World!
I see a null byte in test.txt (at least in OS X). If C uses null-terminated strings, and OS X is written in C, then how come the file isn't terminated at the null byte, resulting in the file containing Hello instead of Hello\00, World!? Is there a fundamental difference between files and strings?
While null bytes are used to terminate strings and are needed by string manipulation functions (so they know where the string ends), in binary files \0 bytes can appear anywhere.
Using the -d switch we delete a character. A backslash followed by three 0's represents the null character. This just deletes these characters and writes the result to a new file.
It's impossible to create a file name containing a null byte through POSIX or Windows APIs.
In a Continuus database, the ascii type, and all types derived from it, are used for objects whose source attribute contains only the characters from the basic ASCII set of 128 characters. These plain ASCII files are invariably terminated with a null character (character number 0 in the ASCII set).
Null-terminated strings are a C construct used to determine the end of a sequence of characters intended to be used as a string. String manipulation functions such as strcmp, strcpy, strchr, and others use this construct to perform their duties.
But you can still read and write binary data that contains null bytes within your program as well as to and from files. You just can't treat them as strings.
Here's an example of how this works:
    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        FILE *fp = fopen("out1","w");
        if (fp == NULL) {
            perror("fopen failed");
            exit(1);
        }

        int a1[] = { 0x12345678, 0x33220011, 0x0, 0x445566 };
        char a2[] = { 0x22, 0x33, 0x0, 0x66 };
        char a3[] = "Hello\x0World";

        // this writes the whole array
        fwrite(a1, sizeof(a1[0]), 4, fp);
        // so does this
        fwrite(a2, sizeof(a2[0]), 4, fp);
        // this does not write the whole array -- only "Hello" is written
        fprintf(fp, "%s\n", a3);
        // but this does
        fwrite(a3, sizeof(a3[0]), 12, fp);

        fclose(fp);
        return 0;
    }
Contents of out1:
    [dbush@db-centos tmp]$ xxd out1
    0000000: 7856 3412 1100 2233 0000 0000 6655 4400  xV4..."3....fUD.
    0000010: 2233 0066 4865 6c6c 6f0a 4865 6c6c 6f00  "3.fHello.Hello.
    0000020: 576f 726c 6400                           World.
For the first array, because we use the fwrite function and tell it to write 4 elements the size of an int, all the values in the array appear in the file. You can see from the output that all values are written, the values are 32-bit, and each value is written in little-endian byte order. We can also see that the second and fourth elements of the array each contain one null byte, while the third value, being 0, has 4 null bytes, and all appear in the file.
We also use fwrite on the second array, which contains elements of type char, and we again see that all array elements appear in the file. In particular, the third value in the array is 0, which consists of a single null byte that also appears in the file.
The third array is first written with the fprintf function using a %s format specifier, which expects a string. It writes the first 5 bytes of this array to the file before encountering the null byte, after which it stops reading the array. It then prints a newline character (0x0a) as per the format.
The third array is written to the file again, this time using fwrite. The string constant "Hello\x0World" contains 12 bytes: 5 for "Hello", one for the explicit null byte, 5 for "World", and one for the null byte that implicitly ends the string constant. Since fwrite is given the full size of the array (12), it writes all of those bytes. Indeed, looking at the file contents, we see each of those bytes.
As a side note, in each of the fwrite calls, I've hardcoded the number of array elements for the third parameter instead of using a more dynamic expression such as sizeof(a1)/sizeof(a1[0]), to make it clearer exactly how many elements are being written in each case.