How can a file contain null bytes?

How is it possible for files to contain null bytes in operating systems written in a language with null-terminated strings (namely, C)?

For example, if I run this shell code:

    $ printf "Hello\00, World!" > test.txt
    $ xxd test.txt
    0000000: 4865 6c6c 6f00 2c20 576f 726c 6421       Hello., World!

I see a null byte in test.txt (at least on OS X). If C uses null-terminated strings, and OS X is written in C, then why isn't the file terminated at the null byte, leaving it with Hello instead of Hello\00, World!? Is there a fundamental difference between files and strings?

RK. asked Jan 05 '16 20:01



1 Answer

Null-terminated strings are a C construct used to determine the end of a sequence of characters intended to be used as a string. String manipulation functions such as strcmp, strcpy, strchr, and others use this construct to perform their duties.

But you can still read and write binary data that contains null bytes within your program as well as to and from files. You just can't treat them as strings.

Here's an example of how this works:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        FILE *fp = fopen("out1","w");
        if (fp == NULL) {
            perror("fopen failed");
            exit(1);
        }

        int a1[] = { 0x12345678, 0x33220011, 0x0, 0x445566 };
        char a2[] =  { 0x22, 0x33, 0x0, 0x66 };
        char a3[] = "Hello\x0World";

        // this writes the whole array
        fwrite(a1, sizeof(a1[0]), 4, fp);
        // so does this
        fwrite(a2, sizeof(a2[0]), 4, fp);
        // this does not write the whole array -- only "Hello" is written
        fprintf(fp, "%s\n", a3);
        // but this does
        fwrite(a3, sizeof(a3[0]), 12, fp);
        fclose(fp);
        return 0;
    }

Contents of out1:

    [dbush@db-centos tmp]$ xxd out1
    0000000: 7856 3412 1100 2233 0000 0000 6655 4400  xV4..."3....fUD.
    0000010: 2233 0066 4865 6c6c 6f0a 4865 6c6c 6f00  "3.fHello.Hello.
    0000020: 576f 726c 6400                           World.

For the first array, because we use the fwrite function and tell it to write 4 elements the size of an int, all the values in the array appear in the file. The output also shows that on this machine an int is 32 bits wide and is stored in little-endian byte order. The second and fourth elements of the array each contain one null byte, while the third value, being 0, consists of 4 null bytes, and all of them appear in the file.

We also use fwrite on the second array, which contains elements of type char, and we again see that all array elements appear in the file. In particular, the third value in the array is 0, which consists of a single null byte that also appears in the file.

The third array is first written with the fprintf function using a %s format specifier which expects a string. It writes the first 5 bytes of this array to the file before encountering the null byte, after which it stops reading the array. It then prints a newline character (0x0a) as per the format.

The third array is written to the file again, this time using fwrite. The string constant "Hello\x0World" contains 12 bytes: 5 for "Hello", one for the explicit null byte, 5 for "World", and one for the null byte that implicitly ends the string constant. Since fwrite is given the full size of the array (12), it writes all of those bytes. Indeed, looking at the file contents, we see each of those bytes.

As a side note, in each of the fwrite calls, I've hardcoded the element count for the third parameter instead of using a more dynamic expression such as sizeof(a1)/sizeof(a1[0]), to make it clearer exactly how many elements are being written in each case.

dbush answered Sep 28 '22 18:09