
Defining a string with no null terminating char(\0) at the end

Tags: c++, c

What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?

EDIT: I am interested in character arrays only and not in STL string.

Ravi Gupta asked Sep 30 '10 06:09



3 Answers

Typically as another poster wrote:

char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};

or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)

char s[6] = {115, 116, 114, 105, 110, 103};

There is also a largely ignored way that works only in C (not C++)

char s[6] = "string";

If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied, but it's still valid C (but invalid C++).
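As a minimal sketch of that C-only initializer (the helper name `unterminated_demo` is made up for illustration), the array below holds exactly six characters and no terminator, so it is only ever handled together with an explicit length:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the six characters into dst and returns the count.
   Valid C only; a C++ compiler rejects the initializer. */
size_t unterminated_demo(char *dst, size_t cap)
{
    char s[6] = "string";      /* no room for '\0', so none is stored */

    assert(cap >= sizeof s);   /* caller must supply at least 6 bytes */
    memcpy(dst, s, sizeof s);  /* memcpy takes a length, so no terminator is needed */
    return sizeof s;           /* 6 */
}
```

Some compilers can warn about this form of initializer, which is a useful hint that it is easy to write by accident.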

Obviously you can also do it at run time:

char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';

or (same remark on ASCII charset as above)

char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;

Or using memcpy (or memmove, or bcopy, though here there is no benefit in doing so):

memcpy(s, "string", 6);

or strncpy, which writes no terminator when the source is at least as long as the count:

strncpy(s, "string", 6);
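The strncpy behavior is easy to misremember, so here is a small sketch of it. The wrapper name `copy_chars` is hypothetical; the only library call is the standard strncpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around strncpy, writing exactly n bytes to dst.
   - If src has n or more characters, the first n are copied and no '\0'
     is written.
   - If src is shorter than n, the remainder of dst is filled with '\0'. */
void copy_chars(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, n);
}
```

Calling `copy_chars(buf, "string", 6)` therefore leaves `buf[6]` untouched, which is exactly what is wanted for an unterminated array and exactly what bites people who expected a terminator.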

What should be understood is that there is no such thing as a string in C (C++ has string objects, but that is another story entirely). So-called strings are just char arrays. Even the name char is misleading: it is not a character but a small numeric type. It could probably have been called byte instead, but in the old days there was strange hardware around using 9-bit registers or the like, and byte suggests 8 bits.

As a char will very often be used to store a character code, the C designers thought of a simpler way than writing the number yourself. You can put a letter between single quotes and the compiler understands that it must store that character's code in the char.

For example, you don't have to write

char c = '\0';

to store code 0 in a char; you can simply write

char c = 0;
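A tiny sketch of the same point, with a made-up function name (`char_is_a_number`): the two spellings of code 0 compare equal, and a letter in single quotes is just its numeric code (115 for 's' if the character set is ASCII).

```c
#include <assert.h>

/* Hypothetical demo: returns the numeric code stored for the letter 's'. */
int char_is_a_number(void)
{
    char a = '\0';   /* character constant for code 0 */
    char b = 0;      /* plain integer 0 */
    char c = 's';    /* the compiler stores the code of the letter s */

    assert(a == b);  /* both spellings store the same value */
    return c;        /* a char converts to int like any other small integer */
}
```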

As we very often have to work with runs of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. This representation has a name, "zero-terminated string", and if you see the two letters sz at the beginning of a variable name it usually means that its content is a zero-terminated string.

"C sz strings" are not a type at all, just arrays of char as ordinary as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf with %s, and many others) understand and rely on the 0-ending convention. That also means that if you have a char array that is not zero-terminated, you shouldn't pass it to any of these functions, as they will read past the end of the array (or you must be extra careful and use the functions with an n in their name, such as strncpy, which take an explicit length).

The biggest problem with this convention is that there are many cases where it is inefficient. One typical example: you want to append something to a zero-terminated string. If you had kept the size you could jump straight to the end of the string; with the sz convention you have to scan it char by char. Other kinds of problems occur when dealing with encoded Unicode and the like. But at the time C was created this convention was very simple and did the job perfectly.
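To illustrate the append case, here is a rough sketch of a buffer that keeps its size alongside the characters. The type `counted_text`, its 32-byte capacity, and the function `counted_append` are all invented for this example and come from no library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: the length is stored, so no terminator is needed. */
struct counted_text {
    size_t len;
    char data[32];
};

/* Appends n bytes; returns 0 on success, -1 if they do not fit.
   The write position is data + len, found without scanning for a '\0'. */
int counted_append(struct counted_text *t, const char *src, size_t n)
{
    assert(t->len <= sizeof t->data);
    if (n > sizeof t->data - t->len)
        return -1;
    memcpy(t->data + t->len, src, n);
    t->len += n;
    return 0;
}
```

With a zero-terminated string the equivalent operation (strcat) first has to walk the whole destination to find its end.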

Nowadays, in C++, text between double quotes such as "string" is an array of const char that decays to const char *; in C the array type is still plain char for historical reasons, but modifying a literal is undefined behavior there as well. Either way, the characters should be treated as a constant that must not be modified (if you want to change the text you must first copy it), and that is a good thing because declaring such pointers const lets the compiler catch many programming errors.
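A short sketch of the "copy it first" advice, using a made-up function name (`modify_copy`): the literal is only read, and the change is made to a private array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns the first character of a modified private copy. */
char modify_copy(void)
{
    const char *literal = "string";  /* treat the literal as read-only */
    char copy[7];                    /* 6 characters plus the '\0' */

    assert(strlen(literal) < sizeof copy);
    strcpy(copy, literal);           /* copies the terminator as well */
    copy[0] = 'S';                   /* modifying the copy is fine */
    return copy[0];
}
```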

kriss answered Sep 28 '22 12:09


The terminating null is there to terminate the string. Without it, you need some other method to determine its length.

You can use a predefined length:

char s[6] = {'s','t','r','i','n','g'};
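With a predefined length, the array can still be measured and printed as long as every call is given that length. The sketch below uses only standard calls (memchr, snprintf); the helper names `bounded_length` and `to_c_string` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Length of the text in buf, never reading more than cap bytes.
   Works whether or not a '\0' is present. */
size_t bounded_length(const char *buf, size_t cap)
{
    const char *nul = memchr(buf, '\0', cap);
    return nul ? (size_t)(nul - buf) : cap;
}

/* Formats an unterminated array into a proper C string.
   The precision in "%.*s" caps how many bytes are read from buf. */
int to_c_string(char *out, size_t out_cap, const char *buf, size_t len)
{
    assert(out_cap > 0);
    return snprintf(out, out_cap, "%.*s", (int)len, buf);
}
```

The same "%.*s" precision trick works with printf itself when the array only needs to be displayed.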

You can emulate pascal-style strings:

unsigned char s[7] = {6, 's','t','r','i','n','g'};
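Reading such a length-prefixed array back might look like the sketch below. The function name `pascal_to_c` is invented here, and it assumes the one-byte prefix shown above, which limits the text to 255 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Converts a length-prefixed array (first byte = length) into a C string.
   Returns the length, or -1 if out cannot hold the text plus a '\0'. */
int pascal_to_c(const unsigned char *p, char *out, size_t out_cap)
{
    size_t len;

    assert(p != NULL && out != NULL);
    len = p[0];                 /* the length lives in the first byte */
    if (len + 1 > out_cap)
        return -1;
    memcpy(out, p + 1, len);    /* the characters start at index 1 */
    out[len] = '\0';            /* add the terminator only in the copy */
    return (int)len;
}
```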

You could use std::string in C++ (though you have said you are not interested in std::string).

Preferably you would use some pre-existing technology that handles Unicode, or at least understands string encoding (e.g., wchar.h).

And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.

typedef struct {
    char characters[10];
} ThisIsNotACString;
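A possible way to fill such a struct is sketched below. The constructor `make_fixed` and the choice to pad unused bytes with spaces are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char characters[10];   /* exactly 10 bytes, no terminator implied */
} ThisIsNotACString;

/* Hypothetical constructor: copies len bytes and pads the rest with spaces.
   The space padding is an assumption made for this sketch. */
ThisIsNotACString make_fixed(const char *text, size_t len)
{
    ThisIsNotACString v;

    assert(len <= sizeof v.characters);
    memset(v.characters, ' ', sizeof v.characters);
    memcpy(v.characters, text, len);
    return v;
}

/* strlen(v) or puts(v) would be rejected by the compiler, because v is a
   struct and not a pointer to char. */
```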
Seth answered Sep 26 '22 12:09


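A C++ std::string does not depend on a NUL terminator: it stores its length explicitly and can contain embedded NUL characters. (Since C++11, c_str() and data() do return a NUL-terminated buffer.)

P.S.: NULL is a macro [1]. NUL is \0. Don't mix them up.

[1]: C.2.2.3 Macro NULL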

The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>, <ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International Standard (18.1).

Prasoon Saurav answered Sep 27 '22 12:09