Why null-terminated strings? Or: null-terminated vs. characters + length storage

I'm writing a language interpreter in C, and my string type contains a length attribute, like so:

    struct String {
        char* characters;
        size_t length;
    };

Because of this, I have to spend a lot of time in my interpreter handling this kind of string manually since C doesn't include built-in support for it. I've considered switching to simple null-terminated strings just to comply with the underlying C, but there seem to be a lot of reasons not to:

Bounds-checking is built-in if you use "length" instead of looking for a null.

With null-terminated strings, you have to traverse the entire string to find its length.

You have to do extra stuff to handle a null character in the middle of a null-terminated string.

Null-terminated strings deal poorly with Unicode.

Non-null-terminated strings can intern more, i.e. the characters for "Hello, world" and "Hello" can be stored in the same place, just with different lengths. This can't be done with null-terminated strings.

Slicing a string is cheaper (note: strings are immutable in my language). Compare the two versions of slice: obviously the second, null-terminated one is slower (and more error-prone: think about adding error-checking of begin and end to both functions).

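    /* Version 1: length-based string. No copy; the slice shares the input's storage. */
    struct String slice(struct String in, size_t begin, size_t end) {
        struct String out;
        out.characters = in.characters + begin;
        out.length = end - begin;

        return out;
    }

    /* Version 2: null-terminated string (an alternative to version 1, not meant
       to be compiled alongside it). Needs <stdlib.h> for malloc; the caller
       must free the result. */
    char* slice(char* in, size_t begin, size_t end) {
        char* out = malloc(end - begin + 1);

        /* size_t index avoids the signed/unsigned comparison of the original int. */
        for(size_t i = 0; i < end - begin; i++)
            out[i] = in[i + begin];

        out[end - begin] = '\0';

        return out;
    }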
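Several of the points in the list above (storage sharing, embedded null characters, bounds checks) can be illustrated with a minimal sketch. It assumes a struct shaped like the one in the question, except that characters is declared const here because the strings are immutable. The helper names string_from, string_slice and string_equal are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the question's struct; `const` is an assumption made here
   because the strings are described as immutable. */
struct String {
    const char *characters;
    size_t length;
};

/* Hypothetical helper: wrap a buffer of known size without copying. */
struct String string_from(const char *bytes, size_t length) {
    struct String s = { bytes, length };
    return s;
}

/* Bounds-checked, zero-copy slice: the result points into the
   input's storage instead of allocating a new buffer. */
struct String string_slice(struct String in, size_t begin, size_t end) {
    assert(begin <= end && end <= in.length);
    return string_from(in.characters + begin, end - begin);
}

/* Hypothetical helper: byte-wise equality driven by the stored length,
   so it does not stop at an embedded '\0'. */
int string_equal(struct String a, struct String b) {
    return a.length == b.length &&
           memcmp(a.characters, b.characters, a.length) == 0;
}
```

With these helpers, string_slice(string_from("Hello, world", 12), 0, 5) describes "Hello" using the same bytes as the longer string, which is the sharing that a terminator makes impossible. Likewise, string_from("a\0b", 3) keeps all three bytes, whereas strlen on the same buffer stops at the first null and reports 1.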

After all this, my thinking is no longer about whether I should use null-terminated strings: I'm thinking about why C uses them!

So my question is: are there any benefits to null-termination that I'm missing?

416

asked Aug 10 '09 05:08

Imagist

1 Answer

From Joel's Back to Basics:

Why do C strings work this way? It's because the PDP-7 microprocessor, on which UNIX and the C programming language were invented, had an ASCIZ string type. ASCIZ meant "ASCII with a Z (zero) at the end."

Is this the only way to store strings? No, in fact, it's one of the worst ways to store strings. For non-trivial programs, APIs, operating systems, class libraries, you should avoid ASCIZ strings like the plague.

149

answered Sep 28 '22 18:09

weiqure

Tags: performance, c, string, algorithm, null-terminated
