
What are Pascal Strings?

Are they named after the programming language, or the mathematician?

What are the defining characteristics of Pascal strings? In Wikipedia's article on strings it seems like the defining characteristic is storing the length of the string in the first byte. In another article I get the impression that the memory layout of the strings is also important.

While perusing an unrelated SO thread somebody mentioned that Pascal strings make Excel fast. What are the advantages of Pascal strings over null-terminated strings? Or more generally, in what situations do Pascal strings excel?

Are Pascal strings implemented in any other languages?

Last, do I capitalize both words ("Pascal Strings") or only the first ("Pascal strings")? I'm a technical writer...

Kayce Basques asked Jul 31 '14 21:07



1 Answer

Pascal strings were made popular by one specific, but hugely influential, Pascal implementation named UCSD Pascal, so "UCSD strings" is arguably a better term. This is the same implementation that made bytecode interpreters popular.

In general it is not one specific type, but the basic principle of prefixing the character data with its size. This makes getting the length a constant-time operation (O(1)) instead of requiring a scan of the character data for a NUL character.
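To make the length-prefix idea concrete, here is a minimal C sketch. The `PString` type and the helper names are made up for illustration; the layout (one length byte followed by up to 255 characters) is an assumption modelled on the classic short string, not the definition used by any particular compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical short-string layout: one length byte, then the characters. */
typedef struct {
    unsigned char len;   /* number of characters in use, 0..255 */
    char data[255];      /* character data, NOT nul-terminated  */
} PString;

/* Build a PString from a C string; the source must fit in 255 bytes. */
static PString pstring_from_cstr(const char *s) {
    size_t n = strlen(s);
    assert(n <= 255);
    PString p = {0};
    p.len = (unsigned char)n;
    memcpy(p.data, s, n);
    return p;
}

/* O(1): the length is read directly from the prefix. */
static size_t pstring_length(const PString *p) {
    return p->len;
}

/* O(n): a nul-terminated string must be scanned for its terminator. */
static size_t cstr_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

Both length functions return the same number for the same text; the difference is only in how much work it takes to get there.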

Not all Pascals used this concept. IIRC, the original (seventies) convention was to space-pad an allocation and scan backwards for a non-space character (making it impossible for strings to end in a space). Moreover, since software was mostly used in isolation, all kinds of schemes were used, often based on what was advantageous for that implementation/architecture.

While the construct is not part of Standard Pascal, the most popular dialects (Borland's Turbo Pascal and Delphi, and Free Pascal) generally base themselves on the UCSD dialect, and thus have Pascal strings. Delphi currently has 5 such string types (short/ansi/wide/unicode/open).

On the other hand, this means that a loop needs an additional index- or count-based check to detect the end of the string.

So instead of copying a string using

while (p2^<>#0) do begin p^:=p2^; inc(p); inc(p2); end; p^:=#0;

which is wholly equivalent to

while (*s++ = *t++); 

in C when using an optimizing compiler, you need to do e.g.

while (len>0) do begin p^:=p2^; inc(p); inc(p2); dec(len); end;

or even

i:=1; while (i<=len) do begin p[i]:=p2[i]; inc(i); end; 

This makes the number of instructions in a Pascal string loop slightly larger than in the equivalent zero-terminated loop, and adds one more live value (the length or index). Additionally, UCSD was a bytecode (p-code) interpreted language, and the latter, index-based code built on Pascal strings is "safe" in that each access can be checked against the length.
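For readers more at home in C, the two loop shapes can be sketched side by side. The function names are hypothetical, and this is only an illustration of the extra live value, not a claim about what any compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel-driven copy: the only loop state is the two pointers.
   Copies up to and including the terminating '\0'. */
static void copy_terminated(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    while ((*dst++ = *src++) != '\0') {
        /* all the work happens in the loop condition */
    }
}

/* Count-driven copy: len is a third live value next to the pointers.
   Copies exactly len bytes, so embedded '\0' bytes are copied too. */
static void copy_counted(char *dst, const char *src, size_t len) {
    assert(dst != NULL && src != NULL);
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
}
```

Note that the counted version does not care what the bytes are, while the sentinel version stops at the first zero byte it meets.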

On architectures with built-in post-increment addressing (like the PDP-11 that C was originally developed for), the pointer version was even cheaper, especially without optimization. Nowadays optimizing compilers can easily detect any of these constructs and convert them to whatever is best.

More importantly, security has become more important since the early nineties, and relying solely on the null terminator is frowned upon because small validation errors can cause potentially exploitable buffer overflows. C practice has therefore moved away from the old string routines toward the "-n-" versions (strncpy etc.) that take a maximum length. This adds the same extra live value as a manually managed Pascal string, where the programmer must take care of passing the length (or the maximum buffer size, for C's -n- functions) around. Pascal strings still have two advantages, though: reaching the last occupied char is an O(1) operation, and there are no forbidden chars.
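A small sketch of what passing the maximum size around looks like with strncpy. The wrapper name `bounded_copy` is hypothetical; the explicit terminator is there because strncpy does not write one when the source is longer than the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which has room for dst_size bytes.
   The result is always nul-terminated. Returns 1 if src fit completely,
   0 if it had to be truncated. */
static int bounded_copy(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);  /* copies at most dst_size - 1 chars  */
    dst[dst_size - 1] = '\0';         /* strncpy may leave dst unterminated */
    return strlen(src) < dst_size;
}
```

The caller has to carry `dst_size` alongside the pointer, which is the same bookkeeping a manually managed length prefix would need.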

Length-prefixed strings are also used extensively in file formats because, obviously, it is useful to know the number of bytes to read up front.
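As an illustration of why the prefix helps when reading data, here is a sketch that parses records from a byte buffer. The record layout (a single length byte followed by the payload) and the name `read_record` are assumptions made for this example, not any particular file format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reads one record laid out as [1-byte length][payload] from buf.
   Copies the payload into out (capacity out_cap), stores its size in
   *out_len, and returns the number of input bytes consumed.
   Returns 0 if the record is truncated or does not fit in out. */
static size_t read_record(const unsigned char *buf, size_t buf_len,
                          unsigned char *out, size_t out_cap,
                          size_t *out_len) {
    assert(buf != NULL && out != NULL && out_len != NULL);
    if (buf_len < 1) {
        return 0;                      /* no room for the length byte */
    }
    size_t n = buf[0];                 /* payload size is known up front */
    if (n > buf_len - 1 || n > out_cap) {
        return 0;                      /* truncated input or small buffer */
    }
    memcpy(out, buf + 1, n);
    *out_len = n;
    return 1 + n;
}
```

Because the size arrives before the data, the reader can validate and allocate before touching the payload, and the payload may contain any byte value, including zero.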

Marco van de Voort answered Nov 08 '22 20:11