What are Pascal Strings?

Tags:

string

data-structures

pascal

Are they named after the programming language, or the mathematician?

What are the defining characteristics of Pascal strings? In Wikipedia's article on strings it seems like the defining characteristic is storing the length of the string in the first byte. In another article I get the impression that the memory layout of the strings is also important.

While perusing an unrelated SO thread somebody mentioned that Pascal strings make Excel fast. What are the advantages of Pascal strings over null-terminated strings? Or more generally, in what situations do Pascal strings excel?

Are Pascal strings implemented in any other languages?

Last, do I capitalize both words ("Pascal Strings") or only the first ("Pascal strings")? I'm a technical writer...

226

asked Jul 31 '14 21:07

Kayce Basques

1 Answer

Pascal strings were made popular by one specific but hugely influential Pascal implementation, UCSD Pascal. So "UCSD strings" is a better term. This is the same implementation that made bytecode interpreters popular.

In general it is not one specific type, but the basic principle of having the size prefixed to the character data. This makes getting the length a constant time operation (O(1)) instead of scanning the character data for a nul character.
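As a rough illustration of the size-prefix principle, here is a minimal C sketch. The `PStr` type and the `pstr_from_cstr` and `pstr_length` helpers are hypothetical names made up for this example; the one-byte prefix and 255-byte limit are assumptions modelled on the classic short-string layout, not a definition of any particular compiler's type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical short-string layout: one length byte, then the characters. */
typedef struct {
    unsigned char len;   /* number of occupied bytes in data */
    char data[255];      /* character data, no terminator required */
} PStr;

/* Build a PStr from a C string; the source must fit in 255 bytes. */
PStr pstr_from_cstr(const char *s) {
    size_t n = strlen(s);        /* O(n): scans for the nul character */
    assert(n <= 255);
    PStr p = {0};
    p.len = (unsigned char)n;
    memcpy(p.data, s, n);
    return p;
}

/* O(1): the length is read straight from the prefix. */
size_t pstr_length(const PStr *p) {
    return p->len;
}
```

The only scan happens once, when converting from the nul-terminated form; every later length query is a single load.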

Not all Pascals used this concept. IIRC, the original (seventies) convention was to space pad an allocation, and scan backwards for a non space character (making it impossible for strings to have a terminating space). Moreover, since software was mostly used in isolation, all kinds of schemes were used, often based on what was advantageous for that implementation/architecture.
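The space-padding convention described above might look roughly like this in C. The function name `padded_length` and the fixed field width are assumptions for illustration only; the sketch just shows why a trailing space cannot survive.

```c
#include <stddef.h>

/* Length of a space-padded field of fixed width: scan backwards past
   the padding. A genuine trailing space is indistinguishable from it. */
size_t padded_length(const char *field, size_t width) {
    size_t n = width;
    while (n > 0 && field[n - 1] == ' ') {
        n--;
    }
    return n;
}
```

Note that the length query is again a scan, only from the other end, and that interior spaces are preserved while trailing ones are lost.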

While the construct is not part of Standard Pascal, the most popular dialects, those from Borland (Turbo Pascal, Delphi) and the compatible Free Pascal, generally base themselves on the UCSD dialect and thus have Pascal strings. Delphi currently has 5 such string types (short/ansi/wide/unicode/open).

On the other hand, a length prefix means that a loop needs an additional index- or count-based check to detect the end of the string.

So instead of copying a string using

while (p2^ <> #0) do begin p^ := p2^; inc(p); inc(p2); end; p^ := #0;

which is wholly equivalent to

while (*s++ = *t++);

in C when using an optimizing compiler,

you need to do e.g.

while (len>0) do begin p^:=p2^; inc(p); inc(p2); dec(len); end;

or even

i:=1; while (i<=len) do begin p[i]:=p2[i]; inc(i); end;

This makes the Pascal string loop slightly longer in instructions than the equivalent zero-terminated loop, and adds one more live value. Additionally, UCSD Pascal was a bytecode (p-code) interpreted language, and the latter, length-based code on Pascal strings is "safe".
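For comparison, the two loop shapes can be written side by side in C. This is only a sketch; `copy_terminated` and `copy_counted` are hypothetical names, and the instruction counts a real compiler emits will vary.

```c
#include <stddef.h>

/* Sentinel-driven copy: stops after copying the terminating nul.
   Live values: dst and src. */
void copy_terminated(char *dst, const char *src) {
    while ((*dst++ = *src++) != '\0') {
        /* body intentionally empty */
    }
}

/* Count-driven copy: the remaining length is the extra live value.
   Any byte value, including zero, is copied as ordinary data. */
void copy_counted(char *dst, const char *src, size_t len) {
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
}
```

The counted version carries three values through the loop instead of two, but it never inspects the data it copies, so a zero byte inside the string is harmless.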

On architectures with built-in post-increment addressing (like the PDP-11, which C was originally developed for), the pointer version was even cheaper, especially without optimization. Nowadays optimizing compilers can easily detect any of these constructs and convert them to whatever is best.

More importantly, security has become much more important since the early nineties, and relying solely on the null-terminated property of strings is frowned upon, because small errors in validation can cause potentially exploitable buffer overflows. C practice therefore discourages the old unbounded string routines in favor of the "-n-" versions (strncpy etc.) that take a maximum length. This adds the same extra live value as the length of a manually managed Pascal string: the programmer must pass the length (or, for C's -n- functions, the maximum buffer size) around. Pascal strings still have the advantage that the last occupied char is reachable in O(1), and that there are no forbidden chars.
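A small sketch of what passing the maximum size around looks like in practice, using the standard `strncpy`. The wrapper name `bounded_copy` and its return convention are assumptions for this example, not part of the C library.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into a buffer of cap bytes. The caller must carry cap
   around, much like a Pascal string carries its length.
   Returns 1 if src fit completely, 0 if it was truncated. */
int bounded_copy(char *dst, size_t cap, const char *src) {
    if (cap == 0) {
        return 0;
    }
    strncpy(dst, src, cap - 1);  /* copies at most cap - 1 bytes */
    dst[cap - 1] = '\0';         /* strncpy does not terminate on truncation */
    return strlen(src) < cap;
}
```

Called as `bounded_copy(buf, sizeof buf, input)`, the size argument plays the same bookkeeping role as the length prefix, except that the programmer has to supply it by hand at every call.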

Length-prefixed strings are also used extensively in file formats because, obviously, it is useful to know the number of bytes to read up front.
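To make the file-format point concrete, here is a sketch of walking records stored as a length byte followed by a payload. The record layout and the `read_record` helper are hypothetical and do not correspond to any specific format.

```c
#include <stddef.h>

/* Read one record laid out as [1-byte length][payload] at buf[pos].
   On success, stores the payload pointer and length and returns the
   offset of the next record. Returns 0 if the record is truncated
   or pos is at or past the end of the buffer. */
size_t read_record(const unsigned char *buf, size_t size, size_t pos,
                   const unsigned char **payload, size_t *len) {
    if (pos >= size) {
        return 0;
    }
    size_t n = buf[pos];             /* bytes to read are known up front */
    if (n > size - pos - 1) {
        return 0;                    /* prefix claims more than is available */
    }
    *payload = buf + pos + 1;
    *len = n;
    return pos + 1 + n;
}
```

Because the size comes first, a reader can validate or skip a record without looking at its contents, and the payload may contain any byte value.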

192

answered Nov 08 '22 20:11

Marco van de Voort
