Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

valid characters for lisp symbols

First of all, as I understand it variable identifiers are called symbols in common lisp.

I noted that while in languages like C variable identifiers can only be alphanumberics and underscores, Common Lisp allows many more characters to be used like "*" and (at least scheme does) "?"

So, what I want to know is: what exactly is the full set of characters that Common Lisp allows to have in a symbol (or variable identifier if I'm wrong)? is that the same for Scheme?

Also, is the set of characters different for function names?

I've been googling, looking in the CLHS, and in Practical Common Lisp, and for the life of me, something must be wrong because I can't seem to find the answer.

like image 358
Roy Lamperado Avatar asked Jul 27 '12 07:07

Roy Lamperado


Video Answer


2 Answers

A detailed answer is a bit tricky. There is the ANSI standard for Common Lisp. It defines the set of available characters. Basically you can use all those defined characters for symbols. See also Symbols as Tokens.

For example

|Polynom 2 * x ** 3 - 5 * x ** 2 + 10|

is a valid symbol. Note that the vertical bars mark the symbol and do not belong to the symbol name.

Then there are the existing implementations of Common Lisp and their support of various character sets and string types. So several support Unicode (or similar) and allow Unicode characters in symbol names.

LispWorks:

CL-USER 1 > (list 'δ 'ψ 'σ)
(δ ψ σ)
like image 63
Rainer Joswig Avatar answered Sep 19 '22 20:09

Rainer Joswig


[From a Schemer's perspective. Even though some concepts in Scheme and Common Lisp have the same name, it does not mean that the mean the same thing in the two languages.]

First note that symbols and identifiers are two different things.

Symbols can be thought of as strings which support fast equality comparision. Two symbols s and t are equal (more or less) if they are spelled the same way. The operation string=? needs to loop over the characters in the and see if they are all alike. This take time proportional to the length of the shortest string. Symbols on the other hand are automatically (ny the runtime system) put into a (typically) hash table. Therefore symbol=? boils down to a simple pointer comparison and is thus very fast. Symbols are often used in cases where one in C would use enumerations.

Symbols are values that can be present at runtime.

Identifiers are simply names of variables in a program.

Now if said program is to be represented as a Scheme value, one choice would be to use symbols to represent identifiers - but that does not mean symbols are identifiers (or vice versa). A better representation of identifiers (still in Scheme) is syntax objects which besides the name of the identifier also records the where the identifier was read (or constructed). Say you encounter an undefined variable and want to signal where in the program the undefined variable is, then is very convenient that the source location is part of the representation of the identifier.

Last but not least. What are the legal characters of an identifer? Here it is best to quote chapter and version from R6RS:

4.2.4 Identifiers

Most identifiers allowed by other programming languages are also acceptable to Scheme. In general, a sequence of letters, digits, and “extended alphabetic characters” is an identifier when it begins with a character that cannot begin a representation of a number object. In addition, +, -, and ... are identifiers, as is a sequence of letters, digits, and extended alphabetic characters that begins with the two-character sequence ->. Here are some examples of identifiers:

lambda         q                soup
list->vector   +                V17a
<=             a34kTMNs         ->-
the-word-recursion-has-many-meanings

Extended alphabetic characters may be used within identifiers as if they were letters. The following are extended alphabetic characters:

! $ % & * + - . / : < = > ? @ ^ _ ~ 

Moreover, all characters whose Unicode scalar values are greater than 127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within identifiers. In addition, any character can be used within an identifier when specified via an <inline hex escape>. For example, the identifier H\x65;llo is the same as the identifier Hello, and the identifier \x3BB; is the same as the identifier λ.

Any identifier may be used as a variable or as a syntactic keyword (see sections 5.2 and 9.2) in a Scheme program. Any identifier may also be used as a syntactic datum, in which case it represents a symbol (see section 11.10).

From: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4

like image 29
soegaard Avatar answered Sep 18 '22 20:09

soegaard