According to The Little SAS Book, SAS character data can be up to 2^15 - 1 (32,767) bytes in length.
Where does that one character go? In floating-point representations, one bit is usually reserved for the sign of the number. Is something similar happening for SAS character data?
I don't have a definite answer, but I have a supposition.
I think the limit of 32,767 is not related to how the field itself is stored. SAS stores every row of an uncompressed file in identically sized blocks, so there is no need for a per-field length indicator or a null terminator. For example, take the following data step:
data want;
length name $8;
input recnum name $ age;
datalines;
01 Johnny 13
02 Nancy 12
03 Rachel 14
04 Madison 12
05 Dennis 15
;;;;
run;
The resulting dataset would look something like this. The headers are of course not written out as text this way; they are just packed sequences of bytes.
<dataset header>
Dataset name: Want
Dataset record size: 24 bytes
... etc. ...
<subheaders>
Name character type length=8
Recnum numeric type length=8
Age numeric type length=8
... etc. ...
<first row of data follows>
4A6F686E6E792020000000010000000D
4E616E6379202020000000020000000C
52616368656C2020000000030000000E
4D616469736F6E20000000040000000C
44656E6E69732020000000050000000F
<end of data>
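The variables run directly into each other, and SAS knows where each one starts and stops from the information in the subheaders. (The rows above are just an illustration, in the style of PUT statement output. The numerics are shown as 4-byte integers for brevity, whereas SAS stores them as 8-byte floating-point values by default, and I think in the actual file the numeric variables are stored first, if I remember correctly; but the idea is the same.)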
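As a rough illustration of the fixed-width idea, here is a minimal Python sketch that packs the five rows the same way as the hex dump above and then reads a field back purely by offset. The `ROW` layout (an 8-byte blank-padded name followed by two 4-byte big-endian integers) mirrors the simplified dump, not the real .sas7bdat layout, and `pack_row` and `read_name` are names made up for this example.

```python
import struct

# Simplified row layout mirroring the hex dump: an 8-byte character field
# followed by two 4-byte big-endian integers. This is an assumption made for
# illustration; it is not the real .sas7bdat row format.
ROW = struct.Struct(">8sii")

def pack_row(name, recnum, age):
    """Pad the name with blanks to 8 bytes and pack the row."""
    return ROW.pack(name.ljust(8).encode("ascii"), recnum, age)

def read_name(data, row_index):
    """Find a name by offset arithmetic alone: no length prefix, no terminator."""
    start = row_index * ROW.size
    return data[start:start + 8].decode("ascii").rstrip()

rows = [("Johnny", 1, 13), ("Nancy", 2, 12), ("Rachel", 3, 14),
        ("Madison", 4, 12), ("Dennis", 5, 15)]
data = b"".join(pack_row(name, recnum, age) for name, recnum, age in rows)

# Print each fixed-size row as hex, one row per line.
for i in range(len(rows)):
    chunk = data[i * ROW.size:(i + 1) * ROW.size]
    print(chunk.hex().upper())

# Row 3 (zero-based) starts at byte 3 * 16 = 48.
print(read_name(data, 3))
```

Because every row has the same size, the only things needed to locate a value are the row number and the field widths recorded in the header, which is why no per-field length or terminator has to be stored alongside the data.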
Technically the .sas7bdat format is not publicly documented, but several people have worked out most of how it works. Some R programmers have written up a specification which, while a bit challenging to read, does give some information.
It indicates that 4 bytes are used to specify the field length, which is more than enough for 32,767 (it's enough for about 2 billion), so this isn't the definite answer. I suppose the length may originally have been 2 bytes and changed to 4 at some later point in the development of SAS, though .sas7bdat was a totally new file type created relatively recently (in version 7, hence sas7bdat; we're on 9 now).
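To make the arithmetic concrete: 32,767 is exactly the largest value a signed 2-byte integer can hold, while a signed 4-byte integer tops out at 2,147,483,647. The short Python check below only demonstrates those ranges; it says nothing about what SAS actually does internally, and `fits` is a helper name invented here.

```python
import struct

def fits(fmt, value):
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False

max_int16 = 2**15 - 1   # one of the 16 bits is the sign bit
max_int32 = 2**31 - 1   # one of the 32 bits is the sign bit

# ">h" is a signed 2-byte integer, ">i" a signed 4-byte integer.
print(max_int16, fits(">h", max_int16), fits(">h", max_int16 + 1))
print(max_int32, fits(">i", max_int32), fits(">i", max_int32 + 1))
```

If a length were ever held in a signed 16-bit field, the "missing" value would simply be the bit spent on the sign, which is one reason the 2-byte supposition is tempting even though the reverse-engineered format uses 4 bytes.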
Another possibility, and perhaps the more likely one, is that before 1999 the ANSI C standard only required C compilers to support objects of at least 32,767 bytes, meaning a compiler didn't have to support arrays larger than that. While many compilers did support much larger arrays and objects, it's possible that SAS was working to the minimum standard to avoid issues with different OS and hardware implementations. See this discussion of the ANSI C standards for some background. It's also possible that a similar limitation in another language is at fault here, as SAS uses several different ones. [Credit to FriedEgg for the beginning of this idea (offline).]