Why do Java octal escapes only go up to 255?

Tags: java, escaping, octal

The Java Language Specification (JLS) states that the escapes inside strings are the "normal" C ones like \n and \t, but it also specifies octal escapes from \0 to \377. Specifically, the JLS states:

OctalEscape:
    \ OctalDigit
    \ OctalDigit OctalDigit
    \ ZeroToThree OctalDigit OctalDigit

OctalDigit: one of
    0 1 2 3 4 5 6 7

ZeroToThree: one of
    0 1 2 3

meaning that something like \4715 cannot be written as a single octal escape, despite it being within the range of a Java character (since Java characters are not bytes).

Why does Java have this arbitrary restriction? How are you meant to specify octal codes for characters beyond 255?

687

asked Mar 03 '12 03:03

paxdiablo

2 Answers

It is probably for purely historical reasons that Java supports octal escape sequences at all. These escape sequences originated¹ in C, in the days when computers like the PDP-7 ruled the Earth and much programming was done in assembly or directly in machine code. Octal was the preferred number base for writing instruction codes, and there was no Unicode, just ASCII, so three octal digits were sufficient to represent the entire character set.

By the time Unicode and Java came along, octal had pretty much given way to hexadecimal as the preferred number base when decimal just wouldn't do. So Java has its \u escape sequence that takes hexadecimal digits. The octal escape sequence was probably supported just to make C programmers comfortable, and to make it easy to copy'n'paste string constants from C programs into Java programs.

Check out these links for historical trivia:

http://en.wikipedia.org/wiki/Octal#In_computers
http://en.wikipedia.org/wiki/PDP-11_architecture#Memory_management

¹ C's immediate predecessors, BCPL and B, used * instead of \ to introduce string escape sequences. However, neither of those languages had octal escape sequences documented in the manuals linked.

200

answered Oct 10 '22 10:10

rob mayoff

If I understand the rules correctly (please correct me if I am wrong):

\ OctalDigit
Examples:
    \0, \1, \2, \3, \4, \5, \6, \7

\ OctalDigit OctalDigit
Examples:
    \00, \07, \17, \27, \37, \47, \57, \67, \77

\ ZeroToThree OctalDigit OctalDigit
Examples:
    \000, \177, \277, \367, \377

\t, \n, \\ do not fall under OctalEscape rules; they must be under separate escape character rules.

Decimal 255 is equal to octal 377, since 3×64 + 7×8 + 7 = 255 (use Windows Calculator in scientific mode to confirm).

Hence a three-digit octal escape falls in the range of \000 (0) to \377 (255).
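As a quick check of that range, here is a minimal Java sketch. The class name OctalRangeDemo and its method names are made up for illustration; the library calls are the standard Integer.parseInt and Integer.toOctalString. It also shows what appears to happen one step past the limit: "\400" is not rejected, it is read as \40 (a space) followed by the character 0.

```java
// Illustrative only: OctalRangeDemo and its method names are hypothetical.
class OctalRangeDemo {
    // The largest three-digit octal escape allowed by the grammar.
    static int largestEscape() {
        return '\377';
    }

    // The same value obtained by parsing the digits in base 8.
    static int parsedFromOctal() {
        return Integer.parseInt("377", 8);
    }

    // One past the limit: lexed as the escape \40 (space) plus the character '0'.
    static String pastTheLimit() {
        return "\400";
    }

    public static void main(String[] args) {
        System.out.println(largestEscape());              // 255
        System.out.println(parsedFromOctal());            // 255
        System.out.println(Integer.toOctalString(255));   // 377
        System.out.println(pastTheLimit().length());      // 2
    }
}
```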

Therefore, \4715 is not a valid octal escape, because it exceeds the three-digit limit (in a string literal the compiler reads it as \47 followed by the characters 1 and 5). To get the character whose code point is decimal 4715, use the Unicode escape \u126B (hexadecimal 126B is 4715 in decimal); for octal 4715, which is decimal 2509, the equivalent is \u09CD. Every Java char is a UTF-16 code unit.
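To make this concrete, here is a small Java sketch (the class name OctalEscapeDemo and its method names are hypothetical). It shows how a string literal containing \4715 is actually read, and two ways to write the character whose code is octal 4715: a Unicode escape, or an octal integer literal cast to char. Octal integer literals, unlike octal escapes, are not limited to 0377.

```java
// Illustrative only: OctalEscapeDemo and its method names are hypothetical.
class OctalEscapeDemo {
    // Lexed as the two-digit escape \47 (apostrophe, code 39)
    // followed by the ordinary characters '1' and '5'.
    static String fourDigitAttempt() {
        return "\4715";
    }

    // Octal 4715 is decimal 2509, hex 09CD, written as a Unicode escape.
    static char viaUnicodeEscape() {
        return '\u09CD';
    }

    // The same code unit via an octal integer literal and a cast.
    static char viaOctalIntLiteral() {
        return (char) 04715;
    }

    public static void main(String[] args) {
        String s = fourDigitAttempt();
        System.out.println(s.length());                                   // 3
        System.out.println((int) s.charAt(0));                            // 39
        System.out.println(viaUnicodeEscape() == viaOctalIntLiteral());   // true
        System.out.println((int) '\u126B');                               // 4715
    }
}
```

The cast works only for values that fit in a 16-bit char; anything above 0xFFFF needs a surrogate pair.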

from http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html:

The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode standard.)

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java 2 platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
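The surrogate-pair representation described in that excerpt can be observed with a short Java sketch. The class name SurrogatePairDemo is hypothetical, and U+1F600 is just a supplementary code point picked for illustration; the methods used (Character.toChars, String.codePointCount, Character.isHighSurrogate, Character.isLowSurrogate) are standard library calls.

```java
// Illustrative only: SurrogatePairDemo is a hypothetical name.
class SurrogatePairDemo {
    // A supplementary code point (above U+FFFF), chosen arbitrarily for the demo.
    static final int CODE_POINT = 0x1F600;

    // Encodes the code point as UTF-16, which yields two char values.
    static String asString() {
        return new String(Character.toChars(CODE_POINT));
    }

    public static void main(String[] args) {
        String s = asString();
        System.out.println(s.length());                           // 2
        System.out.println(s.codePointCount(0, s.length()));      // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
        System.out.println(s.equals("\uD83D\uDE00"));             // true
    }
}
```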

Edited:

Anything beyond the 8-bit range (larger than one byte) is language-specific. Some programming languages extend octal escapes to cover Unicode; others limit them to one byte. Java does not allow larger values, even though it has Unicode support.

A few programming languages (vendor-dependent) that limit octal escapes in string literals to one byte:

Java (all vendors) - octal integer literals begin with 0 and are not limited to one byte; octal escapes in string and character literals run from \0 to \7, \00 to \77, and \000 to \377
C/C++ (Microsoft) - octal integer constants begin with 0; octal string escape format \nnn
Ruby - octal integer literals begin with 0; octal string escape format \nnn

A few programming languages (vendor-dependent) that support octal escapes larger than one byte:

Perl - octal integer constants begin with 0; octal string escape format \nnn. See http://search.cpan.org/~jesse/perl-5.12.1/pod/perlrebackslash.pod#Octal_escapes

A few programming languages do not support octal literals:

C# - use Convert.ToInt32(string, 8) to parse a base-8 string. See the question "How can we convert binary number into its octal number using c#?"

answered Oct 10 '22 08:10

ecle
