Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is “-” (hyphen) the unique ASCII limitation for E-mail compatibility?

I was reading this proposal for Base91, (with bold formatting added by me):

All the SMTP-based E-mail can provide compatibility with the E-mail. So-called compatibility with the E-mail is to transform arbitrary 8-bit data byte-strings or arbitrary bit stream data transferred by the E-mail into a character-strings of a limited ASCII. The main limitation on the latter is that:
(a) the characters have to be printable;
(b) the characters are not control character or “-” (hyphen).
There are totally 94 of such ASCII characters, their corresponding digital coding being all integers ranging from 32 through 126 with the exception of 45. E-mail written in these ASCII characters is compatible with the Internet standard SMTP, and can be transferred in nearly all the E-mail systems.

Note: 45 is the ASCII value for hyphen.
Note: I just figured out that this proposal originates from patents in China (ZL00112884.1) and US (US6859151B2).

But I also read the RFC 5321 regarding SMTP, and I couldn't find anything that makes the hyphen character the exclusive limitation from the printable ASCII range.

Note: The printable ASCII range is:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Why is the Base91 proposal/patent claiming that “-” (hyphen) is the only limitation for E-mail compatibility?

like image 256
Cœur Avatar asked Jun 04 '18 06:06

Cœur


People also ask

What is the limitation of ASCII?

Limitation of ASCII The 128 or 256 character limits of ASCII and Extended ASCII limits the number of character sets that can be held. Representing the character sets for several different language structures is not possible in ASCII, there are just not enough available characters.

Why does ASCII go to 127?

There are no Ascii values greater than 127 (decimal), and Ascii value 127 stands for an invisible control character. Ascii defines a character code that assigns meanings to code numbers from 0 and 127, and that's it.

What is ASCII value of special characters?

Special Characters(32–47 / 58–64 / 91–96 / 123–126): Special characters include all printable characters that are neither letters nor numbers. These include punctuation or technical, mathematical characters.

What is the purpose of ASCII code?

ASCII, in full American Standard Code for Information Interchange, a standard data-encoding format for electronic communication between computers. ASCII assigns standard numeric values to letters, numerals, punctuation marks, and other characters used in computers.

What is the main limitation of ASCII code?

Originally Answered: What is the main limitation of ASCII? The main limitation of ASCII is that it only has 94 printable characters — that’s enough for the 26 letters of the basic Latin alphabet in upper and lower case, the 10 digits 0 to 9, and some common punctuation.

What is extended ASCII?

Extended ASCII was an early hack to add both graphical characters and added characters for non-English Roman languages. Instead, both wide characters (2-byte encoding) developed largely by Microsoft, and Unicode developed by many vendors Originally Answered: What is the main limitation of ASCII?

Can email addresses include hyphen characters?

The short answer is yes, email addresses can include these characters, but with some exceptions. The two biggest factors to consider are hyphen placement and email service provider.

Is there a special character other than the hyphen in URLs?

It is common for websites to use alphanumeric character and only one special character, the hyphen, to separate words. It is unlikely that anyone would expect to see or type a special character other than the hyphen into a URL.


1 Answers

Looks like the hyphen is used as a control/marker character in multiline SMTP messages.

RFC5321 4.2.1 Reply Code Severities and Theory:

The format for multiline replies requires that every line, except the last, begin with the reply code, followed immediately by a hyphen, "-" (also known as minus), followed by text. The last line will begin with the reply code, followed immediately by <SP>, optionally some text, and <CRLF>. As noted above, servers SHOULD send the <SP> if subsequent text is not sent, but clients MUST be prepared for it to be omitted.

The Base91 proposal uses SMTP as an example of both it's application and limitations. As you state, it originally wanted to use 94 characters, but due to various standards (e.g. SMTP), it excludes commonly used pseudo-control characters ("-", ".", "="). It uses SMTP because it demonstrates the practicality of a Base91 encoding (e.g. encoding 13 bits of data per character rather than 6 bits with Base64 can greatly reduce the amount of bits required to encode any given message) in addition to acknowledging that its usage of hyphens as a control character won't cause ambiguity in Base91 text.

Any text can be encoded by Base91--The paper states that it maps 13 bits of data into two printable ASCII characters. Any number, any character (incl new line characters) can be encoded by Base91, similarly to how any character can be encoded by Base64. Likewise, this mapping can be reversed, to produce the original output from a Base91 encoding.

Here's a multiline SMTP reply code example:

  250-First line
  250-Second line
  250-234 Text beginning with numbers
  250 The last line

In this example, it converts a large multiline SMTP message that contains both hyphens, newlines, and numbers into some Base91-encoded form. If this encoded form contained pseudo-control characters, such as the hyphen, SMTP clients may interpret Base91-encoded data to be malformed SMTP data. The purpose of removing characters such as hyphens from the Base91 character set is not because of flaws with SMTP or the specifications of SMTP itself, it's with clients that use and parse SMTP data, and ensuring that clients can still properly accept Base91 data without any risk of misparsing it as SMTP data.

like image 181
xes_p Avatar answered Oct 10 '22 01:10

xes_p