Unicode version of ABNF?

I want to write a grammar for a file format whose content can contain characters outside US-ASCII. Since I am used to ABNF, I tried to use it.

However, neither RFC 5234 nor RFC 7405 is very friendly towards people who do NOT use US-ASCII.

What I am really looking for is a version of ABNF (and possibly some core rules as well) that is character-oriented rather than byte-oriented. The only thing RFC 5234 has to say about this is in section 2.4:

2.4.  External Encodings

   External representations of terminal value characters will vary
   according to constraints in the storage or transmission environment.
   Hence, the same ABNF-based grammar may have multiple external
   encodings, such as one for a 7-bit US-ASCII environment, another for
   a binary octet environment, and still a different one when 16-bit
   Unicode is used.  Encoding details are beyond the scope of ABNF,
   although Appendix B provides definitions for a 7-bit US-ASCII
   environment as has been common to much of the Internet.

   By separating external encoding from the syntax, it is intended that
   alternate encoding environments can be used for the same syntax.

That doesn't really clarify matters.

Is there a version of ABNF somewhere which is code point oriented rather than byte oriented?

asked Mar 11 '15 07:03

fge

1 Answer

Refer to section 2.3 of RFC 5234, which says:

Rules resolve into a string of terminal values, sometimes called characters. In ABNF, a character is merely a non-negative integer. In certain contexts, a specific mapping (encoding) of values into a character set (such as ASCII) will be specified.

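As far as ABNF is concerned, Unicode is just the set of non-negative integers U+0000 through U+10FFFF minus the surrogate range U+D800 through U+DFFF, so a terminal such as %xE9 can be read as the code point U+00E9 rather than as a byte. Various RFCs use ABNF this way; one example is RFC 3987, whose ucschar rule begins with the range %xA0-D7FF.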
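To make the "terminal value is just an integer" reading concrete, here is a minimal Python sketch. It assumes an invented rule, `name = 1*name-char` with `name-char = ALPHA / %xA0-D7FF / %xE000-FFFD / %x10000-10FFFF`; that rule, the table `NAME_CHAR_RANGES`, and the helpers `is_name_char` and `matches_name` are hypothetical names for illustration and do not come from RFC 5234 or RFC 3987. The point it tries to show is that the external encoding is undone first, and the grammar is then applied to code points.

```python
# Hypothetical illustration: ABNF terminals read as Unicode code points.
# The ranges mirror the invented rule
#   name-char = ALPHA / %xA0-D7FF / %xE000-FFFD / %x10000-10FFFF
NAME_CHAR_RANGES = [
    (0x41, 0x5A),         # A-Z
    (0x61, 0x7A),         # a-z
    (0xA0, 0xD7FF),       # non-ASCII code points below the surrogate block
    (0xE000, 0xFFFD),     # rest of the BMP, skipping surrogates D800-DFFF
    (0x10000, 0x10FFFF),  # supplementary planes
]


def is_name_char(ch: str) -> bool:
    """Return True if the code point of ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_CHAR_RANGES)


def matches_name(data: bytes, encoding: str = "utf-8") -> bool:
    """Check name = 1*name-char after decoding the external encoding."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # Bytes that are not valid in the chosen encoding never reach the grammar.
        return False
    return len(text) > 0 and all(is_name_char(ch) for ch in text)
```

The same table accepts the same text whether it arrives as UTF-8 or UTF-16, for example `matches_name("Höhrmann".encode("utf-16-le"), "utf-16-le")`, which is one way to read the separation of syntax from external encoding described in section 2.4.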

answered Oct 22 '22 21:10

Björn Höhrmann


Tags: unicode, grammar, abnf
