
What is Unicode character U+2028 (LS / Line Separator) used for?

I figured that someone must have already solved the line-breaking problem, even if the solution is not widely adopted. Looking ahead, I searched for a platform-independent Unicode way to separate lines and found the character U+2028. Then I found Jeff Atwood's post on this topic, where he mentions that he's "...not sure under what circumstances you would want those Unicode newline markers."

Well, neither am I. I did a little digging in the C# source code, and it looks like LS (U+2028) is not supported by TextReader.ReadLine(), nor by Java's BufferedReader.readLine(). So my conclusion is that it is not widely supported.
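
The same gap is easy to observe outside C# and Java. Here is a minimal sketch in Python (swapped in purely for illustration; the sample string is made up). The stream-style reader in `io.StringIO` breaks lines only at `\n`, while `str.splitlines()` also treats U+2028 as a line boundary, so even a single standard library is not consistent about it.

```python
import io

# Hypothetical sample: U+2028 sits between "first" and "second",
# and an ordinary LF sits before "third".
text = "first\u2028second\nthird"

# Stream-style reading only breaks at "\n", so LS stays inside the first line.
stream_lines = io.StringIO(text).readlines()

# str.splitlines() recognizes U+2028 (and U+2029) as line boundaries.
split_lines = text.splitlines()

print(len(stream_lines))  # number of lines seen by the stream reader
print(len(split_lines))   # number of lines seen by splitlines()
```

The stream reader reports two lines and `splitlines()` reports three for the same text.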

I would love to have a bright future where I can write files using a single format in Linux, MacOS and Windows. Does this little character have promise? What is it currently used for?

Elijah asked Jun 18 '10 18:06


1 Answer

Nicked from McDowell’s comment on the same page, and indirectly from the Unicode docs:

Traditionally, NLF started out as a line separator (and sometimes record separator). It is still used as a line separator in simple text editors such as program editors. As platforms and programs started to handle word processing with automatic line-wrap, these characters were reinterpreted to stand for paragraph separators. For example, even such simple programs as the Windows Notepad program and the Mac SimpleText program interpret their platform’s NLF as a paragraph separator, not a line separator.

NLF (New Line Function) in this context is shorthand for CR, LF and CRLF. By contrast, the two Unicode characters have unambiguous uses: LS (U+2028) always separates lines, and PS (U+2029) always separates paragraphs.
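
To make the distinction concrete, here is a small Python sketch (an illustration only; the helper name `parse` and the sample text are hypothetical). It treats U+2029 as a paragraph boundary and U+2028 as a line boundary within a paragraph, and looks up each character's Unicode general category with the standard `unicodedata` module.

```python
import unicodedata

LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR

def parse(text):
    """Split text into paragraphs (on PS), each a list of lines (on LS)."""
    return [paragraph.split(LS) for paragraph in text.split(PS)]

# Hypothetical sample: two lines in the first paragraph, one in the second.
sample = "line 1" + LS + "line 2" + PS + "next paragraph"
structure = parse(sample)

# General categories: "Zl" (line separator) and "Zp" (paragraph separator).
categories = (unicodedata.category(LS), unicodedata.category(PS))

print(structure)
print(categories)
```

Because each character has exactly one meaning, the parser never has to guess whether a break ends a line or a paragraph, which is the ambiguity the quoted passage describes for CR, LF and CRLF.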

MSalters answered Sep 21 '22 23:09