What is the UTF-8 representation of "end of line" in a text file?

Tags:

java

utf-8

What is the binary representation of "end of line" in UTF-8?

Husky asked Dec 12 '12 09:12


People also ask

What is the end of line character in a text file?

Windows programs normally use a carriage return followed by a line feed character at the end of each line of a text file. In ASCII, carriage return/line feed is X'0D'/X'0A'.

What is end of line sequence?

The End of Line (EOL) sequence ( 0x0D 0x0A , \r\n ) is actually two ASCII characters, a combination of the CR and LF characters. It moves the cursor both down to the next line and to the beginning of that line.

What is UTF-8 encoded text?

UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”

Are .txt files UTF-8?

Most Microsoft Windows text files use "ANSI", "OEM", "Unicode" or "UTF-8" encoding.


1 Answer

There are a bunch:

  • LF: Line Feed, U+000A (UTF-8 in hex: 0A)
  • VT: Vertical Tab, U+000B (UTF-8 in hex: 0B)
  • FF: Form Feed, U+000C (UTF-8 in hex: 0C)
  • CR: Carriage Return, U+000D (UTF-8 in hex: 0D)
  • CR+LF: CR (U+000D) followed by LF (U+000A) (UTF-8 in hex: 0D0A)
  • NEL: Next Line, U+0085 (UTF-8 in hex: C285)
  • LS: Line Separator, U+2028 (UTF-8 in hex: E280A8)
  • PS: Paragraph Separator, U+2029 (UTF-8 in hex: E280A9)

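These are the newline functions listed in the Unicode Standard's newline guidelines; a few legacy systems and non-Unicode encodings have used others.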
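The hex values in the list above can be checked directly in Java, since the question is tagged java. This is only a minimal sketch: the class name `LineTerminatorBytes` and the helper `utf8Hex` are hypothetical names chosen for illustration. It encodes each terminator with `String.getBytes(StandardCharsets.UTF_8)` and prints the bytes as hex.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class LineTerminatorBytes {

    // Encodes s as UTF-8 and returns the bytes as uppercase hex digits.
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so bytes >= 0x80 are not printed as negative ints.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // LF and CR are written with the short escapes on purpose: Java
        // translates Unicode escapes before parsing, so the Unicode-escape
        // form of those two characters would break the string literal.
        String[][] terminators = {
            {"LF", "\n"},
            {"VT", "\u000B"},
            {"FF", "\f"},
            {"CR", "\r"},
            {"CR+LF", "\r\n"},
            {"NEL", "\u0085"},
            {"LS", "\u2028"},
            {"PS", "\u2029"},
        };
        for (String[] t : terminators) {
            System.out.println(t[0] + " -> " + utf8Hex(t[1]));
        }
    }
}
```

The code points below U+0080 encode to a single byte identical to their ASCII value, which is why LF, VT, FF and CR look the same in ASCII and UTF-8. NEL takes two bytes, and LS and PS take three.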

The most commonly used ones are LF (*nix), CR+LF (Windows and DOS), and CR (old pre-OSX Mac systems, mostly).
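To find out which of these three common conventions a piece of text uses, one option is to count each kind of terminator. The following is a rough sketch under stated assumptions: `LineEndingDetector` and `detect` are hypothetical names, the text is already decoded into a `String`, and only LF, CR+LF and CR are considered.

```java
// Hypothetical helper class, for illustration only.
class LineEndingDetector {

    // Returns "CRLF", "LF" or "CR" if the text uses exactly one convention,
    // "mixed" if it uses more than one, and "none" if it has no terminators.
    static String detect(String text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CR+LF pair
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        int kinds = (crlf > 0 ? 1 : 0) + (lf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        if (kinds == 0) return "none";
        if (kinds > 1) return "mixed";
        if (crlf > 0) return "CRLF";
        return lf > 0 ? "LF" : "CR";
    }

    public static void main(String[] args) {
        System.out.println(detect("first\r\nsecond\r\n"));
        System.out.println(detect("first\nsecond\n"));
    }
}
```

When the goal is to normalize rather than detect, Java 8 and later provide the regex linebreak matcher `\R`, so `text.replaceAll("\\R", "\n")` rewrites every terminator in the list above (treating CR+LF as one unit) to a single LF. `System.lineSeparator()` returns the convention of the platform the JVM is running on.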

T.J. Crowder answered Sep 23 '22 01:09