
Do line endings differ between Windows and Linux? [closed]

I am trying to parse the Linux /etc/passwd file in Java. I'm currently reading each line through the java.util.Scanner class and then using java.lang.String.split(String) to delimit each line.

The problem is that the line:

list:x:38:38:Mailing List Manager:/var/list:/bin/sh

is treated by the scanner as 3 different lines:

  1. list:x:38:38:Mailing
  2. List
  3. Manager...

When I type this out into a new file that I didn't get from Linux, Scanner parses it properly.

Is there something I'm not understanding about new lines in Linux?

Obviously a workaround is to parse it without using Scanner, but that wouldn't be elegant. Does anyone know of an elegant way to do it?

Is there a way to convert the file into one that would work with Scanner?


Not even two days ago: Historical reason behind different line ending at different platforms

EDIT

Note from the original author:

"I figured out I have a different error that is causing the problem. Disregard question"

asked Jan 08 '09 23:01 by jbu



1 Answer

From Wikipedia:

  • LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
  • CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows, Symbian OS
  • CR: Commodore machines, Apple II family, Mac OS up to version 9 and OS-9

I translate this into these line endings in general:

  • Windows: '\r\n'
  • Mac (OS 9-): '\r'
  • Mac (OS 10+): '\n'
  • Unix/Linux: '\n'
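To make the three conventions above concrete, here is a minimal Java sketch that splits a string on any of them. The class and method names (LineEndings, splitLines) are made up for illustration and are not from the original answer. The two-character Windows ending is listed first in the pattern so that "\r\n" counts as one break rather than two.

```java
import java.util.Arrays;
import java.util.List;

class LineEndings {
    // Hypothetical helper: splits text on "\r\n" (Windows), "\r" (classic Mac),
    // or "\n" (Unix/Linux, Mac OS X). "\r\n" comes first in the alternation so
    // a Windows ending is matched as a single separator.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\r\n|\r|\n"));
    }

    public static void main(String[] args) {
        // One string mixing all three conventions.
        System.out.println(splitLines("windows\r\nold mac\runix\nlast"));
    }
}
```

Note that String.split drops trailing empty strings, so a final line terminator does not produce an extra empty entry.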

You need to make your scanner/parser handle the Unix version, too.
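As a rough sketch of one way to do that in Java: Scanner.nextLine() already accepts "\n", "\r\n", and "\r" as line separators, so reading whole lines and then splitting on ':' should work regardless of where the file came from. The class and method names below (PasswdLines, parse) are hypothetical, and the sample data is the line quoted in the question plus a made-up second entry. Separately from line endings, Scanner.next() uses whitespace as its default delimiter, so it would break a field such as "Mailing List Manager" into three tokens; whether that was the asker's actual problem is not stated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PasswdLines {
    // Hypothetical helper: reads whole lines with nextLine(), which handles
    // "\n", "\r\n", and "\r", then splits each line into its ':' fields.
    static List<String[]> parse(String contents) {
        List<String[]> entries = new ArrayList<>();
        try (Scanner scanner = new Scanner(contents)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Limit -1 keeps trailing empty fields instead of dropping them.
                entries.add(line.split(":", -1));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Sample input with one Unix and one Windows line ending.
        String sample = "list:x:38:38:Mailing List Manager:/var/list:/bin/sh\n"
                      + "root:x:0:0:root:/root:/bin/bash\r\n";
        for (String[] fields : parse(sample)) {
            System.out.println(fields[0] + " -> " + fields[4]);
        }
    }
}
```

To read the real file instead of a string, the same loop can be fed by new Scanner(new java.io.File("/etc/passwd")), which throws FileNotFoundException if the file is missing.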

answered Sep 28 '22 10:09 by Michael Haren