I am trying to parse the Linux /etc/passwd file in Java. I'm currently reading each line with the java.util.Scanner class and then using java.lang.String.split(String) to split each line into fields.
The problem is that the line:
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
is treated by the scanner as 3 different lines:
list:x:38:38:Mailing
List
Manager...
When I type this out into a new file that I didn't get from Linux, Scanner parses it properly.
Is there something I'm not understanding about newlines in Linux?
Obviously a workaround is to parse it without using Scanner, but that wouldn't be elegant. Does anyone know of an elegant way to do it?
Is there a way to convert the file into one that would work with Scanner?
This came up not even two days ago: see "Historical reason behind different line ending at different platforms".
EDIT
Note from the original author:
"I figured out I have a different error that is causing the problem. Disregard question"
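For what it's worth, one common way to get exactly this symptom is reading with Scanner.next() instead of Scanner.nextLine(). This is only a guess at the "different error", since the original code isn't shown. With the default delimiter, next() splits on any whitespace, so the spaces in "Mailing List Manager" produce three tokens regardless of the line ending. A minimal sketch of the difference (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokenDemo {
    // Collects tokens the way Scanner.next() returns them:
    // split on whitespace (the default delimiter), not on line breaks only.
    static List<String> readTokens(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    // Collects whole lines the way Scanner.nextLine() returns them.
    static List<String> readLines(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                out.add(sc.nextLine());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String entry = "list:x:38:38:Mailing List Manager:/var/list:/bin/sh\n";
        System.out.println(readTokens(entry)); // three tokens, broken at the spaces
        System.out.println(readLines(entry));  // one element, the whole entry
    }
}
```

If the code in question already uses nextLine(), then this isn't the cause and the problem lies elsewhere.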
Unix followed the Multics practice, and later Unix-like systems followed Unix. This created conflicts between Windows and Unix-like OSes, whereby files composed on one OS cannot be properly formatted or interpreted by another OS (for example a UNIX shell script written in a Windows text editor like Notepad).
Back to line endings
The reasons don't matter: Windows chose the CR/LF (\r\n) model, while Linux uses the LF (\n) model.
Starting with the current Windows 10 Insider build, Notepad will support Unix/Linux line endings (LF) and Macintosh line endings (CR), in addition to Windows line endings (CRLF) as usual.
Line Breaks in Windows, UNIX & Macintosh Text Files
Windows, and DOS before it, uses a pair of CR and LF characters to terminate lines. UNIX (including Linux and FreeBSD) uses an LF character only. OS X also uses a single LF character, but the classic Mac operating system used a single CR character for line breaks.
From Wikipedia:
- LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
- CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows, Symbian OS
- CR: Commodore machines, Apple II family, Mac OS up to version 9 and OS-9
I translate this into these line endings in general:
- Windows: '\r\n'
- Mac OS (up to version 9): '\r'
- Mac OS X: '\n'
- Unix/Linux: '\n'
You need to make your scanner/parser handle the Unix line ending ('\n'), too.
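In Java this is mostly taken care of already: Scanner.nextLine() treats '\r\n', '\r', and '\n' as line terminators, so the same loop works on files from any of the platforms above. Here is a rough sketch of parsing passwd-style text that way. The class and method names are hypothetical, and the sample entries in main are made up. Note the -1 limit passed to split, which keeps trailing empty fields (for example an empty shell field) that a plain split(":") would drop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PasswdLines {
    // Splits passwd-style text into entries, one String[] of fields per line.
    // nextLine() accepts \r\n, \r and \n, so mixed line endings are fine.
    static List<String[]> parse(String text) {
        List<String[]> entries = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // limit -1 keeps trailing empty fields
                entries.add(line.split(":", -1));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample with deliberately mixed line endings.
        String sample = "root:x:0:0:root:/root:/bin/bash\r\n"
                + "list:x:38:38:Mailing List Manager:/var/list:/bin/sh\n"
                + "nobody:x:65534:65534:nobody:/nonexistent:\r";
        for (String[] fields : parse(sample)) {
            System.out.println(fields.length + " fields, comment = " + fields[4]);
        }
    }
}
```

To read the real file rather than a string, the Scanner can be built from a java.io.File or java.nio.file.Path pointing at /etc/passwd; the loop itself stays the same.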