
CR vs LF Perl parsing

Tags:

perl

I have a Perl script that parses a text file and breaks it up into an array, one element per line. It works fine when each line is terminated by LF, but when lines are terminated by CR my script doesn't handle them properly. How can I modify this line to fix this?

my @allLines = split(/^/, $entireFile);

Edit: My file has a mixture of lines ending in either LF or CR. The split just collapses all the CR-terminated lines together into one element.

asked Sep 23 '11 20:09 by user391986


3 Answers

Perl can handle both CRLF and LF line endings with the built-in :crlf PerlIO layer:

open(my $in, '<:crlf', $filename) or die "Cannot open $filename: $!";

will automatically convert CRLF line endings to LF and leave LF line endings unchanged. CR-only files, however, are the odd man out. If you know that the file uses CR-only line endings, you can set $/ to "\r" and it will read line by line (but it won't change the CR to an LF).
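A small sketch of both cases above, assuming a Unix-like system. It writes throwaway files with the core File::Temp module (the write_temp helper is hypothetical, made up for this demo), reads a CRLF/LF file through :crlf, and reads a CR-only file with $/ set to "\r":

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper for this demo: write raw bytes to a temp file,
# return its name. The file is deleted when the program exits.
sub write_temp {
    my ($bytes) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;                      # write the bytes exactly as given
    print {$fh} $bytes;
    close $fh or die "close: $!";
    return $name;
}

# 1. CRLF and LF lines through :crlf: "\r\n" is read back as "\n",
#    and a plain "\n" is left alone.
my $crlf_file = write_temp("one\r\ntwo\n");
open(my $in, '<:crlf', $crlf_file) or die "Cannot open $crlf_file: $!";
my @crlf_lines = <$in>;               # ("one\n", "two\n")
close $in;

# 2. CR-only file: localize $/ to "\r" so readline stops at each CR.
my $cr_file = write_temp("three\rfour\r");
open(my $in2, '<', $cr_file) or die "Cannot open $cr_file: $!";
my @cr_lines = do { local $/ = "\r"; <$in2> };   # ("three\r", "four\r")
close $in2;
```

The CR-only records keep their trailing "\r"; a chomp inside the same local $/ scope would remove it.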

If you have to deal with files of unknown line endings (or even mixed line endings in a single file), you might want to install the PerlIO::eol module from CPAN. Then you can say:

open(my $in, '<:raw:eol(LF)', $filename) or die "Cannot open $filename: $!";

and it will automatically convert CR, CRLF, or LF line endings into LF as you read the file.

Another option is to set $/ to undef, which reads the entire file in one slurp, and then split it on /\r\n?|\n/. That assumes the file is small enough to fit in memory.
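A minimal sketch of the slurp-and-split approach. To keep it self-contained it reads from an in-memory filehandle over a made-up string rather than a real file; with a real file you would open $filename instead:

```perl
use strict;
use warnings;

# Stand-in for a real file: an in-memory handle over mixed-ending text.
my $content = "alpha\nbeta\rgamma\r\ndelta\n";
open my $in, '<', \$content or die "Cannot open: $!";

# local $/ (undef) makes a single readline return the whole file.
my $entireFile = do { local $/; <$in> };
close $in;

# \r\n? takes the CR plus an optional LF, so a CRLF is one separator.
my @allLines = split /\r\n?|\n/, $entireFile;
print scalar(@allLines), " lines: @allLines\n";   # 4 lines: alpha beta gamma delta
```

split drops trailing empty fields, so the final "\n" does not produce an extra empty line.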

answered Oct 09 '22 09:10 by cjm


If you have mixed line endings, you can normalize them by matching a generalized line ending (\R, available since Perl 5.10):

 use v5.10;

 $entireFile =~ s/\R/\n/g;
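To illustrate why \R is preferable to a simple character class, this sketch (sample string made up for the demo) normalizes the same mixed text both ways and then applies the question's original split(/^/):

```perl
use strict;
use warnings;
use v5.10;    # \R (and say) require Perl 5.10 or later

my $entireFile = "one\rtwo\r\nthree\nfour";

# \R matches "\r\n" as a single unit, so a CRLF becomes one "\n".
(my $with_R = $entireFile) =~ s/\R/\n/g;

# A naive character class sees CR and LF separately, so a CRLF becomes
# two newlines and introduces a spurious empty line.
(my $naive = $entireFile) =~ s/[\r\n]/\n/g;

# split treats /^/ as /^/m, i.e. it splits after every "\n".
my @with_R = split /^/, $with_R;
my @naive  = split /^/, $naive;

say scalar @with_R;    # 4
say scalar @naive;     # 5
```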

You can also open a filehandle on a string and read lines just like you would from a file:

 open my $fh, '<', \ $entireFile or die "Cannot open in-memory handle: $!";
 my @lines = <$fh>;
 close $fh;
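A plain '<' handle still splits records on $/, which defaults to "\n", so lone CRs are not line breaks on their own. A sketch combining the two suggestions, with a hypothetical sample string:

```perl
use strict;
use warnings;
use v5.10;

my $entireFile = "alpha\rbeta\r\ngamma\n";

# Normalize every line ending to "\n" first, so the default $/ applies.
$entireFile =~ s/\R/\n/g;

open my $fh, '<', \$entireFile or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;

chomp @lines;                  # strip the trailing "\n" from each line
say join ',', @lines;          # alpha,beta,gamma
```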

You can even open the string with the layers that cjm shows.

answered Oct 09 '22 07:10 by brian d foy


You can probably just handle the different line endings when doing the split, e.g.:

my @allLines = split(/\r\n|\r|\n/, $entireFile);
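
A quick sketch comparing the question's split(/^/) with this one on a made-up mixed string. Two details matter: \r\n has to come before \r in the alternation, and unlike split(/^/), this split discards the line terminators:

```perl
use strict;
use warnings;

my $entireFile = "a\r\nb\rc\n";

# The question's split: /^/ acts like /^/m and only matches after "\n",
# so the lone CR after "b" is never a split point.
my @orig = split /^/, $entireFile;            # ("a\r\n", "b\rc\n")

# Alternation is tried left to right: with \r\n first, a CRLF is one
# separator. Putting \r first cuts the CRLF in two and leaves an empty field.
my @good = split /\r\n|\r|\n/, $entireFile;   # ("a", "b", "c")
my @bad  = split /\r|\r\n|\n/, $entireFile;   # ("a", "", "b", "c")
```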

answered Oct 09 '22 09:10 by Michał Wojciechowski