
Will setting $/="\R" allow chomp() to work correctly with most files in perl?

Does anyone know for sure whether setting $/="\R"; will reliably let chomp() do the right thing, that is, remove whatever end-of-line sequence a line has?

Specifically, I run scripts on Windows and UNIX and have to process files that come off the net with unknown end-of-line conventions: MS-DOS (CRLF), UNIX (LF), classic Mac OS (CR), whatever.

I recently stumbled on "\R", but I hadn't seen it before. I think it's new. Well, newer than Perl 5.006. (It's been a while.)

The documentation says "\R" matches Unicode newlines as well. I have no way to test this properly.

Thanks.

-Erik

I was surprised to learn there's actually a "newline" tag in stackoverflow.

asked Jan 02 '23 00:01 by Erik Bennett


2 Answers

Will setting $/='\R' allow chomp() to work correctly with most files in perl?

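Setting $/ to '\R' (single quotes) makes the literal two-character string \R, a backslash followed by R, the record separator.
Setting $/ to "\R" (double quotes) produces an "Unrecognized escape \R passed through" warning and leaves $/ set to the single character R.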
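
A minimal sketch of what the two spellings actually hold, assuming a reasonably modern Perl. The variable names are only for illustration, and the warning is silenced locally so the script runs quietly:

```perl
use strict;
use warnings;

# Single quotes: the backslash is kept, so this is a two-character string.
my $single = '\R';

# Double quotes: \R is not a string escape, so Perl warns (category 'misc')
# and keeps only the R. The warning is disabled here on purpose.
my $double = do { no warnings 'misc'; "\R" };

printf "single: %d chars, double: %d chars\n", length $single, length $double;

# With $/ = '\R', chomp only removes a literal backslash-R pair,
# so a CRLF line ending is left alone.
my $line = "text\r\n";
{
    local $/ = '\R';
    chomp $line;
}
print $line eq "text\r\n" ? "CRLF untouched\n" : "CRLF removed\n";
```

Either way, chomp() never sees a pattern, only a fixed string.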

\R is not a string escape; it has meaning only inside a regular expression. And the documentation for $/ clearly states:

Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)
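
Since \R only works inside a regex, one possible workaround is to skip chomp() and strip the line ending with a substitution instead. This is a sketch under that assumption, not a drop-in replacement for $/; strip_eol is a hypothetical helper name, and \R needs Perl 5.10 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing line ending of any common style.
sub strip_eol {
    my ($line) = @_;
    # \R matches CRLF as a unit, or any single vertical-whitespace
    # character (LF, CR, U+2028, ...); \z anchors at the very end.
    $line =~ s/\R\z//;
    return $line;
}

for my $raw ("unix\n", "dos\r\n", "oldmac\r", "no ending") {
    printf "[%s]\n", strip_eol($raw);
}
```

Note that this only cleans up records that have already been read. For a file whose only separator is a bare CR, readline with the default $/ returns the whole file as one record, so slurping the file and using split /\R/ may be the simpler route.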

answered Jan 11 '23 14:01 by Steffen Ullrich


I created Acme::InputRecordSeparatorIsRegexp a while ago as a joke, but it does provide a workaround for the restriction that $/ cannot be a regular expression. With version 0.04 (just uploaded), you can say

use Acme::InputRecordSeparatorIsRegexp ':all';

open my $fh, '<:irs(\R)', 'file-with-ambiguous-line-endings.txt';
autochomp($fh,1);     # or (tied *$fh)->autochomp(1)
@lines = <$fh>;
...
answered Jan 11 '23 14:01 by mob