 

Removing CRLF (0D 0A) from string in Perl

Tags:

regex

perl

I've got a Perl script that consumes an XML file on Linux, and occasionally there are CRLFs (hex 0D 0A, DOS newlines) in some of the node values.

The system which produces the XML file writes it all as a single line, and it looks as if it occasionally decides that this is too long and writes a CRLF into one of the data elements. Unfortunately there's nothing I can do about the providing system.

I just need to remove these from the string before I process it.

I've tried all sorts of regex replacements using the Perl character classes and hex values, and nothing seems to work.

I've even run the input file through dos2unix before processing and I still can't get rid of the erroneous characters.

Does anyone have any ideas?

Many Thanks,

HeHasMoments asked Jul 02 '10 15:07





1 Answer

$output =~ tr/\x{d}\x{a}//d;
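A minimal sketch of how that transliteration behaves, assuming a made-up sample string (the element name and value are hypothetical, not taken from the question's data). With the `/d` modifier, `tr` deletes every CR and LF wherever it occurs in the string and returns the number of characters it matched:

```perl
use strict;
use warnings;

# Hypothetical sample: one CRLF embedded in a node value, one at the end.
my $output = "<name>Some long\r\n value</name>\r\n";

# Delete every CR (0x0D) and LF (0x0A); tr returns how many characters it matched.
my $removed = ($output =~ tr/\x{d}\x{a}//d);

print "$removed\n";   # 4
print "$output\n";    # <name>Some long value</name>
```

Because `tr` works on single characters rather than patterns, it also removes a lone CR or a lone LF, which may matter if the file has already been through dos2unix and only the LF remains.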

These are both whitespace characters, so if the terminators are always at the end, you can right-trim with

$output =~ s/\s+\z//;
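As a rough illustration of the difference between the two approaches (the sample string is invented for this sketch), the right-trim only touches the whitespace at the very end of the string, so a CRLF embedded in the middle of a value survives it:

```perl
use strict;
use warnings;

# Hypothetical sample with one embedded and one trailing CRLF.
my $output = "first\r\nsecond\r\n";

# Copy, then strip only the trailing run of whitespace.
(my $trimmed = $output) =~ s/\s+\z//;

# Copy, then delete CR and LF everywhere in the string.
(my $deleted = $output) =~ tr/\x{d}\x{a}//d;

print $trimmed =~ /\r\n/ ? "embedded CRLF remains\n" : "clean\n";  # embedded CRLF remains
print "$deleted\n";                                                # firstsecond
```

Note that `\s+\z` also strips trailing spaces and tabs, so it is only appropriate when trailing whitespace in the value is not significant.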
Greg Bacon answered Oct 03 '22 03:10
