 

Removing Windows newlines on Linux (sed vs. awk)

Tags: linux, sed, awk

I have some delimited files with improperly placed newline characters in the middle of fields (not at line ends); the carriage returns appear as ^M in Vim. The files come from freebcp (on CentOS 6) exports of an MSSQL database. Dumping the data in hex shows \r\n patterns:

$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43

I can remove them with awk, but am unable to do the same with sed.

This works in awk, removing the line breaks completely:

awk 'gsub(/\r/,""){printf "%s", $0;next}{print}'

But this sed command does not; it leaves the line feeds in place:

sed -i 's/\r//g'

and this one appears to have no effect at all:

sed -i 's/\r\n//g'

Using ^M in the sed expression (ctrl+v, ctrl+m) also does not seem to work.

For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?
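
A minimal reproduction of the behavior, using a hypothetical helper make_sample that prints a tiny stand-in for the freebcp export (not the real data). sed's l command prints the pattern space with control characters escaped, which shows what each substitution actually operates on: sed strips the trailing \n from every input line before running the script and adds it back on output, so the pattern space holds abc\r and never abc\r\n.

```shell
#!/usr/bin/env bash
# Hypothetical sample: field "abc" broken by a stray \r\n, row ends with \n.
make_sample() { printf 'abc\r\ndef|ghi\n'; }

# Show the pattern space as sed sees it: "abc\r$" then "def|ghi$".
# The trailing \n is stripped before the script runs, so \r\n never matches.
make_sample | sed -n 'l'

# No effect: the output is byte-for-byte identical to the input.
make_sample | sed 's/\r\n//g' | od -c

# Removes the \r, but sed still writes the \n back after "abc".
make_sample | sed 's/\r//g' | od -c
```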

kermatt asked Jul 27 '12 02:07



4 Answers

You can use the command-line tool dos2unix:

dos2unix input

Or use the tr command:

tr -d '\r' <input >output
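
A quick check of what tr -d '\r' does, on a hypothetical sample with a field split by a stray \r\n (make_sample is an illustrative helper, not part of the answer). tr deletes every carriage-return byte and never touches line feeds. That is exactly the DOS-to-Unix conversion, but for the fields in the question it leaves a bare \n behind, so the row stays split; tr works one character at a time and cannot delete the \r\n pair as a unit.

```shell
#!/usr/bin/env bash
# Hypothetical sample: "abc" split by a stray \r\n, row ends with \n.
make_sample() { printf 'abc\r\ndef|ghi\n'; }

# tr -d deletes the listed characters wherever they occur.
# Result: "abc\ndef|ghi\n" (the \r is gone, the \n after "abc" remains).
make_sample | tr -d '\r' | od -c
```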

You can also do the file-format switching in Vim:

Method A:
:e ++ff=dos
:w ++ff=unix
:e!
Method B:
:e ++ff=dos
:set ff=unix
:w

EDIT

If you want to delete the \r\n sequences in the file, try these commands in vim:

:e ++ff=unix           " <-- make sure the file is opened in UNIX format
:%s/\r\n//g            " <-- remove all \r\n
:w                     " <-- save file

Your awk solution works fine. Here are two sed solutions as well:

sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
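
Both scripts can be tried on a small hypothetical sample (make_sample is an illustrative helper). The first copies line 1 into the hold space (1h), appends every later line (1!H), deletes all but the last line ($!d), then copies the buffer back (g) and strips every \r\n in one pass, so memory use grows with the file. The second only joins lines while the pattern space ends in \r. It is written below one command per line, because POSIX sed ends labels and branch targets at a newline and accepting ; after them is a GNU extension; \r in the regex is still GNU-specific either way.

```shell
#!/usr/bin/env bash
# Hypothetical sample: "abc" split by a stray \r\n, then two normal rows.
make_sample() { printf 'abc\r\ndef|ghi\njkl\n'; }

# Solution 1: slurp everything into the hold space, then delete all \r\n.
make_sample | sed '1h;1!H;$!d;${g;s/\r\n//g}'

# Solution 2, one command per line: while the line ends in \r, append the
# next line with N and branch back to :A; then remove the joined \r\n pairs.
make_sample | sed '
:A
/\r$/{
  N
  bA
}
s/\r\n//g
'
# Both print:
# abcdef|ghi
# jkl
```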

kev answered Oct 03 '22 07:10


I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:

echo "$string" | sed $'s/\r//'

Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
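
A small check of the $'...' approach, assuming bash (ksh93 and zsh accept the same quoting). Bash turns $'s/\r//' into s/, a literal carriage-return byte, and //, so sed never has to interpret \r itself. Without the g flag only the first CR on each line is removed, which is enough when each line carries at most one trailing CR. The value assigned to string is a hypothetical sample.

```shell
#!/usr/bin/env bash
# Hypothetical value standing in for $string: text with a trailing CR.
# Command substitution strips trailing newlines, not carriage returns.
string=$(printf 'abc\r')

# Quote "$string" so echo does not word-split or glob it.
echo "$string" | sed $'s/\r//' | od -c

# The same idea on multi-line input; add g if a line may hold several CRs.
printf 'abc\r\ndef\r\n' | sed $'s/\r//g'
```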

chepner answered Oct 05 '22 07:10


sed -e 's/\r//g' input_file

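This works for me. The only difference from your command is -e instead of -i (the result goes to standard output instead of being written back to the file).

I also noticed that sed behaves differently on different platforms. Running sed --version on mine prints: This is not GNU sed version 4.0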
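
Because \r support in the regex depends on the sed implementation, one shell-portable workaround is to put a literal carriage return into the script yourself. The variable name cr below is only illustrative; $(printf '\r') yields a single CR byte in any POSIX sh, since command substitution strips trailing newlines but not carriage returns.

```shell
#!/bin/sh
# Build a literal CR byte; $(...) only strips trailing newlines.
cr=$(printf '\r')

# The double-quoted script expands to s/<CR>//g before sed sees it,
# so this does not depend on sed understanding the \r escape.
printf 'abc\r\ndef\r\n' | sed "s/$cr//g" | od -c

# GNU sed prints its version here; some others (e.g. macOS sed) reject the option.
sed --version 2>/dev/null | head -n 1
```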

Sergiy Dolnyy answered Oct 04 '22 07:10


Another method

awk 1 RS='\r\n' ORS=
  • set Record Separator to \r\n
  • set Output Record Separator to empty string
  • 1 is always true, and in the absence of an action block {print} is used
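
The one-liner can be checked on a small hypothetical sample (make_sample is an illustrative helper). gawk and mawk treat a multi-character RS as a regular expression, while POSIX leaves a multi-character RS unspecified, so this assumes one of those two awks. Because the record separator is \r\n rather than \n, ordinary \n row ends stay inside the records and pass through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical sample: "abc" split by a stray \r\n, then two normal rows.
make_sample() { printf 'abc\r\ndef|ghi\njkl\n'; }

# Records are split on \r\n and printed with an empty ORS, which joins them.
# Assumes gawk or mawk (multi-character RS).
make_sample | awk 1 RS='\r\n' ORS=
# Prints:
# abcdef|ghi
# jkl
```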

Zombo answered Oct 01 '22 07:10