I have some delimited files with stray newline characters in the middle of fields (not at line ends), which appear as ^M in Vim. They come from freebcp exports (on CentOS 6) of an MSSQL database. Dumping the data in hex shows \r\n patterns:
$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43
I can remove them with awk, but am unable to do the same with sed.
This works in awk, removing the line breaks completely:
awk 'gsub(/\r/,""){printf $0;next}{print}'
But this in sed does not, leaving line feeds in place:
sed -i 's/\r//g'
while this appears to have no effect:
sed -i 's/\r\n//g'
Using ^M in the sed expression (ctrl+v, ctrl+m) also does not seem to work.
For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?
To convert from Windows to Linux line breaks you can use the tr command and simply remove the \r characters from the file. The -d option tells the tr command to delete a character, and '\r' specifies the character to delete.
Converting using Notepad++: To write your file with UNIX line endings, open the file, go to the Edit menu, select the "EOL Conversion" submenu, and choose "UNIX/OSX Format" from the options that come up. The next time you save the file, its line endings should be saved as UNIX-style line endings.
Using `sed` to replace \n with a comma: By default, every line in a text file ends with \n. `sed` splits its input on \n and strips it before running the script, so the lines have to be joined in the pattern space before the newline can be replaced with another character. A different line delimiter can be used in place of \n (NUL, via the -z option), but only when GNU sed is used.
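As a rough sketch of the newline-to-comma replacement described above (the function name newlines_to_commas is made up for illustration), the usual trick is to collect all lines with N first and only then substitute:

```shell
# Hypothetical helper: join all lines of stdin with commas.
newlines_to_commas() {
    # :a      define a label
    # N       append the next input line (and the \n before it) to the pattern space
    # $!ba    unless this is the last line, jump back to label a
    # s/...   with every line collected, turn the embedded newlines into commas
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g'
}

printf 'alpha\nbeta\ngamma\n' | newlines_to_commas
```

With this three-line input it should print alpha,beta,gamma. The script is split into separate -e pieces so that it does not depend on GNU sed accepting a semicolon after a label.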
You can use the command line tool dos2unix
dos2unix input
Or use the tr command:
tr -d '\r' <input >output
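To see what tr -d '\r' does to data like the sample in the question, here is a small sketch (the helper name strip_cr and the sample bytes are assumptions, modelled on the 0d0a 0d0a in the hex dump). It only deletes the carriage returns, so the line feeds in the middle of the field are still there afterwards:

```shell
# Hypothetical helper: delete every carriage return from stdin.
strip_cr() {
    tr -d '\r'
}

# A field broken by CR LF CR LF, followed by the rest of the record.
# od -c shows the remaining bytes one by one.
printf '92192-2986\r\n\r\n|C\n' | strip_cr | od -c
```

This matches the behaviour reported for sed 's/\r//g' in the question: the \r bytes go away, the \n bytes stay.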
Actually, you can do the file-format switching in vim.
Method A:
:e ++ff=dos
:w ++ff=unix
:e!
Method B:
:e ++ff=dos
:set ff=unix
:w
If you want to delete the \r\n sequences in the file, try these commands in vim:
:e ++ff=unix " <-- make sure open with UNIX format
:%s/\r\n//g " <-- remove all \r\n
:w " <-- save file
Your awk solution works fine. Another two sed solutions:
sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
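The reason plain s/\r\n//g has no effect is that sed reads one line at a time and strips the trailing newline before running the script, so there is never a \n in the pattern space to match. Both commands above get around that by pulling more lines in first. A sketch of the second one, assuming GNU sed (the wrapper name join_broken_fields and the sample input are made up for illustration):

```shell
# Hypothetical helper wrapping the second sed command above (GNU sed assumed,
# since it relies on \r being understood and on ';' after a label).
join_broken_fields() {
    # :A            label
    # /\r$/{N;bA}   while the pattern space ends in CR, append the next line and retry
    # s/\r\n//g     then delete every CR LF pair that N pulled in
    sed ':A;/\r$/{N;bA};s/\r\n//g'
}

printf '92192-2986\r\n\r\n|C\nnext|row\n' | join_broken_fields
```

With GNU sed the broken field should come back as 92192-2986|C on one line, while next|row is left untouched because it does not end in a carriage return.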
I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:
echo "$string" | sed $'s/\r//'
Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
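For shells without $'...', one possible equivalent is to put a literal carriage return into a variable with printf and interpolate it into the sed script. This is only a sketch; the names cr and strip_cr_with_sed are made up for illustration:

```shell
# Hold a literal carriage return in a variable; printf is POSIX, so this
# does not depend on bash's $'...' quoting.
cr=$(printf '\r')

# Hypothetical helper: strip every CR from stdin using that literal character,
# so sed itself never has to interpret the \r escape.
strip_cr_with_sed() {
    sed "s/${cr}//g"
}

printf 'one\r\ntwo\r\n' | strip_cr_with_sed | od -c
```

As with the bash version, this removes only the carriage returns; the line feeds are left alone.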
sed -e 's/\r//g' input_file
This works for me. The difference is using -e instead of -i.
I have also noticed that sed behaves differently on different platforms.
Mine is:
sed --version
This is not GNU sed version 4.0
Another method
awk 1 RS='\r\n' ORS=
RS='\r\n' makes each \r\n sequence the record separator, and the empty ORS prints the records back with nothing between them.
1 is always true, and in the absence of an action block {print} is used.
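A quick way to try the awk one-liner on data shaped like the question's sample is sketched below. The wrapper name join_crlf_awk is made up, and a multi-character RS is an extension (POSIX awk only promises to use the first character), so this assumes gawk, mawk, or another awk that treats RS as a regular expression:

```shell
# Hypothetical helper wrapping the awk one-liner above.
# RS='\r\n'  split records on CR LF (multi-character RS needs gawk, mawk, or similar)
# ORS=       print records back with no separator between them
# 1          always-true pattern, so the default {print} runs for every record
join_crlf_awk() {
    awk 1 RS='\r\n' ORS=
}

printf '92192-2986\r\n\r\n|C\n' | join_crlf_awk
```

Because only \r\n counts as a separator here, the plain \n at the real end of the record is part of the last record and is printed unchanged, so ordinary line ends survive.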