How to remove new lines within double quotes?

How can I remove newlines that appear inside double quotes (") in a file?

For example:


So I want to remove the \n between three and four. Should I use a regular expression, or do I have to read the file character by character with a program?

2 Answers

To remove only the newlines inside double-quoted strings and leave the ones outside them alone, use GNU awk (required for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

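This works by splitting the file on " characters and removing newlines in every other block: the even-numbered blocks are the ones inside quotes. RT holds the text that terminated the current record, so printing it after each block puts the " characters back. With a file containing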


this will give the result


Note that it does not handle escape sequences. If strings in the input data can contain \", such as "He said: \"this is a direct quote.\"", then it will not work as desired.
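
If escaped quotes do occur, one possible workaround is to walk through each line character by character and track state. The following is only a sketch under assumptions that are not in the question: a backslash escapes the next character (both inside and outside quotes), quotes are balanced, and the function name join_quoted_newlines is made up for illustration. It uses plain awk features only, so it should not need gawk.

```shell
# Hypothetical helper: drop newlines that fall inside double-quoted strings,
# treating \" as an escaped quote that does not open or close a string.
join_quoted_newlines() {
  awk '
  {
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (escaped) {
        escaped = 0              # character after a backslash is literal
      } else if (c == "\\") {
        escaped = 1              # next character is escaped
      } else if (c == "\"") {
        in_quote = !in_quote     # unescaped quote toggles the state
      }
    }
    escaped = 0                  # assumption: an escape never spans lines
    # Keep the line break only when we are outside a quoted string.
    printf "%s%s", $0, (in_quote ? "" : "\n")
  }' "$@"
}

# Usage: join_quoted_newlines file
```

Like the gawk one-liner, this removes the newline without inserting a space; change the empty string in the printf to " " if the joined parts should be separated.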

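You can treat every line that starts with " as the beginning of a new record: print the record stored so far and start a new one. Any line that does not start with " is a continuation, so append it to the stored record (joined with FS, a space by default) and print it later: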

$ awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' file
"three four",

Since we always print the previous block of text, the END block is needed to print the last stored value after the whole file has been processed.
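
To make the accumulation easier to follow, here is the same idea wrapped in a function and traced on made-up input. The sample lines and the name join_continuations are assumptions for illustration, not taken from the question. The sketch also compares f against the empty string instead of testing its truth value, so a stored record such as 0 would still be printed.

```shell
# Hypothetical helper: a line starting with " begins a new record;
# any other line is appended to the record being built.
join_continuations() {
  awk '
    /^"/ { if (f != "") print f; f = $0; next }   # flush previous record
         { f = f FS $0 }                          # continuation line
    END  { if (f != "") print f }                 # flush the last record
  ' "$@"
}

# Usage with sample input:
#   printf '%s\n' '"one",' '"two",' '"three' 'four",' | join_continuations
```

This approach relies on every record starting with " at the beginning of a line, so it suits input where each quoted string opens a line.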

