dejustify (unjustify): replace each *single* linefeed with a space, but don't touch groups of linefeeds. sed, awk, or something else?

I'm looking for a Linux command-line solution to the following problem:

"Replace each single linefeed with a space, but don't modify any groups of consecutive linefeeds i.e. do not modify any linefeed which has another linefeed next to it." As an example:

one two
three four
five six

seven eight
nine ten

should become:

one two three four five six

seven eight nine ten

I am already aware that every valid text file should end with a linefeed, but if your proposed solution deletes that final-character-linefeed, that would not be a problem (it would be easy for me to append it back on afterwards).

I think that this is "too complex a task" for tr, but I assume something should be possible in sed or awk (if not, I'll need to "rustle up" something in Python or C). Unfortunately, my sed-fu is weak (as is my awk-fu). Are there any sed/awk black belts around who could help me?
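
As a rough illustration of why tr alone falls short, here is a minimal sketch using the sample input from above. The function name flatten_with_tr is made up for this example. tr translates characters one at a time and cannot look at neighbouring characters, so the paragraph break disappears too.

```shell
# Sample input from the question: two paragraphs separated by one blank line.
sample='one two
three four
five six

seven eight
nine ten
'

# flatten_with_tr is a hypothetical name for this illustration.
# tr maps every linefeed to a space, including the ones that form
# the blank line between the two paragraphs.
flatten_with_tr() {
    tr '\n' ' '
}

printf '%s' "$sample" | flatten_with_tr
```

The result is a single line with two spaces where the blank line used to be, which is exactly the behaviour to avoid.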

I have already found "How can I replace each newline (\n) with a space using sed?", but of course the answers suggested there wipe out the multiple consecutive linefeeds that I want to preserve.

I am also aware that "Sed is line-based therefore it is hard for it to grasp newlines" - perhaps sed is not the best tool for this job.

I have also found "Replace only single instance of a character using sed", but of course the character being replaced in that question is not a (problematic) linefeed.

(Why do I want this? The nano editor has a justify function which adds and removes single linefeeds so that any line "fills" the chosen line length but does not overrun it. nano does have a "built-in" unjustify function, but this is really just an "undo", not a "real" unjustify. What I am trying to find is the closest thing to a "genuine" unjustify command.)

Update: all the current solutions work perfectly, and thank you to all those who provided them. I've accepted Ed Morton's for the reasons that he gives: it processes only one record (paragraph) of input at a time, and it is portable to non-GNU versions of awk. The solution to my nano problem is:

cat << 'EOF' > "$HOME/.local/bin/dejustify"
#!/bin/sh
# -F'\n' keeps spaces and tabs inside each line intact; only newlines are replaced.
awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1' < "${1:-/dev/stdin}"
EOF
chmod u+x "$HOME/.local/bin/dejustify"

(I found the < "${1:-/dev/stdin}" here.)
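
For anyone unfamiliar with that redirection, here is a small sketch of how the parameter expansion behaves. The helper name read_input and the temporary file are assumptions made only for this example, and it presumes a system that provides /dev/stdin (as Linux does).

```shell
# read_input is a hypothetical helper, used only for this illustration.
# "${1:-/dev/stdin}" expands to the first argument if it is set and
# non-empty, and to /dev/stdin otherwise.
read_input() {
    cat < "${1:-/dev/stdin}"
}

tmpfile=$(mktemp)
printf 'from a file\n' > "$tmpfile"

read_input "$tmpfile"                  # argument given: reads the named file
printf 'from a pipe\n' | read_input    # no argument: reads standard input

rm -f "$tmpfile"
```

This is what lets one script serve both as a pipeline filter and as a command that takes a filename.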

I can now use it in a pipeline (printf "one\ntwo\nthree\nfour\n" | dejustify) or just dejustify <filename>.

Inside nano, I can <Ctrl>+<t> then enter |dejustify to dejustify my text. Success! 🙏

jaimet asked Oct 15 '25 04:10


2 Answers

Using any awk:

$ awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1} 1' file
one two three four five six

seven eight nine ten

Breaking it down:

  • -v RS= treat the input as [possibly multi-line] records separated by 1 or more empty lines.
  • -v ORS='\n\n' put 2 newlines at the end of each output record.
  • -F'\n' set the field separator to a newline so that ONLY newlines get replaced in the next step, otherwise all chains of contiguous white space within each record would be replaced.
  • {$1=$1} update the value of a field, $1, thereby causing awk to rebuild the current record replacing all strings that match the FS (a newline) with an OFS (a blank char).
  • 1 a true condition causing awk to execute its default action of printing the current record.
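
To see what the -F'\n' bullet means in practice, the sketch below runs the command above next to a variant without -F'\n'. The function names are made up for this comparison, and the input is a small test string rather than the question's sample.

```shell
# join_lines mirrors the command above.
join_lines() {
    awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1} 1'
}

# join_lines_default_fs is the same command without -F'\n', so the
# default field separator (runs of blanks, tabs and newlines) applies.
join_lines_default_fs() {
    awk -v RS= -v ORS='\n\n' '{$1=$1} 1'
}

# Input: three spaces between a and b, a tab between c and d.
printf 'a   b\nc\td\n' | join_lines             # keeps the three spaces and the tab
printf 'a   b\nc\td\n' | join_lines_default_fs  # prints "a b c d"
```

With -F'\n' only the newline is replaced, while the default separator also squeezes every run of spaces and tabs down to a single blank.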

The above will print a blank line at the end of the output. If that's a problem, you can always do this instead:

$ awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1' file
one two three four five six

seven eight nine ten

which prints a blank line before each record except the first instead of printing a blank line after every record:

  • NR>1{print ""} if this is the second or subsequent record then print a blank line before it.
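
The difference between the two variants is easiest to see by counting output lines. This is only a quick check, with function names invented for the purpose:

```shell
# with_ors: blank line after every record (first command above).
with_ors() {
    awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1} 1'
}

# with_leading_blank: blank line before every record except the first
# (second command above).
with_leading_blank() {
    awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1'
}

# Two paragraphs of two lines each.
printf 'one\ntwo\n\nthree\nfour\n' | with_ors | wc -l            # counts 4 lines
printf 'one\ntwo\n\nthree\nfour\n' | with_leading_blank | wc -l  # counts 3 lines
```

The extra line in the first count is the trailing blank line mentioned above.
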
Ed Morton answered Oct 19 '25 13:10


replace each single linefeed with a space, but don't touch groups of linefeeds

This can be achieved with a regular expression that uses zero-length assertions. I would use perl, as there is a high chance that it is already installed on a Linux machine. Let the content of file.txt be

one two
three four
five six

seven eight
nine ten

then

perl -p -0777 -e 's/(?<=[^\n])\n(?=[^\n])/ /g' file.txt

gives output

one two three four five six

seven eight nine ten

Explanation: -p makes perl behave like sed (loop over the input and print each record after running the code given with -e), and -0777 makes it treat the whole file as a single record. The substitution then replaces every (g for global) \n that follows a non-newline character and precedes a non-newline character with a space. This solution does not remove the last newline.

(tested in v5.34.0)

Daweo answered Oct 19 '25 14:10
