
How to join lines not starting with specific pattern to the previous line in UNIX?

Please take a look at the sample file and the desired output below to understand what I am looking for.

It can be done with loops in a shell script, but I am struggling to come up with an awk/sed one-liner.

SampleFile.txt

These are leaves.
These are branches.
These are greenery which gives
oxygen, provides control over temperature
and maintains cleans the air.
These are tigers
These are bears
and deer and squirrels and other animals.
These are something you want to kill
Which will see you killed in the end.
These are things you must to think to save your tomorrow.

Desired output

These are leaves.
These are branches.
These are greenery which gives oxygen, provides control over temperature and maintains cleans the air.
These are tigers
These are bears and deer and squirrels and other animals.
These are something you want to kill Which will see you killed in the end.
These are things you must to think to save your tomorrow.
instinct246 asked Jun 21 '16 16:06



2 Answers

With sed:

sed ':a;N;/\nThese/!s/\n/ /;ta;P;D' infile

resulting in

These are leaves.
These are branches.
These are greenery which gives oxygen, provides control over temperature and maintains cleans the air.
These are tigers
These are bears and deer and squirrels and other animals.
These are something you want to kill Which will see you killed in the end.
These are things you must to think to save your tomorrow.

Here is how it works:

sed '
:a                   # Label to jump to
N                    # Append next line to pattern space
/\nThese/!s/\n/ /    # If the newline is NOT followed by "These", append
                     # the line by replacing the newline with a space
ta                   # If we changed something, jump to label
P                    # Print part until newline
D                    # Delete part until newline
' infile

The N;P;D is the idiomatic way of keeping multiple lines in the pattern space; the conditional branching part takes care of the situation where we append more than one line.

This works with GNU sed. For other seds, like the one found in Mac OS, the one-liner has to be split up so that the branch and the label are in separate commands, the newlines may have to be escaped, and an extra semicolon is needed:

sed -e ':a' -e 'N;/'$'\n''These/!s/'$'\n''/ /;ta' -e 'P;D;' infile

This last command is untested; see this answer for differences between different seds and how to handle them.

Another alternative is to enter the newlines literally:

sed -e ':a' -e 'N;/\
These/!s/\
/ /;ta' -e 'P;D;' infile

But then, by definition, it's no longer a one-liner.
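
If the prefix needs to vary, the same sed commands can be wrapped in a small shell function. What follows is only a sketch under a few assumptions: the function name join_continuations is made up for illustration, GNU sed is assumed (it prints the pattern space when N runs out of input, which other seds may not do), and the prefix is assumed to contain no regex metacharacters or slashes.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the answer):
# join every line that does not start with PREFIX onto the previous line.
# Assumes GNU sed and a PREFIX without regex metacharacters or slashes.
join_continuations() {
    prefix=$1
    # Same commands as the one-liner, one per -e so the label stands alone.
    sed -e ':a' -e 'N' -e '/\n'"$prefix"'/!s/\n/ /' -e 'ta' -e 'P' -e 'D'
}

# Demo on a few lines from the question.
printf '%s\n' \
    'These are bears' \
    'and deer and squirrels and other animals.' \
    'These are tigers' | join_continuations 'These'
# Prints:
# These are bears and deer and squirrels and other animals.
# These are tigers
```

The function reads standard input, so a file can be processed with join_continuations 'These' < infile.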

Benjamin W. answered Oct 16 '22 07:10


Please try the following:

awk 'BEGIN {accum_line = "";} /^These/{if(length(accum_line)){print accum_line; accum_line = "";}} {accum_line = (length(accum_line) ? accum_line " " : "") $0;} END {if(length(accum_line)){print accum_line; }}' < data.txt

The code consists of three parts:

  1. The block marked BEGIN is executed before any input is read. It is useful for global initialization.
  2. The block marked END is executed after all input has been processed. It is good for wrapping things up, here printing the last accumulated line, since no later line starting with These arrives to trigger the print.
  3. The rest is the code executed for each line. First, the pattern is checked: if the line starts with These, the accumulated line is printed and reset. Second, the current line is appended to the accumulator regardless of its contents.
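
The three parts may be easier to see when the program is spread over several lines. The version below is a sketch rather than a drop-in replacement: the function name join_with_awk and the awk variables prefix and acc are chosen here for illustration, and the prefix is compared as a plain string with index() instead of a regex.

```shell
#!/bin/sh
# Hypothetical wrapper (names are illustrative): reads stdin and joins every
# line that does not start with the given prefix onto the previous line.
join_with_awk() {
    awk -v prefix="$1" '
        # A line starting with the prefix begins a new record,
        # so print whatever has been accumulated and reset.
        index($0, prefix) == 1 && acc != "" { print acc; acc = "" }

        # Every line is appended to the accumulator; the space is only
        # added when the accumulator already holds text.
        { acc = (acc == "" ? $0 : acc " " $0) }

        # After the last line, print the record that is still pending.
        END { if (acc != "") print acc }
    '
}

# Demo on a few lines from the question.
printf '%s\n' \
    'These are something you want to kill' \
    'Which will see you killed in the end.' \
    'These are tigers' | join_with_awk 'These'
# Prints:
# These are something you want to kill Which will see you killed in the end.
# These are tigers
```

The uninitialized acc starts out as an empty string, so no BEGIN block is needed in this variant.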
GMichael answered Oct 16 '22 06:10