I have the following lines in 2 chunks (in reality there are ~10K of them). In this example each chunk contains 3 lines. The chunks are separated by an empty line, so they are like "paragraphs".
xox
91-233
chicago

koko
121-111
alabama
I want to turn it into tab-delimited lines, like so:
xox 91-233 chicago
koko 121-111 alabama
How can I do that?
I tried tr "\n" "\t", but it doesn't do what I want.
$ awk -F'\n' '{$1=$1} 1' RS='\n\n' OFS='\t' file
xox 91-233 chicago
koko 121-111 alabama
Awk divides input into records and it divides each record into fields.
-F'\n'
This tells awk to use a newline as the field separator.
$1=$1
This tells awk to assign the first field to itself. While this seemingly does nothing, it causes awk to treat the record as changed and rebuild it. As a consequence, the output is printed using our assigned value for OFS, the output field separator.
1
This is awk's cryptic shorthand for "print the record".
RS='\n\n'
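This tells awk to treat two consecutive newlines as a record separator. Note that POSIX leaves a multi-character RS unspecified; this works in awks such as GNU awk and mawk, which treat it as a regular expression.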
OFS='\t'
This tells awk to use a tab as the field separator on output.
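To see the rebuild in isolation, here is a minimal sketch on a single whitespace-separated line. The sample text and the variable names are made up for illustration, and OFS is set with -v rather than as a trailing assignment.

```shell
# Made-up one-line input; awk splits it on whitespace by default.
line='xox 91-233 chicago'

# No field is assigned, so the record is printed exactly as read; OFS is not used.
unchanged=$(printf '%s\n' "$line" | awk -v OFS='\t' '1')

# Assigning to $1 makes awk rebuild the record, joining the fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS='\t' '{$1=$1} 1')

printf 'unchanged: %s\n' "$unchanged"
printf 'rebuilt:   %s\n' "$rebuilt"
```

The first result should still contain spaces, while the second should contain tabs between the three fields.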
This answer offers the following:
* It works with blocks of nonempty lines of any size, separated by any number of empty lines; John1024's helpful answer (which is similar and came first) works with blocks of lines separated by exactly one empty line.
* It explains the awk command used in detail.
A more idiomatic (POSIX-compliant) awk solution:
awk -v RS= -F '\n' -v OFS='\t' '$1=$1""' file
-v RS= tells awk to operate in paragraph mode: consider each run of nonempty lines a single record; RS is the input record separator.
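A quick way to check what paragraph mode treats as a record is to print NR and NF for each one. This is only a sketch, assuming made-up input in which three empty lines separate a 3-line block from a 2-line block; the variable name is hypothetical.

```shell
# Made-up input: a 3-line block, three empty lines, then a 2-line block.
counts=$(printf 'xox\n91-233\nchicago\n\n\n\nkoko\n121-111\n' |
  awk -v RS= -F '\n' '{ print NR ": " NF " line(s)" }')

# One output line per paragraph: record number and how many lines it holds.
printf '%s\n' "$counts"
```

Each block should be reported as exactly one record, no matter how many empty lines sit between the blocks.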
-F '\n' tells awk to consider each line of an input paragraph its own field (breaks the multiline input record into fields by lines); -F sets FS, the input field separator.
-v OFS='\t' tells awk to separate fields with \t (tab characters) on output; OFS is the output field separator.
$1=$1"" looks like a no-op, but, due to assigning to field variable $1 (the record's first field), tells awk to rebuild the input record, using OFS as the field separator, thereby effectively replacing the \n separators with \t.
The "" is to guard against the edge case of the first line in a paragraph evaluating to 0 in a numeric context; appending "" forces treatment as a string, and any nonempty string - even if it contains "0" - is considered true in a Boolean context - see below.
Given that $1 is by definition nonempty and given that assignments in awk pass their value through, the result of assignment $1=$1"" is also a nonempty string; since the assignment is used as a pattern (a condition), and a nonempty string is considered true, and there is no associated action block ({ ... }), the implied action is to print the - rebuilt - input record, which now consists of the input lines separated with tabs, terminated by the default output record separator (ORS), \n.
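The edge case can be reproduced with a paragraph whose first line is 0. The following is a sketch under that assumption; the input and the variable names are invented for the demonstration.

```shell
# Made-up input: the first paragraph starts with a line that is numerically zero.
input=$(printf '0\n91-233\nchicago\n\nkoko\n121-111\nalabama')

# Without the guard, the assignment's value for the first paragraph is the
# number-like "0", which is false as a pattern, so that paragraph is not printed.
without_guard=$(printf '%s\n' "$input" | awk -v RS= -F '\n' -v OFS='\t' '$1=$1')

# With "" appended, the value is the nonempty string "0", which is true,
# so every paragraph is printed.
with_guard=$(printf '%s\n' "$input" | awk -v RS= -F '\n' -v OFS='\t' '$1=$1""')

printf 'without guard:\n%s\n' "$without_guard"
printf 'with guard:\n%s\n' "$with_guard"
```

If the guard matters on your awk, the first result will be missing the paragraph that starts with 0, while the second will contain both paragraphs.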