I have the following lines in 2 chunks (actually there are ~10K of them). In this example each chunk contains 3 lines. The chunks are separated by an empty line, so they are like "paragraphs".
xox
91-233
chicago

koko
121-111
alabama
I want to turn it into tab-delimited lines, like so:
xox 91-233 chicago
koko 121-111 alabama
How can I do that?
I tried tr "\n" "\t", but it doesn't do what I want.
$ awk -F'\n' '{$1=$1} 1' RS='\n\n' OFS='\t' file
xox 91-233 chicago
koko 121-111 alabama
Awk divides input into records and it divides each record into fields.
-F'\n'
This tells awk to use a newline as the field separator.
$1=$1
This tells awk to assign the first field to the first field. While this seemingly does nothing, it causes awk to treat the record as changed. As a consequence, awk rebuilds the record and prints it using our assigned value for OFS, the output field separator.
1
This is awk's cryptic shorthand for "print the record": a pattern that is always true, combined with the default action.
RS='\n\n'
This tells awk to treat two consecutive newlines as a record separator.
OFS='\t'
This tells awk to use a tab as the field separator on output.
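The effect of assigning a field to itself can be seen in isolation on a single line. This is only a small sketch: the function names print_as_read and print_rebuilt are made up for the illustration, and a hyphen is used as OFS so the change is visible.

```shell
# Hypothetical helper names, used only for this illustration.
# Prints each record exactly as it was read; OFS is never applied.
print_as_read() { awk -v OFS='-' '{ print }'; }
# Assigning to a field makes awk rebuild the record, joining the fields with OFS.
print_rebuilt() { awk -v OFS='-' '{ $1 = $1; print }'; }

printf 'a b c\n' | print_as_read   # a b c
printf 'a b c\n' | print_rebuilt   # a-b-c
```

The same mechanism, with a newline as the input field separator and a tab as OFS, is what turns each block of lines into one tab-delimited line.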
This answer offers the following:
* It works with blocks of nonempty lines of any size, separated by any number of empty lines; John1024's helpful answer (which is similar and came first) works with blocks of lines separated by exactly one empty line.
* It explains the awk command used in detail.
A more idiomatic (POSIX-compliant) awk solution:
awk -v RS= -F '\n' -v OFS='\t' '$1=$1""' file
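As a rough, runnable sketch of the command above, the following wraps it in a shell function and feeds it the sample data from the question, here with two empty lines between the blocks instead of one. The function name paragraphs_to_tsv is made up for this illustration, and reading from standard input when no file is given is an assumption of the sketch.

```shell
# Hypothetical helper name; it passes any file arguments through to awk,
# and reads standard input when none are given.
paragraphs_to_tsv() {
  awk -v RS= -F '\n' -v OFS='\t' '$1=$1""' "$@"
}

# Sample data from the question, with two empty lines between the blocks.
printf 'xox\n91-233\nchicago\n\n\nkoko\n121-111\nalabama\n' | paragraphs_to_tsv
# Output (fields separated by tab characters):
# xox	91-233	chicago
# koko	121-111	alabama
```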
-v RS= tells awk to operate in paragraph mode: consider each run of nonempty lines a single record; RS is the input record separator.
-F '\n' tells awk to consider each line of an input paragraph its own field (breaks the multiline input record into fields by lines); -F sets FS, the input field separator.
-v OFS='\t' tells awk to separate fields with \t (tab chars.) on output; OFS is the output field separator.
$1=$1"" looks like a no-op, but because it assigns to field variable $1 (the record's first field), it tells awk to rebuild the input record, using OFS as the field separator, thereby effectively replacing the \n separators with \t.
"" is to guard against the edge case of the first line in a paragraph evaluating to 0 in a numeric context; appending "" forces treatment as a string, and any nonempty string - even if it contains "0" - is considered true in a Boolean context - see below.
Given that $1 is by definition nonempty and given that assignments in awk pass their value through, the result of assignment $1=$1"" is also a nonempty string. Since the assignment is used as a pattern (a condition), a nonempty string is considered true, and there is no associated action block ({ ... }), the implied action is to print the rebuilt input record, which now consists of the input lines separated with tabs, terminated by the default output record separator (ORS), \n.
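The edge case can be reproduced with a paragraph whose first line is the number 0. This is a minimal sketch; the function names join_unguarded and join_guarded are made up for the illustration.

```shell
# Hypothetical helper names, used only for this illustration.
join_unguarded() { awk -v RS= -F '\n' -v OFS='\t' '$1=$1'; }
join_guarded()   { awk -v RS= -F '\n' -v OFS='\t' '$1=$1""'; }

# A paragraph whose first line looks like the number 0.
printf '0\nfoo\nbar\n' | join_unguarded   # pattern is numerically 0 (false): no output
printf '0\nfoo\nbar\n' | join_guarded     # pattern is the string "0" (true): 0<TAB>foo<TAB>bar
```

Without the appended "", the whole paragraph is silently dropped, which would be easy to miss in an input of ~10K blocks.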