I have a text file with several columns of text and values, with this structure:
CAR        38
    DOG    42
CAT        89
CAR        23
    APE    18
If column 1 has a String, column 2 doesn't (or it's actually an empty String). And the other way around: if column 1 is empty, column 2 has a String. In other words, the "object" (CAR, CAT, DOG etc.) occurs in either column 1 or column 2, but never both.
I'm looking for an efficient way to consolidate columns 1 and 2 so that the file looks like this instead:
CAR 38
DOG 42
CAT 89
CAR 23
APE 18
I can do this in a Bash script by using while and if, but I'm sure there is a simpler way of doing it. Can someone help?
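For reference, the kind of loop I have in mind is roughly this (just a sketch; read's default whitespace splitting drops the empty column, so the name always ends up in the first variable):

while read -r name value; do
  if [[ -n $name ]]; then
    printf '%s %s\n' "$name" "$value"
  fi
done < file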
Cheers! Z
Try this:
column -t file
Output:
CAR  38
DOG  42
CAT  89
CAR  23
APE  18
Note: If you need the output columns aligned, use Cyrus's simpler, column-based answer. See below for how the column-based approach compares to the awk-based approach in terms of performance and resource consumption.
awk is your friend here:
awk -v OFS='  ' '{ print $1, $2 }' file
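With the sample input, this yields:

CAR  38
DOG  42
CAT  89
CAR  23
APE  18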
awk splits lines into fields by whitespace by default, ignoring leading whitespace, so, with your input, lines such as CAR 38 and     DOG 42 are parsed the same (CAR and DOG become field 1, $1, and 38 and 42 become field 2, $2).

-v OFS='  ' sets the output field separator to two spaces (the default is a single space); note that there will be no padding of output values to create aligned output.

To create aligned output with fields of varying width, use awk's printf function, which gives you more control over the output; for instance, the following outputs a 10-character-wide, left-aligned 1st column and a 2-character-wide, right-aligned 2nd column:
awk '{ printf "%-10s %2s\n", $1, $2 }' file
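For the sample input, that prints:

CAR        38
DOG        42
CAT        89
CAR        23
APE        18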
column -t conveniently determines the column widths automatically, by parsing all the data first, but that has performance and resource-consumption implications; see below.

Performance / resource-consumption comparison between the column -t approach and the awk approach:
column -t needs to analyze all input data up front, in a first pass, so as to be able to determine the maximum input column widths; from what I can tell, it does so by reading the input as a whole into memory first, which can be problematic with large input files.

Thus, column -t will consume memory proportional to the input size, whereas awk will use a constant amount of memory.

column -t is also typically slower, though by how much depends on the Awk implementation used: mawk is much faster than column -t, gawk a little faster, and BSD awk is actually slower(!). These results are based on a 10-million-line input file, with the commands run on OSX 10.10.2 and Ubuntu 14.04.
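For reference, a rough way to reproduce such a comparison (the file name big.txt and the generator below are illustrative, not the original test setup):

# Generate a 10-million-line test file; even-numbered lines put the
# name in "column 2" (indented), as in the question's input.
awk 'BEGIN {
  for (i = 1; i <= 10000000; i++) {
    name = (i % 2 ? "CAR" : "        DOG")
    print name, i % 100
  }
}' > big.txt

# Time both approaches; output is discarded so only processing is measured.
time column -t big.txt > /dev/null
time awk -v OFS="  " '{ print $1, $2 }' big.txt > /dev/null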