I have a text file with several columns of text and values, structured like this:
CAR       38
     DOG  42
CAT       89
CAR       23
     APE  18
If column 1 has a String, column 2 doesn't (or it's actually an empty String). And the other way around: if column 1 is empty, column 2 has a String. In other words, the "object" (CAR, CAT, DOG etc.) occurs in either column 1 or column 2, but never both.
I'm looking for an efficient way to consolidate column 1 and 2 so that the file looks like this instead:
CAR  38
DOG  42
CAT  89
CAR  23
APE  18
I can do this in a Bash script by using while and if, but I'm sure there is a simpler way of doing it. Can someone help?
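For reference, here's a rough sketch of the kind of while/if loop I mean (illustrative only; read's whitespace splitting already drops the leading blanks):

# Sketch: read splits each line on whitespace, so the leading blanks
# in column 1 are discarded and both layouts yield the same two fields.
while read -r obj val; do
  if [[ -n "$val" ]]; then        # only print lines that have both fields
    printf '%s  %s\n' "$obj" "$val"
  fi
done < file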
Cheers! Z
Try this:
column -t file
Output:
CAR  38
DOG  42
CAT  89
CAR  23
APE  18
Note: If you want the output columns aligned (padded to equal width), use Cyrus's simpler, column-based answer.
See below for how the column-based approach compares to the awk-based one in terms of performance and resource consumption.
awk is your friend here:
awk -v OFS='  ' '{ print $1, $2 }' file
awk splits lines into fields by whitespace by default, so with your input, lines such as CAR       38 and DOG  42 are parsed the same way: CAR and DOG become field 1 ($1), and 38 and 42 become field 2 ($2).

-v OFS='  ' sets the output field separator to two spaces (the default is a single space); note that there will be no padding of output values to create aligned output.

To create aligned output with fields of varying width, use Awk's printf function, which gives you more control over the output; for instance, the following outputs a 10-character-wide, left-aligned 1st column and a 2-character-wide, right-aligned 2nd column:
awk '{ printf "%-10s  %2s\n", $1, $2 }' file
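With the sample input above, that command produces:

CAR         38
DOG         42
CAT         89
CAR         23
APE         18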
column -t conveniently determines the column widths automatically, by parsing all data first, but that has performance and resource-consumption implications; see below.

Performance / resource-consumption comparison between the column -t and the Awk approach:
column -t needs to analyze all input data up front, in a first pass, so as to be able to determine the maximum input column widths; from what I can tell, it does so by reading the input as a whole into memory first, which can be problematic with large input files.

Thus, column -t will consume memory proportional to the input size, whereas awk will use a constant amount of memory.

column -t is also typically slower than awk, though it depends on the Awk implementation used: mawk is much faster than column -t, gawk a little faster, whereas BSD awk is actually slower(!). These results are based on a 10-million-line input file, with the commands run on OSX 10.10.2 and Ubuntu 14.04.
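If you want to reproduce such a comparison yourself, a sketch along these lines should work (the file path and line count here are just placeholders):

# Generate a 10-million-line test file in the question's format (illustrative).
awk 'BEGIN { for (i = 1; i <= 10000000; i++) print (i % 2 ? "CAR       " i : "     DOG  " i) }' > /tmp/file

# Time each approach; redirect to /dev/null so terminal output doesn't skew the timings.
time column -t /tmp/file > /dev/null
time awk -v OFS='  ' '{ print $1, $2 }' /tmp/file > /dev/null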