
how can i make awk process the BEGIN block for each file it parses?

Tags: bash, awk, cygwin

i have an awk script that i'm running against a pair of files. i'm calling it like this:

awk -f script.awk file1 file2

script.awk looks something like this:

BEGIN {FS=":"}
{ if( NR == 1 )
    { 
      var=$2
      FS=" "
    }
   else print var,"|",$0
}

the first line of each file is colon-delimited. for every other line, i want it to return to the default whitespace field separator.

this works fine for the first file, but fails for the second one: FS is not reset to : at the start of each file, because the BEGIN block is only processed once.

tldr: is there a way to make awk process the BEGIN block once for each file i pass it?

i'm running this on cygwin bash, in case that matters.

asked Sep 13 '12 15:09 by nullrevolution




1 Answer

If you're using gawk version 4 or later, there's the BEGINFILE block. From the manual:

BEGINFILE and ENDFILE are additional special patterns whose bodies are executed before reading the first record of each command line input file and after reading the last record of each file. Inside the BEGINFILE rule, the value of ERRNO will be the empty string if the file could be opened successfully. Otherwise, there is some problem with the file and the code should use nextfile to skip it. If that is not done, gawk produces its usual fatal error for files that cannot be opened.

For example:

touch a b c
awk 'BEGINFILE { print "Processing: " FILENAME }' a b c

Output:

Processing: a
Processing: b
Processing: c

Edit - a more portable way

As noted by DennisWilliamson, you can achieve a similar effect by testing FNR == 1 at the beginning of your script, since FNR restarts at 1 for each input file. In addition to this, you could change FS directly from the command line, e.g.:

awk -f script.awk FS=':' file1 FS=' ' file2

Here each assignment is applied when awk reaches it in the argument list, so FS keeps that value for the files that follow until the next assignment changes it.
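To make the FNR == 1 idea concrete, here is a minimal sketch that should work in any POSIX awk. It does not come from the original answer: instead of switching FS back and forth, it leaves FS at its default and uses split() to pull the second colon-delimited field out of each file's first line. The sample file contents and the array name header are made up for illustration.

```shell
# Create two sample input files (hypothetical contents for illustration).
tmp=$(mktemp -d)
printf 'id:alpha:x\nfoo bar\nbaz qux\n' > "$tmp/file1"
printf 'id:beta:y\none two\n' > "$tmp/file2"

out=$(awk '
# FNR restarts at 1 for every input file, so this rule runs once per file.
FNR == 1 {
    split($0, header, ":")   # split the colon-delimited first line by hand
    var = header[2]          # remember its second field for this file
    next                     # do not print the header line itself
}
# Every other line is split on default whitespace, since FS is never changed.
{ print var, "|", $0 }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because FS is never modified, there is nothing to reset between files, which sidesteps the original problem rather than re-running BEGIN.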

answered Sep 28 '22 08:09 by Thor