Pre-populate associative array keys in awk?

Question

I've written a Munin plugin that uses Slurm's sacct to monitor job states on an HPC cluster. I've written it in sh + awk (rather than my usual tool of choice, Perl).

The script works, but it took me ages to figure out how to pre-populate the associative array of possible states (some or most of them may not be present in the sacct output, and I want them to default to zero). Google wasn't much help, and the best I could come up with was to use split on a string to produce a temporary array, which I then iterated over.

I came up with this:

BEGIN {
    # split() fills statenames[1..num] with the words; copy each word into states as a key
    num = split("cancelled completed completing failed nodefail pending running suspended timeout", statenames, " ")
    for (i = 1; i <= num; i++) {
        states[statenames[i]] = 0
    }
}
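To confirm that a BEGIN block like this really creates every key with a value of 0, it can be run against empty input and the array dumped. This is a minimal sketch; the shell function name `show_defaults` is hypothetical, and the output goes through `sort` because `for (k in arr)` visits keys in an unspecified order.

```sh
# Hypothetical helper: run the BEGIN-block initialisation on empty input
# and print every pre-populated key with its value.
show_defaults() {
    awk '
    BEGIN {
        num = split("cancelled completed completing failed nodefail pending running suspended timeout", statenames, " ")
        for (i = 1; i <= num; i++) {
            states[statenames[i]] = 0
        }
    }
    END {
        # for-in order is unspecified, so sorting happens outside awk
        for (s in states) print s, states[s]
    }' </dev/null | sort
}

show_defaults
```

END still runs when the input is empty, so all nine names are printed with a 0.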

This works, but seems clumsy compared to how I'd do it in Perl, like this:

foreach (qw(cancelled completed completing failed nodefail pending running suspended timeout)) {
    $states{$_} = 0;
}

or this:

%states = map {$_ => 0} qw(cancelled completed completing failed nodefail pending running suspended timeout);

My question is: is there a way of doing this in awk that is similar to either of the Perl versions?

[ edited ]

To clarify, here's a sample of the sacct output I'm piping into awk. Note that the only states in this output are RUNNING, COMPLETED, and CANCELLED. The others don't appear (because they haven't occurred today), but I want them in my script's output anyway, in a form usable by Munin as "statename.value 0".

# sacct -X -P -o 'state' -n
RUNNING
RUNNING
RUNNING
RUNNING
COMPLETED
RUNNING
COMPLETED
RUNNING
COMPLETED
COMPLETED
CANCELLED by 1000
COMPLETED
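Because awk splits each record on whitespace by default, `$1` on the line `CANCELLED by 1000` is just `CANCELLED`, and the POSIX `tolower()` function maps it onto the lower-case names used in the state list. A minimal sketch, assuming the plugin lower-cases the states (the sample Munin output suggests it does); the function name `normalise_states` is hypothetical.

```sh
# Hypothetical helper: reduce each sacct state line to a lower-case state name.
# Default field splitting drops trailing text such as "by 1000".
normalise_states() {
    awk '{ print tolower($1) }'
}

printf '%s\n' RUNNING COMPLETED 'CANCELLED by 1000' | normalise_states
```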

[ edited again ]

And here's sample output from my Munin plugin:

# ./slurm-sacct
suspended.value 0
pending.value 0
nodefail.value 0
failed.value 0
running.value 6
completing.value 0
completed.value 5
timeout.value 0
cancelled.value 1
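For comparison, a self-contained sketch of a counting step that produces output in this shape. It is not the author's actual plugin: the function name `munin_state_values` is hypothetical, it assumes the states are lower-cased with `tolower()`, and the sample input is fed in with `printf` instead of running `sacct`.

```sh
# Hypothetical sketch of the plugin's counting step: reads sacct state lines
# on stdin and prints one "state.value N" line per known state, including
# states with no jobs.
munin_state_values() {
    awk '
    BEGIN {
        num = split("cancelled completed completing failed nodefail pending running suspended timeout", statenames, " ")
        for (i = 1; i <= num; i++) states[statenames[i]] = 0
    }
    # sacct prints upper-case states; $1 drops suffixes such as "by 1000"
    { states[tolower($1)]++ }
    END {
        # walk the name list instead of "for (s in states)" so the
        # output order is fixed
        for (i = 1; i <= num; i++) print statenames[i] ".value", states[statenames[i]]
    }'
}

# In the real plugin the input would come from: sacct -X -P -o 'state' -n
printf '%s\n' RUNNING RUNNING COMPLETED 'CANCELLED by 1000' | munin_state_values
```

Walking `statenames` by index keeps the output order stable. A state that sacct reports but that is missing from the list is still counted, but it is not printed.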

The script runs and does what I want; I just wanted to know if there was a better way to initialise the associative array.

David Z · Accepted Answer

You probably don't need to do it at all. Variables in awk are dynamic, which means they're automatically initialized when they are first used (either assigned to or accessed), and this applies to array elements as well.

An uninitialized variable (or array element) behaves as 0 when it's used in a numeric context and as the empty string otherwise. POSIX requires this, so it isn't just a gawk quirk. So if you're doing something like counting the number of jobs in each state, the entire program can be as simple as:

{ states[$1]++ }
END {
     for (state in states) print state, states[state]
}

Each time the expression states[$1]++ is executed, it will check for the existence of states[$1] and initialize it to 0 if it doesn't already exist.
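To see the automatic initialization at work, here is the program above wrapped in a shell function and run on a few lines of sample input. The function name `count_states` is hypothetical, and the `sort` is only there because `for (state in states)` visits keys in an unspecified order.

```sh
# The counting program from above; no initialisation is needed because
# states[$1]++ treats a not-yet-existing element as 0.
count_states() {
    awk '
    { states[$1]++ }
    END {
        for (state in states) print state, states[state]
    }' | sort
}

printf '%s\n' RUNNING COMPLETED RUNNING 'CANCELLED by 1000' | count_states
```

Only states that actually occur in the input show up in the output.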

EDIT: From your comment, I'm guessing you want to print out a line for each possible state, regardless of whether there are any jobs in that state. In that case, you need to list all the possible state names yourself, and awk has no shortcut notation for that like Perl's qw() or map. As far as I know, what you've already found is about as clean as it gets. (Awk is not really designed with that usage in mind.)
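If the same kind of initialisation is needed in more than one place, the split-and-loop can at least be hidden behind a user-defined function, since awk passes arrays to functions by reference. This is a sketch, not a built-in: `init_keys` and the wrapper `count_with_defaults` are hypothetical names, and the extra parameters after the run of spaces are awk's usual way of declaring local variables.

```sh
# Hypothetical wrapper around an awk program that defines init_keys(),
# which sets arr[name] = val for every space-separated name and returns
# how many names it set.
count_with_defaults() {
    awk '
    function init_keys(arr, names, val,    tmp, n, i) {
        n = split(names, tmp, " ")
        for (i = 1; i <= n; i++) arr[tmp[i]] = val
        return n
    }
    BEGIN {
        split("", states)   # portable idiom: make states an array before passing it
        init_keys(states, "cancelled completed completing failed nodefail pending running suspended timeout", 0)
    }
    { states[tolower($1)]++ }
    END { for (s in states) print s, states[s] }' | sort
}

printf '%s\n' RUNNING 'CANCELLED by 1000' | count_with_defaults
```

This does not make awk any terser than the explicit loop, but it keeps the state list in a single call, which is the closest awk gets to the Perl `map` version.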

I'd suggest the following:

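{ states[tolower($1)]++ }    # sacct prints upper-case states such as RUNNING
END {
     n = split("cancelled completed completing failed nodefail pending running suspended timeout", statenames, " ")
     # loop by index: "for (x in statenames)" would yield the indices 1..n, not the names
     for (i = 1; i <= n; i++) print statenames[i], states[statenames[i]] + 0
}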
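Note that `split()` stores the words under the numeric keys 1 to n, so a `for (k in statenames)` loop visits those keys rather than the state names; the names themselves are `statenames[k]`. A tiny sketch that shows this (the function name `show_split_keys` is hypothetical):

```sh
# split() stores the words under the keys 1..n, so a for-in loop over the
# result yields those numeric keys; the words are statenames[k].
show_split_keys() {
    awk 'BEGIN {
        split("cancelled completed", statenames, " ")
        for (k in statenames) print k, statenames[k]
    }' | sort
}

show_split_keys
```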

Theo · Answer

Perhaps Craig could replace this:

print "Timeout states " states["timeout"] "."

with this:

print "Timeout states " int(states["timeout"]) "."

In my case, if there is no timeout state in the awk input, the first print gives:

Timeout states .

while the second gives:

Timeout states 0.
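Both lines can be reproduced with a small sketch on empty input. It uses string concatenation rather than commas, because `print a, b` inserts the output field separator (a space by default) between the items. The function name `timeout_lines` is hypothetical.

```sh
# Compare printing a never-assigned array element as-is versus through int().
timeout_lines() {
    awk '
    END {
        print "Timeout states " states["timeout"] "."
        print "Timeout states " int(states["timeout"]) "."
    }' </dev/null
}

timeout_lines
```

Adding 0 (`states["timeout"] + 0`) forces a numeric value in the same way as `int()`.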

Tags: awk, associative-array, cas