Does awk support dynamic user-defined variables?

Tags: bash, dynamic, awk

awk supports this:

awk '{print $(NF-1);}'

but not for user-defined variables:

awk '{a=123; b="a"; print $($b);}'

by the way, shell supports this:

a=123;
b="a";
eval echo \${$b};

How can I achieve my purpose in awk?

fanlix asked Aug 09 '12 09:08



1 Answer

OK, since some of us like to eat spaghetti through our noses, here is some actual code that I wrote in the past :-)
First of all, getting self-modifying code to work in a language that does not support it is extremely non-trivial.

The idea behind allowing dynamic variables and function names in a language that does not support them is very simple. At some point in the program, you want something dynamic to modify your own code and then resume execution from where you left off. An eval(), that is.

This is all trivial if the language supports eval() or some equivalent. However, awk has no such function. Therefore you, the programmer, have to provide an interface for it.
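For the narrow case in the question (looking a value up by a name held in another variable), a much smaller interface than self-modifying code may be enough. The following is a minimal sketch, not the approach developed in the rest of this answer: it assumes the "dynamic variables" can live in an associative array. The names vars and lookup_demo are hypothetical.

```shell
# Sketch: emulate "variable variables" with an associative array.
# vars and lookup_demo are illustrative names, not from the original post.
lookup_demo() {
  awk 'BEGIN {
    vars["a"] = 123   # store the value under the name "a"
    b = "a"           # b holds the NAME of the entry we want
    print vars[b]     # indirect lookup through b
  }'
}

lookup_demo
```

This only covers data lookups. It does not give you dynamic function names or anything like eval(), which is what the rest of the answer is about.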

To allow all this to happen, you have three main problems:

  1. How to get at our own code so we can modify it
  2. How to load the modified code and resume from where we left off
  3. Finding a way for the interpreter to accept our modified code

How to get at our own code so we can modify it

Here is some example code, suitable for direct execution. This is the infrastructure that I inject for environments running gawk, as it requires PROCINFO:

echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
 x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
 A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
 while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
    print "foo";
    dbg_argv(A);
    dbg_printarray(A);
    print "bar";
}'

Result:

foo
A[1]=[awk]
A[2]=[
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
 x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
 A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
 while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
    print "foo";
    dbg_argv(A);
    dbg_printarray(A);
    print "bar";
}]
bar

As you can see, as long as the OS does not play with our args and /proc/ is available, it is possible to read our own code. This may appear useless at first, but we need it for push/pop of our stack, so that our execution state can be embedded within the code itself, letting us save/resume and survive OS shutdowns/reboots.

I have left out the OS detection function and the bootloader (written in awk) because, if I published them, kids could build platform-independent polymorphic code, and it is easy to cause havoc with that.

How to load the modified code and resume from where we left off

Now, normally you have push() and pop() for registers, so you can save your state, modify yourself, and resume from where you left off. Making a call and reading your stack is a typical way to get the memory address.

Unfortunately, in awk, under normal circumstances we cannot use pointers (without a lot of dirty work) or registers (unless you can inject other stuff along the way). However, you still need a way to suspend and resume your code.

The idea is simple. Instead of letting awk control your loops (for, while), if/else conditions, recursion depth, and the functions you are in, your code should. Keep a stack, a list of variable names, and a list of function names, and manage them yourself. Just make sure that your code calls self_modify( bool ) constantly, so that even after a sudden failure, as soon as the script is re-run, we can enter self_modify( bool ) and resume our state. When you want to self-modify your code, you must provide custom-made write_stack() and read_stack() routines: one writes the state of the stack out as a string, and the other reads the values back from the string embedded in the code itself so that the execution state can be resumed.
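As a rough illustration of the write_stack() / read_stack() pair mentioned above, here is a minimal sketch. It reuses the push()/pop() layout from this answer (stack[0] holds the depth), but the serialization format is my assumption: a single separator character SEP that is assumed never to occur in the stacked values. The wrapper name stack_roundtrip_demo is hypothetical.

```shell
# Sketch: serialize the stack to a string and restore it again.
# SEP and stack_roundtrip_demo are illustrative; real code would need
# escaping if the separator can occur in the data.
stack_roundtrip_demo() {
  awk '
  function push(d) { stack[stack[0] += 1] = d }
  function pop()   { if (stack[0]) return stack[stack[0]--]; return "" }
  # Join all stack entries, bottom first, into one SEP-separated string.
  function write_stack(   i, s) {
    for (i = 1; i <= stack[0]; i++) s = s (i > 1 ? SEP : "") stack[i]
    return s
  }
  # Rebuild the stack from a string produced by write_stack().
  function read_stack(s,   n, i, parts) {
    n = split(s, parts, SEP)
    stack[0] = 0
    for (i = 1; i <= n; i++) push(parts[i])
  }
  BEGIN {
    SEP = ","
    push("loop=3"); push("fn=main")
    saved = write_stack()
    print saved
    stack[0] = 0        # simulate a fresh run that starts with an empty stack
    read_stack(saved)
    print pop()
    print pop()
  }'
}

stack_roundtrip_demo
```

In the scheme described here, the string returned by write_stack() is what would be embedded back into the script text between marker strings, and read_stack() would run at startup on the re-executed script.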

Here is a small piece of code that demonstrates the whole flow:

echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
 x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function _(s){return s}
function dbg_argv(A ,this,p){
 A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
 while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
    _(BEGIN_MODIFY"|");print "#foo";_("|"END_MODIFY)
    dbg_argv(A);
    sub( \
    "BEGIN_MODIFY\x22\x5c\x7c[^\x5c\x7c]*\x5c\x7c\x22""END_MODIFY", \
    "BEGIN_MODIFY\x22\x7c\x22);print \"#"PROCINFO["pid"]"\";_(\x22\x7c\x22""END_MODIFY" \
     ,A[2]) 
    print "echo \x22\x22\x7c awk \x27"A[2]"";
    print "function bar_"PROCINFO["pid"]"_(s){print \x22""doe\x22}";
    print "\x27"
}'

Result:

Exactly the same as our original code, except

_(BEGIN_MODIFY"|");print "65964";_("|"END_MODIFY)

and

function bar_56228_(s){print "doe"}

at the end of the code.

Now, this may seem useless, as we are only replacing the code print "foo"; with our pid. But it becomes useful when there are multiple _() calls with separate MAGIC strings to identify BLOCKS, and a custom-made multi-line string replacement routine instead of sub().
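A multi-line replacement routine of that kind could look roughly like the sketch below. It is an assumption about the shape of such a routine, not the code actually used here: it locates two literal marker strings with index() and splices a new body between them with substr(), so no regex escaping is needed and the block may span several lines. The names replace_block, start_mark, end_mark and the #BEGIN_X / #END_X markers are hypothetical.

```shell
# Sketch: replace the text between two literal markers, across lines.
# All names here are illustrative, not taken from the original code.
replace_block_demo() {
  awk '
  # Return src with everything between start_mark and end_mark replaced by
  # body. Returns src unchanged if either marker is missing.
  function replace_block(src, start_mark, end_mark, body,   s, e, rest) {
    s = index(src, start_mark)
    if (!s) return src
    rest = substr(src, s + length(start_mark))   # text after the start marker
    e = index(rest, end_mark)
    if (!e) return src
    return substr(src, 1, s + length(start_mark) - 1) body substr(rest, e)
  }
  BEGIN {
    code = "head\n#BEGIN_X\nprint \"foo\"\n#END_X\ntail"
    print replace_block(code, "#BEGIN_X\n", "#END_X", "print \"bar\"\nprint \"baz\"\n")
  }'
}

replace_block_demo
```

Using one distinct marker pair per BLOCK keeps each replacement independent of the others.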

You must provide BLOCKS for the stack, the function list, and the execution point, as a bare minimum.

And notice that the last line contains bar. This by itself is just a string, but when this code gets executed repeatedly, notice that

function bar_56228_(s){print "doe"}
function bar_88128_(s){print "doe"}
...

and it keeps growing. While the example is intentionally made to do nothing useful, if we provide a routine that calls bar_pid_(s) instead of that print "foo" code, suddenly we have eval() on our hands :-) Now, isn't eval() useful :-)

Don't forget to provide a custom-made remove_block() function so that the code stays at a reasonable size instead of growing every time you execute it.
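One very simple form of that cleanup, sketched under the assumption that every generated function occupies a single line and follows the bar_<pid>_ naming shown above, is a filter that drops those lines before the script is written back out. The helper name remove_generated is hypothetical, and a real remove_block() would more likely work on marker-delimited BLOCKS.

```shell
# Sketch: drop previously generated one-line bar_<pid>_ functions.
# remove_generated is an illustrative name, not part of the original code.
remove_generated() {
  awk '!/^function bar_[0-9]+_\(/'
}

printf '%s\n' \
  'function keep(s){return s}' \
  'function bar_56228_(s){print "doe"}' \
  'function bar_88128_(s){print "doe"}' | remove_generated
```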

Finding a way for the interpreter to accept our modified code

Normally, calling a binary is trivial. However, when doing so from within awk, it becomes difficult. You may say system() is the way.

There are two problems with that:

  1. system() may not work in some environments
  2. it blocks while your code is executing, thus you cannot perform recursive calls and keep the user happy at the same time.

If you must use system(), ensure that it does not block. A normal call to system("sleep 20 && echo from-sh & ") will not work. The solution is simple:

echo ""|awk '{print "foo";E="echo ep ; sleep 20 && echo foo & disown ; ";  E | getline v;close(E);print "bar";}'

Now you have an async system() call that does not block :-)
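Note that disown is a bash builtin, so the one-liner above depends on which shell awk uses to run commands. The sketch below is a variation on the same command | getline pattern that sticks to plain POSIX sh. It rests on an assumption of mine: that what keeps the caller waiting is the background job holding the pipe open, so the job redirects its own output away from it. The name async_demo is hypothetical.

```shell
# Sketch: start a background job from awk without waiting for it to finish.
# Assumption: redirecting the job's output keeps it from holding the pipe open.
async_demo() {
  awk 'BEGIN {
    cmd = "echo started; sleep 5 >/dev/null 2>&1 &"
    cmd | getline v     # reads the single line "started"
    close(cmd)          # the shell has already exited; sleep keeps running
    print v
    print "bar"
  }'
}

async_demo
```

The trade-off is that the background job can no longer report anything through the pipe. If its output is needed later, it would have to go to a file or similar.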

GreenFox answered Nov 15 '22 06:11
