
Passing variable to awk and using that in a regular expression

I'm learning awk and I have trouble passing a variable to the script AND using it as part of a regex search pattern.

The example is contrived but shows my problem.

My data is the following:

Eddy        Smith       0600000000  1981-07-16    Los Angeles
Frank       Smith       0611111111  1947-04-29    Chicago           
Victoria    McSmith     0687654321  1982-12-16    Los Angeles
Barbara     Smithy      0633244321  1984-06-24    Boston            
Jane        McSmithy    0612345678  1947-01-15    Chicago               
Grace       Jones       0622222222  1985-10-07    Los Angeles
Bernard     Jones       0647658763  1988-01-01    New York          
George      Jonesy      0623428948  1983-01-01    New York          
Indiana     McJones     0698732298  1952-01-01    Miami             
Philip      McJonesy    0644238523  1954-01-01    Miami

I want an awk script that I can pass a variable to, and that then uses the variable's value as a regex. This is the script I have now, called "003_search_persons.awk".

#this awk script looks for a certain name, returns firstName, lastName and City

#print column headers
BEGIN {
    printf "firstName lastName City\n";
}

#look for the name, print firstName, lastName and City
$2 ~ name {
    printf $1 " " $2 " " $5 " " $6;
    printf "\n";
}

I call the script like this:

awk -f 003_search_persons.awk name=Smith 003_persons.txt

It returns the following, which is good.

firstName lastName City
Eddy Smith Los Angeles
Frank Smith Chicago
Victoria McSmith Los Angeles
Barbara Smithy Boston
Jane McSmithy Chicago

But now I want to look for a certain prefix, "Mc". I could of course hardcode this, but I want an awk script that is flexible. I wrote the following in 003_search_persons_prefix.awk.

#this awk script looks for a certain prefix to a name, returns firstName, lastName and City

#print column headers
BEGIN {
    printf "firstName lastName City\n";
}

#look for the prefix, print firstName, lastName and City
/^prefix/{
    printf $1 " " $2 " " $5 " " $6;
    printf "\n";
}

I call the script like this:

awk -f 003_search_persons_prefix.awk prefix=Mc 003_persons.txt

But now it finds no records.

The problem is the search pattern "/^prefix/". I know I can replace that search pattern by a non-regex one, as in the first script, but suppose I want to do it with a regex, because I need the prefix to really be at the start of the lastName field, as it should be, being a prefix and all ;-)

How do I do this?

Niels Bom asked Feb 09 '10 08:02



2 Answers

You can try this:

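BEGIN{
 printf "firstName lastName City\n";
 # ARGV[1] holds the raw "var=value" argument; awk has not applied it yet in BEGIN
 split(ARGV[1], n,"=")
 prefix=n[2]
 # build the dynamic regex once, anchored at the start
 pat="^"prefix
}
# pat is tested against the whole record, so it anchors at the first field
$0 ~ pat{
    print "found: "$0
}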

output

$ awk -f  test.awk name=Jane file
firstName lastName City
found: Jane        McSmithy    0612345678  1947-01-15    Chicago

Look at the awk documentation for more. (and read it from start to finish!)
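
The split on ARGV[1] above is needed because a var=value argument placed after the script is only assigned when awk reaches it among its file operands, which is after BEGIN has run. Here is a minimal sketch of that timing difference, assuming a POSIX awk; the shell variable names late and early are only for illustration.

```shell
# An operand assignment (var=value after the program text) is applied only
# when awk reaches it in the argument list, so BEGIN still sees it as empty.
late=$(awk 'BEGIN { print "[" prefix "]" }' prefix=Mc)

# A -v assignment is applied before BEGIN runs.
early=$(awk -v prefix=Mc 'BEGIN { print "[" prefix "]" }')

printf '%s\n' "operand: $late"   # operand: []
printf '%s\n' "-v:      $early"  # -v:      [Mc]
```

If the regex is built in a main rule rather than in BEGIN, either way of passing the variable works.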

ghostdog74 answered Jan 03 '23 04:01


Change your script to:

BEGIN {
    print "firstName", "lastName", "City"
    ORS = "\n\n"
}

$2 ~ "^" prefix {
    print $1, $2, $5, $6
}

and call it as

awk -v prefix="Mc" -f 003_search_persons_prefix.awk 003_persons.txt
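
Since the question asks for the prefix at the start of the lastName field, the match can be made against $2 instead of the whole record. Below is a rough, self-contained sketch of that variant, using a few rows of the question's data; the temporary file and the shell variable names data and result are assumptions for the demo, not part of the original scripts.

```shell
# A few sample rows from the question, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
Eddy        Smith       0600000000  1981-07-16    Los Angeles
Victoria    McSmith     0687654321  1982-12-16    Los Angeles
Jane        McSmithy    0612345678  1947-01-15    Chicago
Grace       Jones       0622222222  1985-10-07    Los Angeles
Indiana     McJones     0698732298  1952-01-01    Miami
EOF

# "^" prefix is a string concatenation; on the right-hand side of ~ awk
# interprets the resulting string as a (dynamic) regular expression.
# /^prefix/ would instead look for the literal text "prefix".
result=$(awk -v prefix="Mc" '$2 ~ "^" prefix { print $1, $2 }' "$data")
rm -f "$data"

printf '%s\n' "$result"
```

With these rows, only the three "Mc" last names are printed; "Smith" and "Jones" are skipped because the pattern is anchored at the start of the second field.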

Ed Morton answered Jan 03 '23 03:01