How to use awk variables in regular expressions?

Tags: regex, awk

I have a file called domain which contains some domains. For example:

    google.com
    facebook.com
    ...
    yahoo.com

And I have another file called site which contains some site URLs and numbers. For example:

    image.google.com    10
    map.google.com      8
    ...
    photo.facebook.com  22
    game.facebook.com   15
    ..

Now I want to sum, for each domain, the numbers of the URLs that belong to it. For example: google.com has 10+8. So I wrote an awk script like this:

    BEGIN{
      while(getline dom < "./domain" > 0) {
        domain[dom]=0;
      }
      for(dom in domain) {
        while(getline < "./site" > 0) {
          if($1 ~/$dom$)   #if $1 end with $dom
          {
            domain[dom]+=$2;
          }
        }
      }
    }

But the code if($1 ~/$dom$) doesn't behave the way I want, because the variable $dom in the regular expression is interpreted literally. So, the first question is:

Is there any way to use variable $dom in a regular expression?

Then, as I'm new to writing scripts:

Is there any better way to solve the problem I have?


asked Jul 18 '12 04:07

Hancy

2 Answers

awk can match against a variable if you don't use the // regex markers.

if ( $0 ~ regex ){ print $0; }

In this case, build up the required regex as a string

regex = dom"$"

Then match against the regex variable

    if ( $1 ~ regex ) {
      domain[dom]+=$2;
    }
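As a minimal sketch of this dynamic-regex approach, the snippet below runs the match against a few inline sample lines taken from the question. The shell variable name `total` and passing `dom` with `-v` are illustrative choices, not part of the original answer.

```shell
# Sum the second column of every line whose first field ends with the
# string held in the awk variable dom (passed in with -v).
total=$(printf '%s\n' 'image.google.com 10' 'map.google.com 8' 'photo.facebook.com 22' |
  awk -v dom='google.com' '
    BEGIN { regex = dom "$" }     # build the regex as a string
    $1 ~ regex { sum += $2 }      # dynamic regex: no // markers
    END { print sum + 0 }         # + 0 prints 0 when nothing matched
  ')
echo "$total"
```

Note that the string is still interpreted as a regular expression, so the "." in google.com matches any character, not only a literal dot.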


answered Sep 18 '22 05:09

Matt

First of all, the variable is dom, not $dom. In awk, $ is an operator that returns the field (column) whose number is stored in the variable dom.
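A quick way to see the difference between a variable and `$` applied to that variable is the small sketch below. The names `n`, `fields` and `whole` are made up for illustration.

```shell
# With n = 2, the expression $n is field number 2, while n is just the number 2.
fields=$(echo 'image.google.com 10' | awk '{ n = 2; print n, $n }')
echo "$fields"

# A non-numeric string has the numeric value 0, so $dom is $0 here,
# which is the whole input line.
whole=$(echo 'image.google.com 10' | awk '{ dom = "google.com"; print $dom }')
echo "$whole"
```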

Secondly, awk will not interpolate variables between the // markers: whatever is written there is taken as a literal regular expression.

You want the match() function where the 2nd argument can be a string that is treated as the regular expression:

if (match($1, dom "$")) {...}

I would code a solution like:

    awk '
      FNR == NR {domain[$1] = 0; next}
      {
        for (dom in domain) {
          if (match($1, dom "$")) {
            domain[dom] += $2
            break
          }
        }
      }
      END {for (dom in domain) {print dom, domain[dom]}}
    ' domain site
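Because the domain strings contain ".", which is a wildcard in a regular expression, a possible variant swaps the regex for a plain string comparison on the end of the hostname. This is only a sketch under assumptions: the temporary files, the sample line `googleXcom 99`, and the rule that a dot must precede the domain are illustrative additions, not part of the original answer.

```shell
# Create small sample inputs in a temporary directory (hypothetical data).
tmp=$(mktemp -d)
printf '%s\n' 'google.com' 'facebook.com' > "$tmp/domain"
printf '%s\n' 'image.google.com 10' 'map.google.com 8' \
  'photo.facebook.com 22' 'game.facebook.com 15' 'googleXcom 99' > "$tmp/site"

totals=$(awk '
  FNR == NR { domain[$1] = 0; next }       # first file: load the domains
  {
    for (dom in domain) {
      suffix = "." dom                     # require a dot before the domain
      start = length($1) - length(suffix) + 1
      # plain string comparison, so "." is never a regex wildcard
      if ($1 == dom || (start > 0 && substr($1, start) == suffix)) {
        domain[dom] += $2
        break
      }
    }
  }
  END { for (dom in domain) print dom, domain[dom] }
' "$tmp/domain" "$tmp/site" | sort)
rm -r "$tmp"
echo "$totals"
```

The output is piped through sort because the order of `for (dom in domain)` is unspecified in awk. With the regex version, the line `googleXcom 99` would be counted for google.com; here it is skipped.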

answered Sep 22 '22 05:09

glenn jackman
