This is an oddball issue I've encountered (and probably have seen before but never paid attention to).
Here's the gist of the code:
my $url = 'http://twitter.com/' . $handle;
my $page = get($url);
if ($page =~ m/Web<\/span>\s*<a href=\"(.+?)\"/gi) {
    $website = $1;
}
if ($page =~ m/follower_count\" class=\"stats_count numeric\">(.+?)\s*</g) {
    $num_followers = $1;
}
It fetches a Twitter URL and uses a couple of regexes to capture the user's follower count and website. This code actually works fine. But when you switch the order and search for the website AFTER you search for the follower count, the website comes up empty. As it turns out, when you match a regex against a string, it seems to save the position where the last match was made. In the HTML, the follower count comes after the website. If you run the follower count regex first, the website regex seems to start where the follower count match left off (like an index into the string).
What has me baffled is that I have the "g" modifier at the end, signifying "global", as in "search the string globally... from the beginning".
Am I missing something here? I can't seem to figure out why it's resuming the last regex position on the string (if that makes sense).
The /g modifier, in scalar context, doesn't do what you think it does. Get rid of it.
As perlretut explains, /g in scalar context cycles over each match in turn. It's designed for use in a loop, like so:
while ($str =~ /pattern/g) {
    # runs once for each occurrence of 'pattern' in $str, in turn
}
The other way to use /g is in list context:
my @results = $str =~ /pattern/g; # collect each occurrence of 'pattern' within $str into @results
If you're using /g in scalar context and you're not iterating over it, you're almost certainly not using it right.
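To make the two uses concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the question:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only to give several matches.
my $str = 'a1 b22 c333';

# Scalar context: each evaluation of the match resumes where the
# previous one ended, so the loop visits every match in turn.
my @seen;
while ($str =~ /(\d+)/g) {
    push @seen, $1;
}

# List context: every capture is returned in a single call.
my @all = $str =~ /(\d+)/g;

print "loop: @seen\n";   # loop: 1 22 333
print "list: @all\n";    # list: 1 22 333
```

Both forms collect the same matches; the loop is the better fit when you need to do work per match, the list form when you only want the results.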
To quote perlop on Regexp Quote Like Operators:
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see pos. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (e.g. m//gc). Modifying the target string also resets the search position.
So in scalar context (which you're using), /g does not mean "search from the beginning", it means "search starting from the string's pos". "Search from the beginning" is the default (without /g).
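The behaviour described in that quote can be observed directly with pos(). This is only a sketch; the string and the helper show() are invented for the demonstration:

```perl
use strict;
use warnings;

# Hypothetical sample string: 'foo' occurs at offsets 0 and 8.
my $str = 'foo bar foo';

# Illustrative helper: render pos() as text, including the undef case.
sub show { my ($p) = @_; return defined $p ? $p : 'undef' }

my $ok1 = $str =~ /foo/g;          # first match ends at offset 3
my $pos1 = show(pos($str));

my $ok2 = $str =~ /foo/g;          # resumes at 3, finds the second 'foo'
my $pos2 = show(pos($str));

my $ok3 = $str =~ /foo/g;          # nothing left, so the match fails...
my $pos3 = show(pos($str));        # ...and the position is reset

# With /c, a failed match leaves the position where it was.
my $ok4 = $str =~ /bar/gc;         # matches 'bar', ends at offset 7
my $ok5 = $str =~ /bar/gc;         # fails, but pos() is kept
my $pos5 = show(pos($str));

print "$pos1 $pos2 $pos3 $pos5\n"; # 3 11 undef 7
```

The third match is the case that bit the question's code in reverse: a successful /g match leaves a position behind, and the next /g match on the same string starts from it.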
/g is normally used when you want to find all matches for a regex in a string, instead of just the first match. In list context, it does that by returning a list of all the matches. In scalar context it does that by starting the search from where the previous search left off (usually done in a loop).
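Applied to the question, the effect can be reproduced without fetching anything. The HTML fragment below is invented (Twitter's real markup is not shown in the question); it only assumes, as the question states, that the website link appears before the follower count:

```perl
use strict;
use warnings;

# Made-up stand-in for the fetched page: website first, follower count second.
my $page = 'Web</span> <a href="http://example.com">x</a> '
         . '<span id="follower_count" class="stats_count numeric">42 </span>';

# With /g, followers first: the second match starts after the follower count.
my ($followers_g, $website_g);
if ($page =~ m/follower_count" class="stats_count numeric">(.+?)\s*</g) {
    $followers_g = $1;
}
if ($page =~ m/Web<\/span>\s*<a href="(.+?)"/gi) {
    $website_g = $1;    # never reached: the link lies before pos($page)
}

# Without /g, each match starts from the beginning, so order does not matter.
my ($followers, $website);
if ($page =~ m/follower_count" class="stats_count numeric">(.+?)\s*</) {
    $followers = $1;
}
if ($page =~ m/Web<\/span>\s*<a href="(.+?)"/i) {
    $website = $1;
}

print 'with /g:    ', (defined $website_g ? $website_g : 'undef'), "\n";
print 'without /g: ', $website, "\n";
```

Dropping /g is the simplest fix here, since only the first match of each pattern is wanted. If /g has to stay for some other reason, assigning undef to pos($page) between the two matches should also reset the search position.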