There is a method in the jQuery DataTables library file which constructs a regular expression. Can anyone tell me what the following regular expression means?
^(?=.*?il)(?=.*?oh).*$
^
Matches the beginning of the input. This matches a position, rather than a character (think of it as the space in between characters).
(?=)
This is called a lookahead. Again, this matches a position. It succeeds at a position where the text immediately following the current position matches the pattern inside the parentheses, but the "pointer" doesn't move forward. Think of it like peeking ahead without popping.
.*?il
Matches any number of any character (except newlines, by default), followed by the characters "il".
.*?oh
Same as above, except for the characters "oh".
$
Matches the end of the input.
Basically, this regex is checking to see if the input string contains both of the substrings "il" and "oh", in either order.
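To make that concrete, here is a small sketch that runs the pattern against a few strings. The sample inputs are made up, and buildAllWordsRegex is a hypothetical helper written for illustration; it is not the library's actual method, only a guess at how such a pattern could be assembled from a list of words.

```javascript
// The pattern from the question, used as a plain yes/no test.
const bothWords = /^(?=.*?il)(?=.*?oh).*$/;

console.log(bothWords.test("ohio, illinois")); // true: "oh" and "il" both present
console.log(bothWords.test("illinois, ohio")); // true: order does not matter
console.log(bothWords.test("illinois"));       // false: no "oh"

// Hypothetical helper (not the library's actual code): build the same kind of
// pattern from a list of words, escaping regex metacharacters in each word.
function buildAllWordsRegex(words) {
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const lookaheads = escaped.map((w) => "(?=.*?" + w + ")").join("");
  return new RegExp("^" + lookaheads + ".*$");
}

console.log(buildAllWordsRegex(["il", "oh"]).source); // ^(?=.*?il)(?=.*?oh).*$
```

Note that the pattern as written is case-sensitive, so "IL OH" would not match unless the i flag is added.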
Analogy:
Think of it like this. You have a lineup of people and you step up to the first person (^). You then look ahead one person at a time until you find someone with a red hat, immediately followed by someone with a yellow hat ((?=.*?il)). Your eyes dart back to the first person in the lineup and you repeat the search, except this time you are looking for a person wearing a purple hat immediately followed by a person wearing a green hat ((?=.*?oh)). Finally, you walk past all of the people, pulling each person out of the lineup, until you come to the end of the line (.*$). If, at any point, you couldn't find what you were looking for, you would have turned around and left the room (equivalent to returning false). Otherwise, after coming to the end of the lineup, you shout "candy!" (equivalent to returning true).
Point of Interest:
The lookaheads use what's called "non-greedy" quantifiers (*?). This basically says "match as many as you must, but no more". A greedy quantifier (*) says "match as many as you can". If greedy quantifiers had been used, it would be equivalent to moving your eyes to the back of the lineup and then scanning toward the front, stopping at the first match (which would be the last in the lineup, if counting from the front).
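The difference between the two quantifiers is easiest to see outside a lookahead, where the matched text is visible. A minimal sketch, using a made-up input string:

```javascript
const text = "il_il_oh";

// Lazy: stop at the first "il" that can be reached.
const lazyMatch = /^.*?il/.exec(text)[0];
console.log(lazyMatch); // "il"

// Greedy: run to the end, then back up to the last "il".
const greedyMatch = /^.*il/.exec(text)[0];
console.log(greedyMatch); // "il_il"

// Inside a lookahead nothing is consumed, so both styles give the same answer.
const lazyTest = /^(?=.*?il)(?=.*?oh)/.test(text);
const greedyTest = /^(?=.*il)(?=.*oh)/.test(text);
console.log(lazyTest === greedyTest); // true
```

So for this particular expression the choice of quantifier affects how the engine searches, not whether the input passes.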
If you were to remove the beginning-of-input anchor (^), this expression would be vulnerable to a lot of needless backtracking on input that doesn't match. Since the lookahead matches based on a position, if it doesn't match, the engine will step forward one character and try again, rescanning the rest of the input each time. The ^ keeps the lookaheads anchored to the first position in the input. If they can't find what they're looking for from that position, then they'll just fail.
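As a quick check, on single-line input the anchor does not change the yes/no answer, only how much work a failing match costs. The sketch below compares the two forms on a few made-up samples; it does not measure timing, which depends on the engine.

```javascript
const anchored = /^(?=.*?il)(?=.*?oh).*$/;
const unanchored = /(?=.*?il)(?=.*?oh).*$/;

// Sample inputs chosen for illustration; none contain newlines.
const samples = ["ohio, illinois", "illinois", "", "oh il"];

const agree = samples.every((s) => anchored.test(s) === unanchored.test(s));
console.log(agree); // true
```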
The .*$ part is fluff. You could remove it without affecting the expression (EDIT: Well, actually, that's true only if you are simply testing the input. If you are using the resulting match, then you need the .* to produce a non-zero-length string). If, however, you want to make sure that the input is a certain length, you can use .{5,10}$ instead. This would be like walking through the lineup, counting the number of people you've pulled out, and only yelling "candy!" if you've found at least 5 people but no more than 10 (alternatives: {5,} - at least 5 characters with no upper bound; {0,10} - no more than 10 characters, with 0 as the lower bound). Given that you are already looking for the characters "il" and "oh", there is an implicit requirement that the input be at least 4 characters long (with no upper bound).
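Here is a short sketch of both points: what the match looks like with and without the trailing .*$, and how a length bound behaves. The input strings are made up for illustration.

```javascript
const input = "ohio, illinois";

// With .*$ the match is the whole input; without it the match is empty,
// because lookaheads consume nothing.
const withTail = /^(?=.*?il)(?=.*?oh).*$/.exec(input)[0];
const withoutTail = /^(?=.*?il)(?=.*?oh)/.exec(input)[0];
console.log(withTail);    // "ohio, illinois"
console.log(withoutTail); // ""

// Same lookaheads, but the input must be 5 to 10 characters long.
const bounded = /^(?=.*?il)(?=.*?oh).{5,10}$/;
console.log(bounded.test("oh il"));          // true: 5 characters
console.log(bounded.test("iloh"));           // false: only 4 characters
console.log(bounded.test("ohio, illinois")); // false: 14 characters
```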