Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does [^.]* mean in regular expression?

Tags:

python

regex

I'm trying to get 482.75 from the following text: <span id="yfs_l84_aapl">482.75</span>

The regex I used is: regex = '<span id="yfs_l84_[^.]*">(.+?)</span>' and it worked.

But the thing that I do not understand is why [^.]* can match aapl here? My understanding is that . means any character except a newline; and ^ means negator. So [^.] should be newline and [^.]* should be any number of new lines. However this theory is contrary to real world implementation.

Any help is appreciated and thanks in advance.


The python code I used:

import urllib
import re 
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=AAPL&ql=0")
htmltext = htmlfile.read()
regex = '<span id="yfs_l84_[^.]*">(.+?)</span>'
pattern = re.compile(regex)
price = re.findall(pattern, htmltext)
print "the price of of aapl is", price[0]
like image 730
user2830451 Avatar asked Sep 30 '13 08:09

user2830451


2 Answers

Within the [] the . means just a dot. And the leading ^ means "anything but ...".

So [^.]* matches zero or more non-dots.

like image 61
Hans Kesting Avatar answered Oct 16 '22 01:10

Hans Kesting


. dot in a character-matcher just means dot, literally.

Different syntax and special-characters (- dash for range, ^ for negation) apply inside a character-matching specification. Other pattern syntaxes do not apply.

like image 36
Thomas W Avatar answered Oct 16 '22 00:10

Thomas W