I'm trying to get 482.75 from the following text: <span id="yfs_l84_aapl">482.75</span>
The regex I used is: regex = '<span id="yfs_l84_[^.]*">(.+?)</span>'
and it worked.
But what I do not understand is why [^.]* can match aapl here. My understanding is that . means any character except a newline, and ^ means negation. So [^.] should match a newline, and [^.]* should match any number of newlines. However, that theory contradicts what actually happens.
Any help is appreciated and thanks in advance.
The Python code I used:
import urllib
import re
# Fetch the quote page (Python 2: urllib.urlopen)
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=AAPL&ql=0")
htmltext = htmlfile.read()
# [^.]* matches the ticker suffix ("aapl"); (.+?) captures the price
regex = r'<span id="yfs_l84_[^.]*">(.+?)</span>'
pattern = re.compile(regex)
price = pattern.findall(htmltext)
print "the price of aapl is", price[0]
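To reproduce the match without network access, here is a minimal Python 3 sketch that applies the same regex to the sample snippet from the question instead of the live page. The variable names are illustrative only.

```python
import re

# Sample markup copied from the question instead of fetching the live page.
html_text = '<span id="yfs_l84_aapl">482.75</span>'

# Same pattern as in the question, written as a raw string.
pattern = re.compile(r'<span id="yfs_l84_[^.]*">(.+?)</span>')

# With one capturing group, findall returns a list of that group's matches.
prices = pattern.findall(html_text)
print("the price of aapl is", prices[0])  # → the price of aapl is 482.75
```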
Within the [], the . means just a literal dot, and the leading ^ means "anything but ...". So [^.]* matches zero or more non-dots, and aapl contains no dots, so it matches.
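A small Python 3 sketch to check this. It also shows that [^.] does match a newline, since a newline is not a dot: the class means "everything except a dot", not "only a newline". The group name ticker is a hypothetical label added for illustration; it is not part of the original pattern.

```python
import re

# Inside [...], "." is a literal dot, so [^.] means "any character except a dot".
non_dots = re.compile(r'[^.]*')

print(non_dots.match('aapl').group())        # → aapl
print(non_dots.match('482.75').group())      # → 482 (stops at the dot)
print(repr(non_dots.match('a\nb').group()))  # → 'a\nb' (a newline is a non-dot too)

# Name the group to see what [^.]* consumed in the question's pattern.
m = re.search(r'<span id="yfs_l84_(?P<ticker>[^.]*)">(.+?)</span>',
              '<span id="yfs_l84_aapl">482.75</span>')
print(m.group('ticker'))  # → aapl
print(m.group(2))         # → 482.75
```

Note that [^.]* is greedy: on this input it first runs past "> up to the dot in 482.75, then backtracks until "> can match, which leaves aapl in the group. Because it also accepts " and >, it only works here because a dot happens to stop it.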
A . (dot) inside a character class just means a literal dot.
Inside a character class, a different set of special characters applies: - forms a range and ^ means negation (when it is the first character). Most other pattern metacharacters, such as *, + and ?, lose their special meaning there.
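A short Python 3 sketch of which characters keep a special meaning inside [...] in Python's re module: - forms a range, ^ negates only in first position, and quantifiers and parentheses become plain literals.

```python
import re

# "-" between two characters forms a range.
print(re.findall(r'[a-c]', 'abcd'))            # → ['a', 'b', 'c']

# "^" negates only when it is the first character in the class...
print(re.findall(r'[^a]', 'a^b'))              # → ['^', 'b']
# ...elsewhere it is a literal caret.
print(re.findall(r'[a^]', 'a^b'))              # → ['a', '^']

# Quantifier and grouping metacharacters are literals inside a class.
print(re.findall(r'[.*+?()]', 'a.b*c+d?(e)'))  # → ['.', '*', '+', '?', '(', ')']
```

Backslash escapes such as \d or \] still work inside a class, and ] closes it.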