I'm having trouble matching stock tickers in a string of text. I want a regular expression that matches a space, then 3 uppercase letters, and finally a space, period, OR question mark.
Below is the sample pattern that I created.
```python
import re

example = 'These are the tickers that I am trying to find: FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL'
re.findall('[ ][A-Z]{3}[ .!?]', example)
```
The regular expression misses quite a few of the matches.
If you notice, there's a pattern to which items are missed. It's most obvious in the long section of non-punctuated symbols: it misses every other item.
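This is because `re.findall()` finds non-overlapping matches, and your pattern matches both the space before and the space after each ticker. That means after one item is matched, the leading space for the next item has already been gobbled up and cannot be used again. (The final `TYL` is missed for a different reason: nothing follows it, so `[ .!?]` has no character to match at the end of the string.)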
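To see the consumed spaces directly, here is a small check that prints what the original pattern actually returns. It is only a sketch; the string is the one from the question, split over two lines purely for readability. Notice that every match except the first ends in the space that the next ticker would have needed.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# The original pattern consumes a character on BOTH sides of each ticker,
# so a trailing space used by one match can't be the leading space of the next.
original = re.findall('[ ][A-Z]{3}[ .!?]', example)
print(original)
# [' FAB.', ' APL ', ' GJA ', ' AKE ', ' ZKE ']
```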
Use word boundaries (`\b`) instead of matching leading/trailing spaces, and make your character class optional:
```python
>>> re.findall(r'\b[A-Z]{3}\b[.!?]?', example)
['FAB.', 'APL', 'APL?', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']
```
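If you would rather keep the question's stricter rule of requiring a literal space before the ticker, another option (a sketch of my own, not part of the answer above) is to check the surrounding characters with lookarounds. Lookarounds test characters without consuming them, so neighbouring tickers no longer compete for the same space.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# (?<= )       a space must precede the ticker, but it is not consumed
# [A-Z]{3}     exactly three uppercase letters
# (?:[.!?]     then either include a trailing punctuation mark in the match...
#  |(?= |$))   ...or require (without consuming) a space or the end of the string
pattern = re.compile(r'(?<= )[A-Z]{3}(?:[.!?]|(?= |$))')

tickers = pattern.findall(example)
print(tickers)
# ['FAB.', 'APL', 'APL?', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']
```

The trade-off versus `\b` is strictness: this version ignores a three-letter run that follows anything other than a space, such as the start of the string or an opening parenthesis.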