Regular Expression Matching Stock Ticker

I'm having trouble matching stock tickers in a string of text. I want a regular expression that matches a space, three uppercase letters, and finally a space, period, or question mark.

Below is the sample pattern that I created.

> `import re`
>
> `example = 'These are the tickers that I am trying to find: FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL'`
>
> `re.findall('[ ][A-Z]{3}[ .!?]', example)`

The regular expression misses quite a few of the matches.

chris302107 asked Dec 29 '17 15:12


1 Answer

If you notice, there's a pattern to which items are missed. It's most obvious in the long section of non-punctuated symbols: it misses every other item.

This is because re.findall() finds non-overlapping matches, and your pattern is matching both the space before and after each match. That means after one item is matched, the initial space for the next item has already been gobbled up and cannot be used again.
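To make the consumed-space effect concrete, here is a small sketch that runs the pattern from the question against the same sample string. The variable name `consumed` is only illustrative; the string and the pattern are taken from the question.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# Pattern from the question: every match swallows the space before it
# and the space (or punctuation) after it.
consumed = re.findall(r'[ ][A-Z]{3}[ .!?]', example)
print(consumed)
# [' FAB.', ' APL ', ' GJA ', ' AKE ', ' ZKE ']
```

Each match ends on the space that the following ticker would need as its leading space, so that ticker is skipped. The final `TYL` is also missed, because nothing follows it for the trailing character class to match.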

Use word boundaries (\b) instead of matching leading/trailing spaces, and make your character class optional:

>>> re.findall(r'\b[A-Z]{3}\b[.!?]?',example)
['FAB.', 'APL', 'APL?', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']
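
If you would rather keep the original requirement that a space precedes each ticker, one possible variant is to use lookarounds, which check the surrounding characters without consuming them. This is only a sketch of an alternative, not a replacement for the `\b` approach; the `|$` branch is an added assumption so that a ticker at the very end of the string is also accepted.

```python
import re

example = ('These are the tickers that I am trying to find: '
           'FAB. APL APL? GJA ADJ AKE EBY ZKE SPR TYL')

# (?<= )        lookbehind: a space must precede, but is not consumed
# (?=[ .!?]|$)  lookahead: space/punctuation or end of string must follow
tickers = re.findall(r'(?<= )[A-Z]{3}(?=[ .!?]|$)', example)
print(tickers)
# ['FAB', 'APL', 'APL', 'GJA', 'ADJ', 'AKE', 'EBY', 'ZKE', 'SPR', 'TYL']
```

Because the lookarounds are zero-width, the space between two adjacent tickers can serve as the trailing context for one and the leading context for the next. Note that this version returns the bare symbols without the trailing punctuation.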
glibdud answered Sep 28 '22 02:09