
Extract root, month letter-year and yellow key from a Bloomberg futures ticker

A Bloomberg futures ticker usually looks like:

MCDZ3 Curncy

where the root is MCD, the month letter and year are Z3, and the 'yellow key' is Curncy.

Note that the root can be of variable length: 2-4 letters, or 1 letter and 1 whitespace (e.g. S H4 Comdty). The letter-year part allows only the letters listed below in expr and can have a one- or two-digit year. Finally, the yellow key can be one of several security type strings, but I am interested in (Curncy|Equity|Index|Comdty) only.

In Matlab I have the following regular expression:

expr = '[FGHJKMNQUVXZ]\d{1,2} '; 
[rootyk, monthyear] = regexpi(bbergtickers, expr,'split','match','once');

where

rootyk{:}
ans = 
    'mcd'    'curncy'

and

monthyear = 
    'z3 '

I don't want to match the ' ' (space) in monthyear. How can I do that?

Oleg asked Oct 02 '22 03:10


2 Answers

Assuming there is no leading or trailing whitespace and the root contains only uppercase letters, this should work:

^([A-Z]{2,4}|[A-Z]\s)([FGHJKMNQUVXZ]\d{1,2}) (Curncy|Equity|Index|Comdty)$

The root is in the first group, the letter-year in the second, and the yellow key in the third.

I don't know Matlab, nor whether it supports Perl-compatible regular expressions. If the pattern fails, try e.g. a literal space instead of \s. Also, drop the ^...$ anchors if you'd like to extract from a larger source text.
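As a minimal sketch of how this pattern could be applied in Matlab: the constructs it uses (anchors, groups, alternation, \s, \d and counted quantifiers) are all accepted by regexp. The sample tickers and the variable names tok, matched and parts are made up for illustration, and the case-sensitive regexp is used because the pattern assumes uppercase roots.

```matlab
% Sample tickers are made up for illustration; the last one is not a valid ticker.
tickers = {'MCDZ3 Curncy'; 'S H4 Comdty'; 'ESU13 Index'; 'not a ticker'};

% Pattern from the answer above: root, month letter + year, yellow key.
expr = '^([A-Z]{2,4}|[A-Z]\s)([FGHJKMNQUVXZ]\d{1,2}) (Curncy|Equity|Index|Comdty)$';

% With a cell array input, regexp returns one cell of tokens per ticker;
% tickers that do not match yield an empty result.
tok = regexp(tickers, expr, 'tokens', 'once');

% Keep the matching tickers and stack their tokens into an N-by-3 cell
% array with columns {root, month-year, yellow key}.
matched = ~cellfun(@isempty, tok);
parts = vertcat(tok{matched});

% A one-letter root is captured together with its padding space ('S '),
% so trim the first column if the bare root is wanted.
parts(:, 1) = strtrim(parts(:, 1));
```

Filtering on matched before stacking keeps the rows aligned with tickers(matched), which makes it easy to see which inputs were rejected.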

svoop answered Oct 07 '22 21:10


The expression you're feeding to regexpi contains a space and is used as the pattern for 'match'. This is why the matched monthyear string also has a space [1].

If you want to keep it simple and let regexpi do the work for you (instead of postprocessing its output), try a different approach and capture tokens instead of matching, and ignore the intermediate space:

%//     <$1><----------$2---------> <$3>
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2}) (.+)';
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');

You can also simplify the expression to a more generic '(.+)(\w{1}\d{1,2})\s+(.+)', if you wish.
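One caveat with the generic form is worth checking before relying on it: \w also matches digits, so with a two-digit year the greedy (.+) can absorb the month letter. The sketch below compares the two patterns on a made-up ticker; the variable names strict and loose are illustrative only.

```matlab
ticker = 'MCDZ13 Curncy';   % made-up ticker with a two-digit year

% Pattern with the explicit month-letter class.
strict = regexpi(ticker, '(.+)([FGHJKMNQUVXZ]\d{1,2})\s+(.+)', 'tokens', 'once');

% Generic pattern, where \w can match a letter, a digit or an underscore.
loose = regexpi(ticker, '(.+)(\w{1}\d{1,2})\s+(.+)', 'tokens', 'once');

% strict should split at the month letter:   'MCD'    'Z13'   'Curncy'
% loose lets (.+) take the Z and \w take 1:  'MCDZ'   '13'    'Curncy'
disp(strict)
disp(loose)
```

For tickers that only ever carry one-digit years the two patterns should agree, so the generic form is mainly a risk when two-digit years can occur.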

Example

bbergtickers = 'MCDZ3 Curncy';
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2})\s+(.+)'; 
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');

The result is:

tickinfo =
    'MCD'
    'Z3'
    'Curncy'

[1] This expression is also used as a delimiter for 'split'. Removing the trailing space from it won't help, as it will reappear in the rootyk output instead.
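To illustrate the footnote, here is a minimal sketch of the postprocessing route, assuming the original 'split'/'match' call is kept and only the trailing space is removed from the pattern. The stray space then shows up at the start of the second piece of rootyk, and strtrim removes it.

```matlab
bbergtickers = 'MCDZ3 Curncy';

% Same pattern as in the question, minus the trailing space.
expr = '[FGHJKMNQUVXZ]\d{1,2}';
[rootyk, monthyear] = regexpi(bbergtickers, expr, 'split', 'match', 'once');

% monthyear no longer carries the space, but the second piece of rootyk
% now starts with it (' Curncy'), so trim the pieces afterwards.
rootyk = strtrim(rootyk);
```

This keeps the two-output form from the question, at the cost of the extra trimming step that the 'tokens' approach avoids.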

Eitan T answered Oct 07 '22 21:10