A Bloomberg futures ticker usually looks like:
MCDZ3 Curcny
where the root is MCD
, the month letter and year is Z3
and the 'yellow key' is Curcny
.
Note that the root can be of variable length, 2-4 letters or 1 letter and 1 whitespace (e.g. S H4 Comdty
).
The letter-year allows only the letter listed below in expr
and can have two digit years.
Finally the yellow key can be one of several security type strings but I am interested in (Curncy|Equity|Index|Comdty)
only.
In Matlab I have the following regular expression
expr = '[FGHJKMNQUVXZ]\d{1,2} ';
[rootyk, monthyear] = regexpi(bbergtickers, expr,'split','match','once');
where
rootyk{:}
ans =
'mcd' 'curncy'
and
monthyear =
'z3 '
I don't want to match the ' ' (space) in the monthyear. How can I do?
Assuming there are no leading or trailing whitespaces and only upcase letters in the root, this should work:
^([A-Z]{2,4}|[A-Z]\s)([FGHJKMNQUVXZ]\d{1,2}) (Curncy|Equity|Index|Comdty)$
You've got root in the first group, letter-year in the second, yellow key in the third.
I don't know Matlab nor whether it covers Perl Compatible Regex. If it fails, try e.g. with instead of
\s
. Also, drop the ^...$
if you'd like to extract from a bigger source text.
The expression you're feeding regexpi
with contains a space and is used as a pattern for 'match'
. This is why the matched monthyear
string also has a space1.
If you want to keep it simple and let regexpi
do the work for you (instead of postprocessing its output), try a different approach and capture tokens instead of matching, and ignore the intermediate space:
%// <$1><----------$2---------> <$3>
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2}) (.+)';
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');
You can also simplify the expression to a more genereic '(.+)(\w{1}\d{1,2})\s+(.+)'
, if you wish.
bbergtickers = 'MCDZ3 Curncy';
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2})\s+(.+)';
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');
The result is:
tickinfo =
'MCD'
'Z3'
'Curncy'
1 This expression is also used as a delimiter for 'split'
. Removing the trailing space from it won't help, as it will reappear in the rootyk
output instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With