
Parsing FIX message in regex

I found the second answer to "Parsing FIX protocol in regex?" very nice, so I tried it out.

Here is my code.

import re

new_order_finder1 = re.compile("(?:^|\x01)(11|15|55)=(.*?)\x01")
new_order_finder2 = re.compile("(?:^|\x01)(15|55)=(.*?)\x01")
new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)\x01")

if __name__ == "__main__":
    line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x0149=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x0111=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x0144=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x0160=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
    fields = dict(re.findall(new_order_finder1, line))
    print(fields)

    fields2 = dict(re.findall(new_order_finder2, line))
    print(fields2)

    fields3 = dict(re.findall(new_order_finder3, line))
    print(fields3)

Here is the output

{'11': 'N09080243', '55': 'AAPL.O'}
{'55': 'AAPL.O', '15': 'USD'}
{'35': 'D', '38': '2100', '11': 'N09080243', '54': '1'}

It looks like some of the fields are not properly matched by regex.

What's the problem here?

Johnyy asked Dec 24 '22 17:12


2 Answers

The problem is that the \x01 at the end of the pattern consumes the separator. The next key-value pair then has no leading \x01 left for (?:^|\x01) to match, so the pattern always fails on a pair directly adjacent to one that was just matched.

Using this substring of your input as example, matching against new_order_finder3:

\x0154=1\x0155=AAPL.O\x01
------------
            X

As you can see, after it manages to match the key-value pair 54=1, it also consumes \x01 and the adjacent key-value pair can never be matched.
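
This can be checked with a small sketch. The pattern below is the original one reduced to tags 54 and 55 (a simplification made here for illustration), run against the same substring. Printing the span of each match shows where the engine stopped.

```python
import re

# Original style of pattern: the trailing \x01 is part of the match.
# Reduced to tags 54 and 55 for illustration only.
consuming = re.compile("(?:^|\x01)(54|55)=(.*?)\x01")

sample = "\x0154=1\x0155=AAPL.O\x01"

# Collect the tag and the (start, end) span of every match.
spans = [(m.group(1), m.span()) for m in consuming.finditer(sample)]
print(spans)  # [('54', (0, 6))]
```

Only one match is reported, and its span ends at index 6, past the separator at index 5. The search resumes at the first 5 of 55, where there is no \x01 in front of the tag, so 55=AAPL.O is skipped.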

There is more than one way to resolve this. One solution is to put the trailing \x01 in a look-ahead assertion, which checks that \x01 ends the key-value pair without consuming it:

new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")

The output now contains all the expected fields:

{'11': 'N09080243', '38': '2100', '15': 'USD', '55': 'AAPL.O', '54': '1', '35': 'D'}
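
As an illustration of another possible route (not part of the answer above), the message can be split on the separator instead of matched with a regex. This is a minimal sketch: the names SOH, WANTED and extract_fields are hypothetical, and it assumes no field value contains \x01 itself, which may not hold for FIX raw-data fields.

```python
SOH = "\x01"  # FIX field separator
WANTED = {"11", "15", "35", "38", "54", "55"}


def extract_fields(line, wanted=WANTED):
    """Return {tag: value} for the wanted tags (hypothetical helper)."""
    fields = {}
    for pair in line.split(SOH):
        # partition() splits on the first "=" only, so values may contain "=".
        tag, sep, value = pair.partition("=")
        if sep and tag in wanted:
            fields[tag] = value
    return fields


# Shortened version of the message from the question.
line = ("20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x01"
        "11=N09080243\x0115=USD\x0138=2100\x0154=1\x0155=AAPL.O\x01"
        "115=Tester\x0110=087\x01")
print(extract_fields(line))
```

Because each chunk is compared as a whole tag, 115=Tester is not mistaken for tag 11 or 15, and adjacency between wanted fields does not matter.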
nhahtdh answered Jan 08 '23 11:01


The trailing \x01 consumes the separator that the next match needs. After a match, the regex engine resumes searching at the position where that match ended, so the consumed \x01 is no longer available to start the next key-value pair.

With a lookahead, the fix is easy. Just replace the final \x01 with (?=\x01).

import re

new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")

if __name__ == "__main__":
    line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01"\
        "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01" \
        "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01" \
        "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01" \
        "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
    fields3 = dict(re.findall(new_order_finder3, line))
    print(fields3)
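
The question used three different tag lists, so the same lookahead pattern could be generated from a list of tags. The following is only a sketch: build_finder is a hypothetical helper name, not something from the question or the answers.

```python
import re


def build_finder(tags):
    """Compile a lookahead-based finder for the given tags (hypothetical helper)."""
    alternation = "|".join(re.escape(str(tag)) for tag in tags)
    # The lookahead (?=\x01) checks for the separator without consuming it.
    return re.compile("(?:^|\x01)(" + alternation + ")=(.*?)(?=\x01)")


line = ("20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01"
        "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01"
        "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01"
        "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01"
        "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01")

# The three tag lists from the question.
fields1 = dict(build_finder([11, 15, 55]).findall(line))
fields2 = dict(build_finder([15, 55]).findall(line))
fields3 = dict(build_finder([11, 15, 35, 38, 54, 55]).findall(line))
print(fields1)
print(fields2)
print(fields3)
```

With the lookahead in place, adjacent fields such as 11 and 15, or 54 and 55, are all picked up.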
tripleee answered Jan 08 '23 12:01