I found the second answer to "Parsing FIX protocol in regex?" very nice, so I tried it out.
Here is my code:
import re

new_order_finder1 = re.compile("(?:^|\x01)(11|15|55)=(.*?)\x01")
new_order_finder2 = re.compile("(?:^|\x01)(15|55)=(.*?)\x01")
new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)\x01")

if __name__ == "__main__":
    line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x0149=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x0111=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x0144=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x0160=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
    fields = dict(re.findall(new_order_finder1, line))
    print(fields)
    fields2 = dict(re.findall(new_order_finder2, line))
    print(fields2)
    fields3 = dict(re.findall(new_order_finder3, line))
    print(fields3)
Here is the output
{'11': 'N09080243', '55': 'AAPL.O'}
{'55': 'AAPL.O', '15': 'USD'}
{'35': 'D', '38': '2100', '11': 'N09080243', '54': '1'}
It looks like some of the fields are not properly matched by regex.
What's the problem here?
The problem is that the \x01 at the end of the pattern consumes the \x01 separator. As a result, the pattern always fails on the key-value pair immediately after one it has just matched, because the (?:^|\x01) at the start has no separator left to match.
Using this substring of your input as an example, matching against new_order_finder3:
\x0154=1\x0155=AAPL.O\x01
------------
            X
As you can see, once the pattern has matched the key-value pair 54=1, it has also consumed the \x01 that follows it, so the adjacent key-value pair 55=AAPL.O can never be matched.
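To make the consumption visible, here is a minimal sketch that prints the span of each match on the same substring. The names `sample`, `consuming` and `lookahead` are made up for this illustration, and the patterns are cut down to tags 54 and 55 only.

```python
import re

# Same substring as in the diagram: SOH 54=1 SOH 55=AAPL.O SOH
sample = "\x0154=1\x0155=AAPL.O\x01"

# Original style: the trailing \x01 is part of the match and gets consumed.
consuming = re.compile("(?:^|\x01)(54|55)=(.*?)\x01")
# Look-ahead style: the trailing \x01 is checked but not consumed.
lookahead = re.compile("(?:^|\x01)(54|55)=(.*?)(?=\x01)")

for name, pattern in (("consuming", consuming), ("lookahead", lookahead)):
    for m in pattern.finditer(sample):
        # m.span() shows which characters this match used up.
        print(name, m.group(1), m.group(2), m.span())

consuming_pairs = consuming.findall(sample)
lookahead_pairs = lookahead.findall(sample)
```

With the consuming pattern the first match covers indices 0 to 6, which includes the separator at index 5, so the search resumes at the "5" of 55 with no \x01 in front of it. With the look-ahead the first match stops at index 5 and the separator is still available for the next match.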
There is more than one way to resolve this issue. One solution is to put the \x01 at the end of the pattern into a look-ahead assertion, which checks that \x01 ends the key-value pair without consuming it:
new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")
The output now contains all the expected fields:
{'11': 'N09080243', '38': '2100', '15': 'USD', '55': 'AAPL.O', '54': '1', '35': 'D'}
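As one of the other possible approaches (not the one used in this answer), the message can be split on the separator instead of using a regex at all. The sketch below assumes a hypothetical helper `parse_fix_fields` and that each tag appears at most once; like `dict(re.findall(...))`, it keeps only the last value if a tag repeats.

```python
def parse_fix_fields(message, wanted):
    """Return {tag: value} for the wanted tags in a \x01-delimited message."""
    fields = {}
    for pair in message.split("\x01"):
        # partition splits at the first "=", so values may contain "=".
        tag, sep, value = pair.partition("=")
        if sep and tag in wanted:
            fields[tag] = value
    return fields


line = ("20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01"
        "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01"
        "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01"
        "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01"
        "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01")

fields = parse_fix_fields(line, {"11", "15", "35", "38", "54", "55"})
print(fields)
```

Because there is no pattern to resume, nothing is consumed between adjacent pairs. Note that the first piece still carries the log timestamp prefix, so tag 8 would not be found by this sketch, just as the `(?:^|\x01)` anchor does not find it.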
The trailing \x01 consumes a separator that the next match needs. The regex matcher resumes searching immediately after the end of the previous match.
With a lookahead, the fix is easy: just replace the final \x01 with (?=\x01).
import re

new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")

if __name__ == "__main__":
    line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01"\
           "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01" \
           "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01" \
           "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01" \
           "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
    fields3 = dict(re.findall(new_order_finder3, line))
    print(fields3)