I'm trying to extract prices from a string while ignoring two kinds of prices. The first exclusion is the total price at the end, which I ignore using a lookahead. The second exclusion is any price preceded by a variation of the letter Q, for example Q10.00 or Q AWSMSN11.32. However, I do want to include a price preceded by a three-letter code that happens to end in Q, such as YMQ234.03.
I've added a negative lookbehind, but I can't get the result I want.
This is the pattern I've tried: (?<![Q\d]) ?M?(\d+\.\d{2})(?=.*\d+\.\d{2}END)
test strings
ABC WS YMQ234.03WS TOY234.03USD468.06END
FUR BB LAB Q10.00 199.00USD209.00END
YAS DG TYY Q AWSMSN11.32 2503.08LD VET Q JKLOLE11.32 2503.08USD5028.80END
PPP VP LAP Q10.00 M342.41EE SFD Q10.00 282.24USD644.65END
regex101
Expected output
+---------------------------------------------------------------------------+---------+---------+
| ABC WS YMQ234.03WS TOY234.03USD468.06END                                  | 234.03  | 234.03  |
| FUR BB LAB Q10.00 199.00USD209.00END                                      | 199.00  |         |
| YAS DG TYY Q AWSMSN11.32 2503.08LD VET Q JKLOLE11.32 2503.08USD5028.80END | 2503.08 | 2503.08 |
| PPP VP LAP Q10.00 M342.41EE SFD Q10.00 282.24USD644.65END                 | 342.41  | 282.24  |
+---------------------------------------------------------------------------+---------+---------+
You could use the third-party regex module instead of re, since it supports the (*SKIP)(*F) verbs used in this pattern:
Q[A-Z ]*(?<!\b[A-Z]{2}Q)[\d.]+(*SKIP)(*F)|\d+(?:\.\d+)(?!\d*END$)
See the online demo.
In Python this could look like:
import regex  # third-party: pip install regex

# Q-prefixed amounts are matched and then discarded via (*SKIP)(*F); the total before END is excluded by the lookahead.
pattern = r'Q[A-Z ]*(?<!\b[A-Z]{2}Q)[\d.]+(*SKIP)(*F)|\d+(?:\.\d+)(?!\d*END$)'
arr = ['ABC WS YMQ234.03WS TOY234.03USD468.06END', 'FUR BB LAB Q10.00 199.00USD209.00END', 'YAS DG TYY Q AWSMSN11.32 2503.08LD VET Q JKLOLE11.32 2503.08USD5028.80END', 'PPP VP LAP Q10.00 M342.41EE SFD Q10.00 282.24USD644.65END']
res = [regex.findall(pattern, x) for x in arr]
print(res)
Prints:
[['234.03', '234.03'], ['199.00'], ['2503.08', '2503.08'], ['342.41', '282.24']]
You might also match what you don't want and capture what you do want.
First, match a run of optional spaces and uppercase chars that contains a Q, followed by the decimal value. This consumes the Q-prefixed prices without capturing them.
Make an exception with a negative lookbehind: the match is rejected when the decimal value is directly preceded by two uppercase chars A-Z followed by Q, as in YMQ234.03.
After the alternation, capture the decimal value in group 1, asserting that it is not directly followed by END.
\b[A-Z ]*Q[A-Z ]*(?<![A-Z][A-Z]Q)\d+\.\d+|(\d+\.\d{2})(?!END)
Explanation
\b[A-Z ]*Q[A-Z ]* Word boundary, match a Q between optional spaces and uppercase chars
(?<![A-Z][A-Z]Q) Negative lookbehind, assert not 2 uppercase chars A-Z followed by Q directly to the left
\d+\.\d+ Match a decimal value
| Or
( Capture group 1
\d+\.\d{2} Match 1+ digits followed by a dot and 2 digits
) Close group 1
(?!END) Negative lookahead, assert what is directly to the right is not END
Regex demo | Python demo
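As a minimal sketch, the same pattern can be laid out with re.VERBOSE so that each part of the explanation sits next to its comment. The names PRICE_RE and extract_prices are only illustrative and are not part of the original answer. Note that in verbose mode the space inside the character class [A-Z ] is still significant.

```python
import re

# Verbose layout of the same pattern; PRICE_RE is a hypothetical name.
PRICE_RE = re.compile(r"""
    \b[A-Z ]*Q[A-Z ]*      # a Q between optional uppercase chars and spaces
    (?<![A-Z][A-Z]Q)       # reject if the 3 chars to the left are two uppercase chars then Q
    \d+\.\d+               # the Q-prefixed amount: matched, not captured
    |
    (\d+\.\d{2})           # group 1: a price to keep
    (?!END)                # but not when END follows directly (the total)
""", re.VERBOSE)


def extract_prices(line):
    """Return the group 1 captures, skipping matches of the first alternative."""
    return [m.group(1) for m in PRICE_RE.finditer(line) if m.group(1)]


print(extract_prices("PPP VP LAP Q10.00 M342.41EE SFD Q10.00 282.24USD644.65END"))
# ['342.41', '282.24']
```

Compiling the pattern once also avoids re-parsing it for every input line.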
For example
import re

# The first alternative consumes Q-prefixed amounts; group 1 captures the prices to keep.
pattern = r"\b[A-Z ]*Q[A-Z ]*(?<![A-Z][A-Z]Q)\d+\.\d+|(\d+\.\d{2})(?!END)"
strings = [
    "ABC WS YMQ234.03WS TOY234.03USD468.06END",
    "FUR BB LAB Q10.00 199.00USD209.00END",
    "YAS DG TYY Q AWSMSN11.32 2503.08LD VET Q JKLOLE11.32 2503.08USD5028.80END",
    "PPP VP LAP Q10.00 M342.41EE SFD Q10.00 282.24USD644.65END"
]
for s in strings:
    # Keep only the matches in which group 1 took part (the wanted prices).
    prices = [m.group(1) for m in re.finditer(pattern, s) if m.group(1)]
    print('{}: {}'.format(s, prices))
Output
ABC WS YMQ234.03WS TOY234.03USD468.06END: ['234.03', '234.03']
FUR BB LAB Q10.00 199.00USD209.00END: ['199.00']
YAS DG TYY Q AWSMSN11.32 2503.08LD VET Q JKLOLE11.32 2503.08USD5028.80END: ['2503.08', '2503.08']
PPP VP LAP Q10.00 M342.41EE SFD Q10.00 282.24USD644.65END: ['342.41', '282.24']
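A slightly shorter variant, sketched here as an alternative rather than as the answer's own code, relies on re.findall: with a single capture group it returns group 1 for every match, and an empty string for matches of the first alternative, so filtering out empty strings gives the same result. The helper name prices_via_findall is hypothetical.

```python
import re

pattern = r"\b[A-Z ]*Q[A-Z ]*(?<![A-Z][A-Z]Q)\d+\.\d+|(\d+\.\d{2})(?!END)"


def prices_via_findall(line):
    # findall yields group 1 per match; Q-prefixed matches yield '' and are dropped.
    return [p for p in re.findall(pattern, line) if p]


print(prices_via_findall("FUR BB LAB Q10.00 199.00USD209.00END"))
# ['199.00']
```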