I have this gigantic ugly string:
J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM J0000010: Project name: E:\foo.pf J0000011: Job name: MBiek Direct Mail Test J0000020: Document 1 - Completed successfully
I'm trying to extract pieces from it using regex. In this case, I want to grab everything after Project Name
up to the part where it says J0000011:
(the 11 is going to be a different number every time).
Here's the regex I've been playing with:
Project name:\s+(.*)\s+J[0-9]{7}:
The problem is that it doesn't stop until it hits the J0000020: at the end.
How do I make the regex stop at the first occurrence of J[0-9]{7}
?
You make it non-greedy by using ". *?" When using the latter construct, the regex engine will, at every step it matches text into the "." attempt to match whatever make come after the ". *?" . This means that if for instance nothing comes after the ".
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex. By using a lazy quantifier, the expression tries the minimal match first.
would specify that 'a' is followed optionally by a 'b'. The full stop, or period, symbol matches any single character except a NEWLINE. matches zero or more occurrences of any character. This is a powerful regexp, which should be used with caution since it can match more characters than anticipated.
You can use the new Python regex module, which supports overlapping matches.
Make .*
non-greedy by adding '?
' after it:
Project name:\s+(.*?)\s+J[0-9]{7}:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With