I have the following two strings:
line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'
I'm trying to grab these two parts:
"GET /file/ HTTP/1.1" 302
"" 400
Basically, I want any characters between the two double quotes, or nothing at all between them. So far I've tried this:
regex_example = re.search(r'".+?" [0-9]{3}', line1)
print(regex_example.group())
This works for line1 but raises an error for line2: re.search returns None there, so calling .group() fails. I think this is because '.+?' requires at least one character between the quotes, and line2 has none.
Is there any way for it to match any character or nothing in between the two ""?
There are two ways to say "don't match": negated character ranges (classes), and zero-width negative lookahead/lookbehind. Also, a correction: *, ? and + do not match anything by themselves. They are repetition operators, and they always follow the element they repeat.
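As a rough illustration of those two approaches, here is a minimal Python sketch. The sample string and the variable names are made up for this example.

```python
import re

text = 'say "hello" now'  # hypothetical sample input

# Negated character class: [^"] matches any single character except a double quote.
by_class = re.search(r'"([^"]*)"', text).group(1)

# Negative lookahead: (?!") asserts that the next character is not a quote,
# and the following . then consumes that character.
by_lookahead = re.search(r'"((?:(?!").)*)"', text).group(1)

print(by_class, by_lookahead)
```

Both patterns capture the same text here; the character class is usually the simpler choice when only a single character has to be excluded.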
Matching a single character using regex: by default, the dot character '.' in a regular expression matches any single character, whether it is a letter, a digit, or a special character. In most engines it does not match a newline unless a flag such as DOTALL is set.
Basically, in formal regular-expression notation, (0+1)* matches any sequence of ones and zeroes. So, in your example, (0+1)*1(0+1)* should match any sequence that contains a 1. It would not match 000, but it would match 010, 1, 111, etc. (0+1) means 0 OR 1.
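Note that in Python's re syntax + is a quantifier rather than "or", so the textbook expression has to be translated before it can be tried out. A small sketch, assuming the usual translation of (0+1) into the character class [01]; the variable names are illustrative only:

```python
import re

# Textbook (0+1)*1(0+1)* written in Python syntax: [01] stands for "0 OR 1".
contains_one = re.compile(r'[01]*1[01]*')

# fullmatch requires the whole string to fit the pattern.
results = {s: bool(contains_one.fullmatch(s)) for s in ['000', '010', '1', '111']}
print(results)
```

Only '000' fails to match, since it contains no 1.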
For example, \+ matches "+", \[ matches "[", and \. matches ".". Regex also recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage return, \nnn for an octal number of up to 3 digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \uhhhhhhhh for an 8-digit Unicode code point. The exact set of escapes varies between regex flavors.
Use .*? instead of .+?.
+ means "1 or more".
* means "0 or more".
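To see the difference on the two log lines from the question, here is a short sketch. The helper name first_match is made up for illustration; it returns None instead of raising when nothing matches.

```python
import re

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

plus_pattern = re.compile(r'".+?" [0-9]{3}')  # needs at least one character between the quotes
star_pattern = re.compile(r'".*?" [0-9]{3}')  # also accepts an empty pair of quotes

def first_match(pattern, line):
    """Return the matched text, or None when the pattern is not found."""
    match = pattern.search(line)
    return match.group() if match else None

for line in (line1, line2):
    print(first_match(plus_pattern, line), '|', first_match(star_pattern, line))
```

With .+? the second line yields None, which is why calling .group() directly on the search result fails; with .*? both lines match.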
If you want a more efficient regex, use a negated character class [^"] instead of a lazy quantifier (.*?). You should also use a raw string (the r prefix) and \d for digits.
r'"[^"]*" \d{3}'
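Besides efficiency, the negated class also behaves differently when a line contains more than one quoted section. The sketch below uses a made-up input string (not from the original logs) to show this:

```python
import re

# Hypothetical line with two quoted sections; only the second is followed by a status code.
text = '"a" x "b" 200'

lazy = re.search(r'".*?" \d{3}', text).group()
negated = re.search(r'"[^"]*" \d{3}', text).group()

print(lazy)     # the lazy dot is allowed to run across the inner quotes
print(negated)  # [^"] can never cross a quote
```

Here the lazy version returns the whole string, while the negated class returns only the quoted section directly in front of the status code.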
You can use:
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

rx = re.compile(r'''
    "[^"]*"   # an opening quote, any run of non-quote characters, a closing quote
    \         # a literal space (escaped so that VERBOSE mode keeps it)
    \d+       # at least one digit
''', re.VERBOSE)

# Collect every match from every line.
matches = [m.group(0) for line in lines for m in rx.finditer(line)]
print(matches)
# ['"GET /file/ HTTP/1.1" 302', '"" 400']
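If the request and the status code are needed separately, one possible extension is to add named groups. This is only a sketch: the group names request and status and the parsed list are assumptions, not part of the answer above.

```python
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# Named groups (hypothetical names) split the match into its two parts.
rx = re.compile(r'"(?P<request>[^"]*)" (?P<status>\d{3})')

parsed = []
for line in lines:
    m = rx.search(line)
    if m:  # skip lines without a quoted request and status code
        parsed.append((m.group('request'), int(m.group('status'))))

print(parsed)
```

An empty request comes back as an empty string, so it can be told apart from a line that did not match at all.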