 

Regex to match any character or none?

Tags:

python

regex

I have the following two strings:

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'

line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

I'm trying to extract these two parts:

"GET /file/ HTTP/1.1" 302
"" 400

That is, I want whatever is between the two double quotes, even when there is nothing between them. So far I've tried this:

import re
regex_example = re.search("\".+?\" [0-9]{3}", line1)
print(regex_example.group())

This works for line1 but fails for line2. The + in .+? requires at least one character between the quotes, so on line2 re.search finds no match and returns None, and calling .group() on None raises an AttributeError.
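A small reproduction makes the failure concrete. This is a minimal sketch assuming Python 3; the variable names `pattern` and `result` are illustrative only. It shows that the search on line2 returns None, and that the exception comes from calling `.group()` on that None.

```python
import re

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

# The pattern from the question, written as a raw string.
pattern = r'".+?" [0-9]{3}'

# line1 has characters between the quotes, so the search succeeds.
print(re.search(pattern, line1).group())  # "GET /file/ HTTP/1.1" 302

# line2 has empty quotes; .+? needs at least one character, so nothing matches.
result = re.search(pattern, line2)
print(result)  # None

# Calling .group() on None is what actually raises the error.
try:
    result.group()
except AttributeError as exc:
    print("AttributeError:", exc)
```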

Is there a way to match any characters, or nothing at all, between the two quotes?

user1165419 asked Aug 16 '16 19:08


2 Answers

Use .*? instead of .+?.

+ means "1 or more"

* means "0 or more"
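To see the difference between the two quantifiers on the question's data, here is a minimal sketch (assuming Python 3; the names `lazy_star` and `found` are illustrative). It applies `.*?` to both lines and checks the empty-quote case directly with `re.fullmatch`.

```python
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# *? accepts zero characters between the quotes, so the empty "" also matches.
lazy_star = re.compile(r'".*?" [0-9]{3}')

found = [lazy_star.search(line).group() for line in lines]
print(found)  # ['"GET /file/ HTTP/1.1" 302', '"" 400']

# The same contrast on a bare pair of empty quotes:
print(re.fullmatch(r'".+?"', '""'))              # None: + needs at least one character
print(re.fullmatch(r'".*?"', '""') is not None)  # True: * also accepts zero characters
```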


If you want a more efficient regex, use the negated character class [^"] instead of the lazy quantifier ?. You should also use a raw string (the r prefix) and \d for digits.

r'"[^"]*" \d{3}'
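Here is that pattern applied to both sample lines, as a minimal sketch assuming Python 3 (the names `negated` and `results` are illustrative). The efficiency point is the usual rationale for a negated class over a lazy dot; it is not benchmarked here. The loop also checks for None before calling `.group()`, which avoids the error from the question if a line does not match.

```python
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# [^"]* matches any run of non-quote characters, including an empty run,
# so it runs straight to the closing quote instead of growing a lazy match
# one character at a time.
negated = re.compile(r'"[^"]*" \d{3}')

results = []
for line in lines:
    m = negated.search(line)
    # search() returns None when there is no match, so check before .group().
    if m is not None:
        results.append(m.group())

print(results)  # ['"GET /file/ HTTP/1.1" 302', '"" 400']
```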
4castle answered Nov 14 '22 02:11


You can use:

import re

lines = ['[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore', '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore']

rx = re.compile(r'''
        "[^"]*" # ", followed by anything not a " and a "
        \       # a space
        \d+     # at least one digit
        ''', re.VERBOSE)

matches = [m.group(0)
           for line in lines
           for m in rx.finditer(line)]

print(matches)
# ['"GET /file/ HTTP/1.1" 302', '"" 400']


See a demo on ideone.com.
Jan answered Nov 14 '22 02:11