
Python regex to match string excluding word

Tags: python, regex

I'm having trouble building a regex. I've spent two days searching Google, Stack Overflow, and other documentation...

I have the following lines:

2015-07-08 12:49:07.183852|INFO    |VirtualServerBase|  3| client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'
2015-07-08 11:59:23.178055|INFO    |VirtualServerBase|  3| client disconnected 'Trakiyen'(id:20460) reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'
2015-07-08 12:40:50.591450|INFO    |VirtualServerBase|  3| client disconnected 'kalash'(id:20464) reason 'invokerid=136 invokername=Charles invokeruid=loremipsum= reasonmsg=Aller, Bisous! bantime=0
2015-07-08 00:23:03.235312|INFO    |VirtualServerBase|  3| client disconnected 'Brigata FTW'(id:20451) reason 'invokerid=103 invokername=Bob invokeruid=loremipsum3= reasonmsg=En vous souhaitant une bonne soirée <3 bantime=28800'

I want to match only the first line, based on these conditions:

  1. No line with invokername=server
  2. No line with bantime

So only the first line should match. Here is one regex I tried:

.*2015-07-08.*client disconnected.*invokername=[^server].*[^bantime=].*

I've only posted one regex here, but I've tried many different things (with ?!, etc.). I've read a lot of topics about exclusion on Stack Overflow but could not find a solution. I hope someone can help.

Gladrat asked Jul 09 '15 12:07



3 Answers

You can get your line with

(?m)^(?!.*\b(?:invokername=server|bantime)\b).*2015-07-08.*client disconnected.*invokername=.*$


EXPLANATION:

  • (?m) - The multiline flag, so that ^ and $ match at the start and end of each line.
  • ^ - Start of line anchor
  • (?!.*\b(?:invokername=server|bantime)\b) - A negative lookahead that makes sure neither of the whole words invokername=server or bantime appears anywhere further along the line
  • .*2015-07-08.*client disconnected.*invokername=.* - A substring containing 2015-07-08, client disconnected, and invokername=, with anything except a line break in between
  • $ - End of line
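As a quick check, the pattern above can be run against the four sample lines from the question. This is a minimal sketch, assuming Python 3 and that the log is available as one multi-line string; the names LOG, PATTERN and matches are only illustrative.

```python
import re

# The four sample lines from the question, joined into one multi-line string.
LOG = "\n".join([
    "2015-07-08 12:49:07.183852|INFO    |VirtualServerBase|  3| client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'",
    "2015-07-08 11:59:23.178055|INFO    |VirtualServerBase|  3| client disconnected 'Trakiyen'(id:20460) reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'",
    "2015-07-08 12:40:50.591450|INFO    |VirtualServerBase|  3| client disconnected 'kalash'(id:20464) reason 'invokerid=136 invokername=Charles invokeruid=loremipsum= reasonmsg=Aller, Bisous! bantime=0",
    "2015-07-08 00:23:03.235312|INFO    |VirtualServerBase|  3| client disconnected 'Brigata FTW'(id:20451) reason 'invokerid=103 invokername=Bob invokeruid=loremipsum3= reasonmsg=En vous souhaitant une bonne soirée <3 bantime=28800'",
])

# The regex from this answer, split over two literals for readability.
PATTERN = re.compile(
    r"(?m)^(?!.*\b(?:invokername=server|bantime)\b)"
    r".*2015-07-08.*client disconnected.*invokername=.*$"
)

# The pattern has no capturing groups, so findall returns whole lines.
matches = PATTERN.findall(LOG)
for line in matches:
    print(line)
```

Only the line with invokername=Alphonse should be printed, because the other three lines contain either invokername=server or bantime.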

Alternatively, you can just match any line that has no disallowed substrings:

(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$

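This is a much better alternative if it does not "overmatch" for you: it accepts every line without the disallowed substrings, including lines that lack the date or the client disconnected text.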
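To illustrate what "overmatching" could look like, here is a small sketch in Python 3. The lines are shortened stand-ins for the log entries in the question, and the last one (a "client connected" entry) is a hypothetical unrelated line added only for this example.

```python
import re

# The shorter pattern: only the exclusion, no required substrings.
SHORT = re.compile(r"(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$")

# Shortened stand-ins for the question's log lines.
lines = [
    "2015-07-08 12:49:07|INFO| client disconnected 'a' reason 'invokerid=20 invokername=Alphonse reasonmsg=test'",
    "2015-07-08 11:59:23|INFO| client disconnected 'b' reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'",
    "2015-07-08 12:40:50|INFO| client disconnected 'c' reason 'invokerid=136 invokername=Charles reasonmsg=bye bantime=0'",
    "2015-07-08 13:00:00|INFO| client connected 'd'",  # hypothetical unrelated line
]

kept = SHORT.findall("\n".join(lines))
for line in kept:
    print(line)
```

Here both the Alphonse line and the unrelated "client connected" line are kept. If the log can contain such lines, the longer pattern with the required substrings is the safer choice.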

Wiktor Stribiżew answered Oct 20 '22 10:10


Alongside @llogiq's answer, which explains the difference between a negated character class and a negative lookahead, you can also use just the following regex with a negative lookahead:

^((?!bantime|(?:invokername=server)).)*$

See demo https://regex101.com/r/hI5dR0/1

>>> import re
>>> # s holds the log lines from the question
>>> re.search(r'^((?!bantime|(invokername=server)).)*$',s,re.M).group()
"2015-07-08 12:49:07.183852|INFO    |VirtualServerBase|  3| client disconnected 'R\xc3\xb2em'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'"
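The construct ((?!...).)* re-checks the lookahead before every single character it consumes, so a line only matches if the forbidden text starts nowhere in it. A minimal Python 3 sketch of that idea follows; it uses a non-capturing variant of the regex and shortened stand-in lines, both of which are assumptions made for this example.

```python
import re

# Non-capturing variant of the regex above: before each character is consumed,
# the lookahead asserts that neither forbidden string starts at that position.
TEMPERED = re.compile(r"^(?:(?!bantime|invokername=server).)*$", re.M)

# Shortened stand-ins for the question's log lines.
lines = [
    "client disconnected 'a' reason 'invokerid=20 invokername=Alphonse reasonmsg=test'",
    "client disconnected 'b' reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'",
    "client disconnected 'c' reason 'invokerid=136 invokername=Charles reasonmsg=bye bantime=0'",
]

# finditer plus group(0) returns the whole matched line.
kept = [m.group(0) for m in TEMPERED.finditer("\n".join(lines))]
for line in kept:
    print(line)
```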
Mazdak answered Oct 20 '22 11:10


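You seem to confuse [^...] with (?!...). The former is a negated character class, which matches a single character that is not in the listed set, while the latter is a negative lookahead, which consumes nothing and only asserts that the given pattern does not match at this point.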
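The difference can be seen in a small Python 3 sketch. The invoker name "steve" is hypothetical and was picked only because it begins with a letter from the set s, e, r, v.

```python
import re

# Negated character class: consumes exactly ONE character that is not s, e, r or v.
char_class = re.compile(r"invokername=[^server]")
# Negative lookahead: consumes nothing, only asserts that "server" does not start here.
lookahead = re.compile(r"invokername=(?!server)")

# "steve" is a hypothetical invoker name, chosen because it starts with "s".
names = ["Alphonse", "server", "steve"]

# Map each name to (matched by character class, matched by lookahead).
results = {
    name: (
        bool(char_class.search(f"invokername={name}")),
        bool(lookahead.search(f"invokername={name}")),
    )
    for name in names
}
for name, (by_class, by_lookahead) in results.items():
    print(f"{name:10} class={by_class} lookahead={by_lookahead}")
```

Both patterns reject "server", but the character class also rejects "steve", since it only looks at the first letter after the equals sign.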

If we also keep in mind that a negative lookahead is applied at the current position, we need:

.*?2015-07-08.*?client disconnected.*?(invokername=(?!server))((?!.*?bantime=).*)
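The "current position" point matters in practice: a lookahead only inspects the text starting where the regex engine currently stands. The following Python 3 sketch shows three placements on a shortened, illustrative log line; the variable names are assumptions for this example.

```python
import re

# Shortened stand-in for the "invokername=server" line from the question.
line = "client disconnected reason 'invokerid=0 invokername=server reasonmsg=idle'"

# Lookahead at the line start without .*: it only checks that the line
# does not BEGIN with "server", so this line still matches.
at_start = re.match(r"^(?!server).*$", line)

# Lookahead at the line start with .* inside: it scans the rest of the line.
scanning = re.match(r"^(?!.*invokername=server).*$", line)

# Lookahead placed directly after the literal it should guard.
after_literal = re.search(r"invokername=(?!server)", line)

print(at_start is not None, scanning is None, after_literal is None)
```

The first placement lets the unwanted line through, while the other two reject it. That is why the lookahead either sits right after invokername= or carries its own .* when anchored at the start of the line.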

Edit: Credit where credit is due: @stribizhev's solution is better than mine:

(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$
llogiq answered Oct 20 '22 09:10