I'm having trouble building a regex, and I've searched for two days across Google, Stack Overflow and other documentation.
I have the following lines:
2015-07-08 12:49:07.183852|INFO |VirtualServerBase| 3| client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'
2015-07-08 11:59:23.178055|INFO |VirtualServerBase| 3| client disconnected 'Trakiyen'(id:20460) reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'
2015-07-08 12:40:50.591450|INFO |VirtualServerBase| 3| client disconnected 'kalash'(id:20464) reason 'invokerid=136 invokername=Charles invokeruid=loremipsum= reasonmsg=Aller, Bisous! bantime=0
2015-07-08 00:23:03.235312|INFO |VirtualServerBase| 3| client disconnected 'Brigata FTW'(id:20451) reason 'invokerid=103 invokername=Bob invokeruid=loremipsum3= reasonmsg=En vous souhaitant une bonne soirée <3 bantime=28800'
I want to match only the first line, that is, to exclude every line that contains either of the following:
invokername=server
bantime
In that case the result should be only the first line. I tried the following regex:
.*2015-07-08.*client disconnected.*invokername=[^server].*[^bantime=].*
I show only one regex here, but I've tried many different things (with ?!, etc.). I've read a lot of topics about exclusion on Stack Overflow but could not find a solution. I hope someone can help me.
The simplest way to exclude lines that contain a given string is to use grep with the -v flag. The output is the contents of the input file (example.txt in that example) minus any line that contains the string “ThisWord”.
It's a negative lookahead, which means that for the expression to match, the part within (?!...) must not match. In this case the regex matches http:// only when it is not followed by the current host name (roughly, see Thilo's comment).
You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character. For example, [^abc] is the same as [^a-c].
You can get your line with
(?m)^(?!.*\b(?:invokername=server|bantime)\b).*2015-07-08.*client disconnected.*invokername=.*$
See demo
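To check this pattern outside the demo, here is a minimal Python sketch that runs it over the four log lines from the question. Python's `re` module is assumed as the engine, and the names `LOG`, `PATTERN` and `matches` are illustrative only.

```python
import re

# The four sample lines from the question, joined into one multi-line string.
LOG = "\n".join([
    "2015-07-08 12:49:07.183852|INFO |VirtualServerBase| 3| client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'",
    "2015-07-08 11:59:23.178055|INFO |VirtualServerBase| 3| client disconnected 'Trakiyen'(id:20460) reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'",
    "2015-07-08 12:40:50.591450|INFO |VirtualServerBase| 3| client disconnected 'kalash'(id:20464) reason 'invokerid=136 invokername=Charles invokeruid=loremipsum= reasonmsg=Aller, Bisous! bantime=0",
    "2015-07-08 00:23:03.235312|INFO |VirtualServerBase| 3| client disconnected 'Brigata FTW'(id:20451) reason 'invokerid=103 invokername=Bob invokeruid=loremipsum3= reasonmsg=En vous souhaitant une bonne soirée <3 bantime=28800'",
])

# The pattern from this answer, split over two string literals for readability.
PATTERN = re.compile(
    r"(?m)^(?!.*\b(?:invokername=server|bantime)\b)"
    r".*2015-07-08.*client disconnected.*invokername=.*$"
)

# There are no capturing groups, so findall returns the full matched lines.
matches = PATTERN.findall(LOG)
for line in matches:
    print(line)
```

Only the line with invokername=Alphonse is printed; the other three are rejected by the look-ahead before the rest of the pattern is tried.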
EXPLANATION:
(?m) - A multiline flag so that ^ and $ match at the start and end of each line.
^ - Start-of-line anchor.
(?!.*\b(?:invokername=server|bantime)\b) - A negative look-ahead that makes sure neither of the whole words invokername=server or bantime occurs further on the line.
.*2015-07-08.*client disconnected.*invokername=.* - A substring containing 2015-07-08, client disconnected and invokername=, in that order, with anything (except a line break) in between.
$ - End of line.

Alternatively, you can just match any line that has no disallowed substrings:
(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$
This is a much better alternative if it does not "overmatch" for you.
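The "overmatch" caveat can be seen in a small Python sketch. The sample lines are shortened from the question, and the third one (a "client connected" line from another day) is made up purely for illustration.

```python
import re

lines = [
    "2015-07-08 12:49:07|INFO | client disconnected 'A' reason 'invokername=Alphonse reasonmsg=test'",
    "2015-07-08 11:59:23|INFO | client disconnected 'B' reason 'invokername=server reasonmsg=idle time exceeded'",
    "2015-07-09 08:00:00|INFO | client connected 'C'",  # hypothetical unrelated line
]
text = "\n".join(lines)

# Short form: only rejects lines containing a disallowed word.
short = re.compile(r"(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$")

# Long form: additionally requires the date, the event and invokername=.
full = re.compile(
    r"(?m)^(?!.*\b(?:invokername=server|bantime)\b)"
    r".*2015-07-08.*client disconnected.*invokername=.*$"
)

short_hits = short.findall(text)
full_hits = full.findall(text)

print(len(short_hits), len(full_hits))
```

The short pattern also keeps the unrelated line because nothing in it is forbidden, while the long pattern keeps only the disconnect line. If the input contains nothing but disconnect lines for that date, the two behave the same.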
In addition to @llogiq's answer, which explains the difference between a negated character class and a negative look-ahead, you can also use the following regex, which relies only on a negative look-ahead:
^((?!bantime|(?:invokername=server)).)*$
See demo https://regex101.com/r/hI5dR0/1
>>> import re
>>> # s is assumed to hold the log lines from the question as a single string
>>> re.search(r'^((?!bantime|(invokername=server)).)*$',s,re.M).group()
"2015-07-08 12:49:07.183852|INFO |VirtualServerBase| 3| client disconnected 'R\xc3\xb2em'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'"
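You seem to confuse [^...] with (?!...). The former is a negated character class, while the latter is a negative lookahead. A negated character class such as [^server] matches exactly one character that is not s, e, r or v; it does not mean "not the word server".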
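The difference is easy to see on short strings. The following Python sketch is only an illustration; the names "steve" and "Bob" are made-up sample values.

```python
import re

# Character class: matches ONE character that is not s, e, r or v.
char_class = re.compile(r"invokername=[^server]")

# Lookahead: consumes nothing and fails only if "server" comes next.
lookahead = re.compile(r"invokername=(?!server)")

samples = [
    "invokername=server",
    "invokername=Alphonse",
    "invokername=steve",  # hypothetical name that starts with 's'
    "invokername=Bob",
]

# Map each sample to (character class matched?, lookahead matched?).
results = {
    s: (bool(char_class.search(s)), bool(lookahead.search(s)))
    for s in samples
}

for s, (cc, la) in results.items():
    print(f"{s:24} char class: {cc!s:5}  lookahead: {la}")
```

Both patterns reject invokername=server, but the character class also rejects invokername=steve, simply because the name starts with one of the listed letters.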
If we now also keep in mind that negative lookahead is applied at the current position, we need:
.*?2015-07-08.*?client disconnected.*?(invokername=(?!server))((?!.*?bantime=).*)
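As a rough check of this pattern, here is a Python sketch that applies it line by line. The first three lines are shortened from the question; the fourth, with bantime= placed before invokername=, is a hypothetical line added to show what "applied at the current position" implies.

```python
import re

pattern = re.compile(
    r".*?2015-07-08.*?client disconnected.*?"
    r"(invokername=(?!server))((?!.*?bantime=).*)"
)

lines = [
    "2015-07-08 12:49:07|INFO | client disconnected 'Ròem' reason 'invokername=Alphonse reasonmsg=test'",
    "2015-07-08 11:59:23|INFO | client disconnected 'Trakiyen' reason 'invokername=server reasonmsg=idle time exceeded'",
    "2015-07-08 12:40:50|INFO | client disconnected 'kalash' reason 'invokername=Charles reasonmsg=Aller, Bisous! bantime=0",
    # Hypothetical field order, not taken from the question's log:
    "2015-07-08 13:00:00|INFO | client disconnected 'x' reason 'bantime=0 invokername=Bob'",
]

# Keep the lines the pattern matches from their start.
kept = [line for line in lines if pattern.match(line)]

for line in kept:
    print(line)
```

Both lookaheads are evaluated right after invokername=, so they only inspect the text that follows that point. The second and third lines are rejected as intended, but the hypothetical fourth line is kept, because its bantime= sits before the position where the lookahead runs.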
Edit: Credit where credit is due: @stribizhev's solution is better than mine:
(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$