
vim syntax: match only when between other matches

I am trying to create a syntax file for my logfiles. They take the format:

[time] LEVEL filepath:line - message

My syntax file looks like this:

:syn region logTime start=+^\[+ end=+\] +me=e-1
:syn keyword logCritical CRITICAL skipwhite nextgroup=logFile
:syn keyword logError ERROR skipwhite nextgroup=logFile
:syn keyword logWarn WARN skipwhite nextgroup=logFile
:syn keyword logInfo INFO skipwhite nextgroup=logFile
:syn keyword logDebug DEBUG skipwhite nextgroup=logFile
:syn match logFile " \S\+:" contained nextgroup=logLineNumber
:syn match logLineNumber "\d\+" contained

The issue I have is that if a level name such as ERROR or DEBUG occurs within the message, it gets highlighted too, and I don't want that. I want the keywords to be highlighted only when they fall immediately after the time and immediately before the filepath.

How is this done?

Asked by ewok, Oct 05 '15 16:10


1 Answer

Using a test file that looks like this:

[01:23:45] ERROR /foo/bar:42 - this is a log message
[01:23:45] ERROR /foo/bar:42 - this is a ERROR log message
[01:23:45] CRITICAL /foo/bar:42 - this is a log message
[01:23:45] CRITICAL /foo/bar:42 - this is a CRITICAL log message

This syntax file works for me and does not highlight those keywords in the message portion.

" Match the beginning of a log entry. This match is a superset which
" contains other matches (those named in the "contains") parameter.
"
"     ^                   Beginning of line
"     \[                  Opening square bracket of timestamp
"         [^\[\]]\+       A class that matches anything that isn't '[' or ']'
"                             Inside a class, ^ means "not"
"                             So this matches 1 or more non-bracket characters
"                             (in other words, the timestamp itself)
"                             The \+ following the class means "1 or more of these"
"     \]                  Closing square bracket of timestamp
"     \s\+                Whitespace character (1 or more)
"     [A-Z]\+             Uppercase letter (1 or more)
"
" So, this matches the timestamp and the entry type (ERROR, CRITICAL...)
"
syn match logBeginning "^\[[^\[\]]\+\]\s\+[A-Z]\+" contains=logTime,logCritical,logError,logWarn,logInfo,logDebug

" A region that will match the timestamp. It starts with a bracket and
" ends with a bracket. "contained" means that it is expected to be contained
" inside another match (and above, logBeginning notes that it contains logTime).
" The "me" parameter e-1 means that the syntax match will be offset by 1 character
" at the end. This is usually done when the highlighting goes a character too far.
syn region logTime start=+^\[+ end=+\] +me=e-1 contained

" A list of keywords that define which types we expect (ERROR, WARN, etc.)
" These are all marked contained because they are a subset of the first
" match rule, logBeginning.
syn keyword logCritical CRITICAL contained
syn keyword logError ERROR contained
syn keyword logWarn WARN contained
syn keyword logInfo INFO contained
syn keyword logDebug DEBUG contained

" Now that we have taken care of the timestamp and log type we move on
" to the filename and the line number. This match will catch both of them.
"
" \S\+         NOT whitespace (1 or more) - matches the filename
" :            Matches a literal colon character
" \d\+         Digit (1 or more) - matches the line number
syn match logFileAndNumber " \S\+:\d\+" contains=logFile,logLineNumber

" This will match only the log filename so we can highlight it differently
" than the line number.
syn match logFile " \S\+:" contained

" Match only the line number.
syn match logLineNumber "\d\+" contained
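
The rules above only define syntax groups; a group gets no color until it is linked to a highlight group, and the listing shows no such links. A minimal sketch of what they could look like follows. The pairings with standard Vim highlight groups are my own assumption, not part of the original syntax file, so adjust them to your colorscheme:

```vim
" Assumed pairings between the syntax groups above and standard highlight
" groups. "hi def link" only sets a link if the user has not set one already.
hi def link logTime        Comment
hi def link logCritical    ErrorMsg
hi def link logError       Error
hi def link logWarn        WarningMsg
hi def link logInfo        Identifier
hi def link logDebug       Debug
hi def link logFile        Directory
hi def link logLineNumber  Number
```

The container groups logBeginning and logFileAndNumber are left unlinked on purpose, since they exist only to restrict where the inner groups may match.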

[Screenshot of Vim highlighting]

You might be curious why I used contained matches instead of plain top-level matches. Some patterns, such as \d\+, are too generic: allowed to match anywhere in the line, they would often highlight the wrong text. As contained matches, they only apply inside a larger pattern that pins down the context. In an earlier revision of this syntax file, some example lines were highlighted incorrectly; for instance, "ERROR" was highlighted when it appeared later in the message text. In this definition, those keywords match only when they directly follow a timestamp, and the timestamp can only appear at the beginning of the line. Containers are thus a way to match more precisely while keeping each regex short and simple.
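
The same restriction can also be expressed in the style of the question, by chaining groups with nextgroup instead of wrapping them in a container match. The following is a sketch of that alternative, not the syntax file used above; it assumes every log line starts with the bracketed timestamp and that the timestamp itself contains no ']':

```vim
" Only logTime is a top-level group. Every other group is "contained" and is
" reachable solely through nextgroup, so it cannot match inside the message.
syn region logTime start=+^\[+ end=+\]+ oneline skipwhite nextgroup=logCritical,logError,logWarn,logInfo,logDebug

" Each level keyword may only follow the timestamp, and hands over to logFile.
syn keyword logCritical CRITICAL contained skipwhite nextgroup=logFile
syn keyword logError    ERROR    contained skipwhite nextgroup=logFile
syn keyword logWarn     WARN     contained skipwhite nextgroup=logFile
syn keyword logInfo     INFO     contained skipwhite nextgroup=logFile
syn keyword logDebug    DEBUG    contained skipwhite nextgroup=logFile

" The filename runs up to the colon; the line number must follow directly.
syn match logFile "\S\+:" contained nextgroup=logLineNumber
syn match logLineNumber "\d\+" contained
```

Compared with the container approach, this keeps the rules close to the original attempt; the key change is that the keywords are marked contained, so they no longer match on their own.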

Update: Based on the example lines you provided (noted below), I have improved the regex in the first rule above, and in my testing it now works properly.

[2015-10-05 13:02:27,619] ERROR /home/admusr/autobot/WebManager/wm/operators.py:2371 - Failed to fix py rpc info: [Errno 2] No such file or directory: '/opt/.djangoserverinfo'
[2015-10-05 13:02:13,147] ERROR /home/admusr/autobot/WebManager/wm/operators.py:3223 - Failed to get field "{'_labkeys': ['NTP Server'], 'varname': 'NTP Server', 'displaygroup': 'Lab Info'}" value from lab info: [Errno 111] Connection refused
[2015-10-05 13:02:38,012] ERROR /home/admusr/autobot/WebManager/wm/operators.py:3838 - Failed to add py rpc info: [Errno 2] No such file or directory: '/opt/.djangoserverinfo'
[2015-10-05 12:39:22,835] DEBUG /home/admusr/autobot/WebManager/wm/operators.py:749 - no last results get: [Errno 2] No such file or directory: u'/home/admusr/autobot/admin/branches/Wireless_12.2.0_ewortzman/.lastresults'
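
To check a result like this yourself, Vim can report which syntax groups apply at the cursor position. A small sketch using the built-in synstack() and synIDattr() functions; the command name LogSynStack is hypothetical:

```vim
" Hypothetical helper command: echo the names of all syntax groups that are
" active at the cursor position, outermost group first.
command! LogSynStack echo map(synstack(line('.'), col('.')), 'synIDattr(v:val, "name")')
```

With the cursor on the level name right after the timestamp, the list should include the matching log group; with the cursor on the same word inside the message, it should not.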
Answered by Dan Lowe, Nov 13 '22 20:11