
PowerShell: Select line preceding a match -- Select-String -Context issue when using input string variable

I need to return the line preceding a match in a multi-line string variable.

It seems that when a string variable is used as the input, Select-String treats the entire string as a single matching line. As a result, the Context properties lie "outside" either end of the string and are null.

Consider the below example:

$teststring = @"
line1
line2
line3
line4
line5
"@

Write-Host "Line Count:" ($teststring | Measure-Object -Line).Lines #verify PowerShell does regard input as a multi-line string (it does)

Select-String -Pattern "line3" -InputObject $teststring -AllMatches -Context 1,0 | % {
$_.Matches.Value #this prints the exact match
$_.Context #output shows all context properties to be empty 
$_.Context.PreContext[0] #this would ideally output first line before the match
$_.Context.PreContext[0] -eq $null #but instead is null
}

Am I misunderstanding something here?

What is the best way to return "line2" when matching for "line3"?

Thanks!

Edit: Additional requirements I neglected to state: I need the line above ALL matched lines, for a string of indeterminate length. E.g., when searching the text below for "line3", I need to return "line2" and "line5".

line1
line2
line3
line4
line5
line3
line6
Joenarr Bronarsson asked Feb 05 '23 05:02



2 Answers

Select-String treats each input object as one line, so instead of a single multi-line string you must supply an array of lines for -Context to work as intended and for each matching line to be reported separately:

$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@

$teststring -split '\r?\n' | Select-String -Pattern "line3" -AllMatches -Context 1,0 | % {
  "line before:  " + $_.Context.PreContext[0]
  "matched part: " + $_.Matches.Value  # Prints what the pattern matched
}

This yields:

line before:  line2
matched part: line3
line before:  line5
matched part: line3
  • $teststring -split '\r?\n' splits the multi-line string into an array of lines:

    • Note: Which line-break sequence your here-string uses (LF-only vs. CRLF) depends on the enclosing script file; regex \r?\n handles either style.
  • Note that it is crucial to use the pipeline to provide Select-String's input; if you used -InputObject, the array would be coerced back to a single string.
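The difference between the two ways of passing input can be checked directly. The following is a minimal sketch, assuming the same sample lines as above; the variable names $lines, $piped and $direct are illustrative only.

```powershell
$lines = 'line1', 'line2', 'line3', 'line4', 'line5', 'line3', 'line6'

# Pipeline input: each array element is searched as a separate line.
$piped = @($lines | Select-String -Pattern 'line3' -Context 1,0)

# -InputObject: the whole array is treated as one combined string.
$direct = @(Select-String -Pattern 'line3' -InputObject $lines -Context 1,0)

"piped results:  $($piped.Count)"
"direct results: $($direct.Count)"

# Only the piped results carry a usable pre-context line.
$before = @($piped | ForEach-Object { $_.Context.PreContext[0] })
$before
```

With pipeline input, each matching line produces its own result with its own PreContext; with -InputObject there are no separate lines for Select-String to draw context from.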


Select-String is convenient, but slow.
Especially for a single string already in memory, a solution using the .NET Framework's [Regex]::Matches() method will perform much better, though it is more complex.

Note that PowerShell's own -match and -replace operators are built on the same .NET class, but do not expose all of its functionality; -match - which does report capture groups in the automatic $Matches variable - is not an option here, because it only ever returns 1 match.
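As a small illustration of that limitation, the sketch below compares the two on an LF-only sample string; the variable names are made up for this example.

```powershell
$text = "line1`nline2`nline3`nline4`nline5`nline3`nline6"

# -match returns a Boolean and records only the first match in $Matches.
$found = $text -match 'line3'
$firstOnly = $Matches[0]

# [regex]::Matches() returns every match, each with its own position.
$all = [regex]::Matches($text, 'line3')
$positions = @($all | ForEach-Object { $_.Index })

"-match found:       $found ($firstOnly)"
"[regex]::Matches(): $($all.Count) matches at index $($positions -join ', ')"
```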

The following is essentially the same approach as in mjolinor's answer, but with several problems corrected[1].

# Note: The sample string is defined so that it contains LF-only (\n)
#       line breaks, merely to simplify the regex below for illustration.
#       If your script file uses LF-only line breaks, the
#       `-replace '\r?\n', "`n"` call isn't needed.
$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@ -replace '\r?\n', "`n" 

[Regex]::Matches($teststring, '(?:^|(.*)\n).*(line3)') | ForEach-Object { 
  "line before:  " + $_.Groups[1].Value
  "matched part: " + $_.Groups[2].Value
}
  • Regex (?:^|(.*)\n).*(line3) uses 2 capture groups ((...)) to capture both the (matching part of the) line to match and the line before it; (?:...) is an auxiliary non-capturing group that is needed for precedence:

    • (?:^|(.*)\n) matches either the very start of the string (^) or (|) any - possibly empty - sequence of non-newline characters (.*) followed by a newline (\n); this ensures that the line to match is also found when there is no preceding line (i.e., if the line to match is the first one).
    • (line3) is the group defining the line to match; it is preceded by .* to match the behavior in the question, where pattern line3 is found even if it is only part of a line.
      • If you want only full lines to match, use the following regex instead:
        (?:^|(.*)\n)(line3)(?:\n|$)
  • [Regex]::Matches() finds all matches and returns them as a collection of System.Text.RegularExpressions.Match objects, which the ForEach-Object cmdlet call can then operate on to extract the capture-group matches ($_.Groups[<n>].Value).
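If this is needed in more than one place, the approach can be wrapped in a small function. The sketch below is one possible shape, not part of the original answer: the function name Get-LineBeforeMatch, its parameters and the group names before and hit are all hypothetical. It normalizes line breaks first and uses named groups so that capture groups inside the caller's pattern cannot shift the numbering.

```powershell
function Get-LineBeforeMatch {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $Pattern   # a regex, inserted as-is
    )

    # Normalize CRLF to LF so the regex only has to deal with \n.
    $normalized = $Text -replace '\r?\n', "`n"

    # Same structure as the regex above, with named instead of numbered groups.
    $regex = '(?:^|(?<before>.*)\n).*(?<hit>' + $Pattern + ')'

    foreach ($m in [regex]::Matches($normalized, $regex)) {
        [pscustomobject]@{
            LineBefore = $m.Groups['before'].Value   # empty if there is no previous line
            Match      = $m.Groups['hit'].Value
        }
    }
}

# Usage with CRLF line breaks:
$sample = "line1`r`nline2`r`nline3`r`nline4`r`nline5`r`nline3`r`nline6"
$result = @(Get-LineBeforeMatch -Text $sample -Pattern 'line3')
$result | ForEach-Object { "line before: $($_.LineBefore)  matched part: $($_.Match)" }
```

Returning objects rather than formatted strings leaves it to the caller to decide how to display or filter the results.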


[1] As of this writing:
- There is no need to match twice - the enclosing if ($teststring -match $pattern) { ... } is unnecessary.
- Inline option (?m) is not needed: it only changes how ^ and $ behave, which that pattern does not use, and . does not match newlines by default anyway.
- (.+?) captures only nonempty lines (and ?, the non-greedy quantifier, is not needed).
- If the line of interest is the first line, i.e., if there is no line before it, it won't be matched.
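The regex-option behavior mentioned in the footnote can be checked with a short sketch; the sample string and variable names below are illustrative only.

```powershell
$text = "line1`nline2"

# By default, . stops at a newline, so .+ yields one match per line.
$perLine = [regex]::Matches($text, '.+').Count

# (?s) is the option that lets . cross newlines: one match for the whole string.
$singleLine = [regex]::Matches($text, '(?s).+').Count

# (?m) changes only the anchors: ^ and $ then match at every line.
$anchorDefault   = [regex]::Matches($text, '^line\d').Count
$anchorMultiline = [regex]::Matches($text, '(?m)^line\d').Count

"'.+' matches:          $perLine"
"'(?s).+' matches:      $singleLine"
"'^line\d' matches:     $anchorDefault"
"'(?m)^line\d' matches: $anchorMultiline"
```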

mklement0 answered Feb 07 '23 05:02



You can use a multi-line regex, with the -match operator:

$teststring = @"
line1
line2
line3
line4
line5
line3
line6
"@

$pattern = 
@'
(?m)
(.+?)
line3
'@


if ($teststring -match $pattern)
  { [Regex]::Matches($teststring,$pattern) |
    foreach {$_.groups[1].value} }
mjolinor answered Feb 07 '23 05:02
