I am trying to regex out a date from a text file. This is the content:
Storage Manager Command Line Administrative Interface - Version 7, Release 1, Level 1.4 (c) Copyright by Corporation and other(s) 1990, 2015. All Rights Reserved.
Session established with server TSERVER: Windows Server Version 7, Release 1, Level 5.200 Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00
ANS8000I Server command.
I need to extract the date/time after server date/time. I have written this regex:
/([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})/
This works perfectly in regex101; see the example at https://regex101.com/r/MB7yB4/1. However, in PowerShell it behaves differently.
$var -match "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})"
gives
Server date/time: 11/22/2016 16:30:00 Last access: 11/22/2016 15:37:19
and
$var -match "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2})"
gives nothing.
I am not sure why the match is not the same.
Thanks for any help!
The -match operator returns a Boolean value indicating whether a match was found. It also populates the $matches variable with the match data (the whole match and the capture group values). You just need to access the whole match:
if($var -match '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}') { $matches[0] }
See Using -match and the $matches variable in PowerShell.
Note that there is no need to escape the / symbol in PowerShell regexes, since this character is not special, and regex delimiters (the outer /.../ used in JS or PHP regex literals) are not used when defining a regular expression in PowerShell.
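As a minimal sketch of the points above, the following uses a shortened copy of the sample line held in a single string. The variable names $text, $pattern, $found, and $timestamp are made up for illustration and are not part of the question.

```powershell
# Shortened sample input as a *single* string (hypothetical variable name).
$text = 'Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00'

# No regex delimiters and no escaping of '/' are needed.
# Single quotes keep PowerShell from interpolating anything in the pattern.
$pattern = '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}'

# With a single string on the left, -match returns a Boolean ...
$found = $text -match $pattern

# ... and fills the automatic $Matches variable; index 0 holds the whole match.
$timestamp = if ($found) { $Matches[0] } else { $null }
$timestamp   # the first timestamp in the string: 11/22/2016 15:30:00
```

Only the first match is reported, because -match stops at the leftmost match in the input.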
To complement Wiktor Stribiżew's helpful answer, which contains many useful pointers and an effective solution, but doesn't explain the behavior of the -match operator with array input:
The behavior of the -match operator changes if the LHS is an array of strings: instead of a Boolean, the matching array elements are returned and the $Matches variable is not populated. Effectively, -match then performs array filtering.

You presumably populated $var with just Get-Content, which returns the lines as a string array rather than a single string. In PSv3+, adding the -Raw switch reads the entire file as a single string.

Use the automatic $Matches hashtable to access information about what the most recent use of -match captured: $Matches[0] contains what the regex matched as a whole, $Matches[1] what the first (unnamed) capture group captured ($Matches[2] for the 2nd one, ...), and $Matches['<name>'] for named capture groups, as demonstrated in LotPing's helpful answer. ($Matches.0 is just an alternative syntax for $Matches[0], for instance.)

It is generally preferable to use single-quoted strings ('...') to define regular expressions, so that PowerShell's own string interpolation, which is applied to double-quoted strings ("..."), doesn't get in the way.

When it comes to substring extraction using a regular expression, using -replace often allows a more concise solution:
$var -join "`n" -replace '(?s).*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*', '$1'
The extra -join "`n" step is needed to reassemble the array of lines in $var into a single string to pass as input to -replace.
The explanation below shows how to use Get-Content -Raw to read the entire file as a single string to begin with.
Explanation:
# Read the text file as a *single* string, using -Raw.
# Note: Without -Raw, you get an *array* of strings representing
# the individual lines.
$var = Get-Content -Raw file.txt
# Define the regex that matches the *entire* input,
# with a single capture group capturing the substring of interest.
# The regex:
# - is prefixed with an inline-option expression, (?s), which ensures
# that . also matches a newline.
# - starts with .*? a non-greedy expression matching any
# sequence of characters at the start of the input,
# - followed by the original capture-group regex (though without escaping of / as \/,
# because that is not necessary in PowerShell, and \d used instead of [0-9])
# - ends with .*, a greedy expression that matches everything through the
# end of the input.
$re = '(?s).*?(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}).*'
# Using -replace, we replace the entire input string - by virtue
# of the overall regex matching the entire string - with only
# what the capture group captured ($1).
# The net effect is that only the capture group value is output.
# With the sample input, this outputs '11/22/2016 15:30:00', the first
# timestamp encountered.
$var -replace $re, '$1'
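The difference between array input and single-string input to -match, described earlier in this answer, can be sketched as follows. The sample lines are a shortened stand-in for what Get-Content returns without -Raw. The variable names and the capture group names date and time are assumptions made for illustration.

```powershell
# Hypothetical sample lines, standing in for Get-Content output without -Raw.
$lines = @(
    'Session established with server TSERVER: Windows',
    'Server date/time: 11/22/2016 15:30:00 Last access: 11/22/2016 15:25:00',
    'ANS8000I Server command.'
)

# Single-quoted regex with two named capture groups (names chosen for this sketch).
$re = '(?<date>\d{1,2}/\d{1,2}/\d{4}) (?<time>\d{1,2}:\d{1,2}:\d{1,2})'

# Array on the left: -match acts as a filter and returns the matching lines.
$filtered = $lines -match $re

# Single string on the left: -match returns a Boolean and populates $Matches.
$single  = $lines -join "`n"
$isMatch = $single -match $re

$whole = $Matches[0]        # the entire match
$date  = $Matches['date']   # named capture group
$time  = $Matches['time']   # named capture group
```

Here $filtered holds the one line that contains a timestamp, while $whole, $date, and $time hold the pieces of the first timestamp found in the joined string.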