Is it possible to replace Get-Content, ForEach-Object string -match with Select-String cmdlet?

I have a fixed-width file with records in the following format:

DDEDM2018890                                                                 19960730015000010000
DDETPL015000                                                                 20150515015005010000
DDETPL015010                                                                 20150515015003010000
DDETPL015020                                                                 20150515015002010000
DDETPL015030                                                                 20150515015005010000
DDETPL015040                                                                 20150515015000010000

The first 3 characters identify the record type. In the above example all records are of type DDE, but the file also contains lines of other types.

The following regular expression with named capture groups parses the relevant information from each record for my purpose (notice that it also filters down to DDE record types):

DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})

You can play with this regex on this excellent online parser.
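
As a quick check of what the named groups capture, the pattern can be tried against a single record with the -match operator. This is only a minimal sketch: the names $pattern, $record and $parsed are illustrative, and the record is the second sample line rebuilt with an assumed amount of padding (the exact number of spaces does not matter, because the pattern uses \s+).

```powershell
# Pattern from the question; single quotes avoid any string interpolation.
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# Second sample record; the padding width (65 spaces) is an assumption.
$record = 'DDETPL015000' + (' ' * 65) + '20150515015005010000'

if ($record -match $pattern) {
    # $Matches is a hashtable keyed by group name (plus key 0 for the whole match).
    $parsed = [PSCustomObject]@{
        Database     = $Matches['Database']      # TPL
        CategoryCode = $Matches['CategoryCode']  # 50
        CategoryId   = $Matches['CategoryId']    # 0
        Length       = $Matches['Length']        # 005
    }
    $parsed
}
```

The two digits after the database name, the single digit after CategoryId, the 8-digit date and the following 3 digits are matched but deliberately not captured.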

I have written a script that uses the Get-Content, ForEach-Object and Select-Object cmdlets to convert the fixed width file into a csv file.

I wonder if I could replace the Get-Content and ForEach-Object cmdlets by a single Select-String cmdlet?

#This PowerShell script reads a fixed-width file and generates a CSV file of the relevant, converted values

#Prepare a calculated-property hashtable for Select-Object that converts CategoryCode to a letter and appends CategoryId
$Category = @{
    Name = "Category"
    Expression = {
        $cat = switch($_.CategoryCode) 
        {
            "50"{"A"}
            "54"{"C"}
            "60"{"F"}
            "66"{"I"}
            "74"{"M"}
            "88"{"T"}
        } 
        $cat+$_.CategoryId
    }
}

gc "C:\Path\To\File.txt" | % { 
        if($_ -match "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3}).*$")
        {
            #$matches is a hashtable of the named capture groups; cast it to an object so Select-Object can treat its keys as properties
            [PSCustomObject]$matches
        }
    } | select Database, $Category, Length #| export-csv "AnalysisLengths.csv" -NoTypeInformation
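
To see the calculated property in isolation, it can be applied to a hand-built object instead of a parsed line. The sketch below repeats the $Category hashtable so that it is self-contained; $sample and $result are illustrative names, and the values correspond to the first sample record.

```powershell
# Calculated property from the script, repeated so the sketch runs on its own.
$Category = @{
    Name = 'Category'
    Expression = {
        $cat = switch ($_.CategoryCode) {
            '50' { 'A' }
            '54' { 'C' }
            '60' { 'F' }
            '66' { 'I' }
            '74' { 'M' }
            '88' { 'T' }
        }
        # Append the single-digit CategoryId to the letter.
        $cat + $_.CategoryId
    }
}

# Hand-built stand-in for one parsed record (values of the first sample line).
$sample = [PSCustomObject]@{ Database = 'DM2'; CategoryCode = '88'; CategoryId = '9'; Length = '000' }

# Select-Object evaluates the Expression script block once per input object.
$result = $sample | Select-Object Database, $Category, Length
$result
```

Here CategoryCode 88 maps to T and CategoryId 9 is appended, so the Category column holds T9.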

Before I finalized the script, I tried to use the Select-String cmdlet but could not figure out how. I believe it can achieve the same result in a more elegant way. This is what I had:

##Could this be completed with just the Select-String cmdlet instead of Get-Content+ForEach+Select-Object?
Select-String -Path "C:\Path\To\File.txt" `
    -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" `
    | Select-Object -ExpandProperty Matches 

Using -ExpandProperty should convert the Microsoft.PowerShell.Commands.MatchInfo Matches property into the actual System.Text.RegularExpressions.Match objects for each line...
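
To illustrate what the expanded objects look like: each Match exposes the named groups through its Groups collection, so the values still have to be read out by name. A minimal sketch, piping two in-memory lines into Select-String instead of reading the file ($lines, $found and $summary are illustrative names, the padding width is assumed, and the XXX line is a made-up record of another type):

```powershell
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

$lines = @(
    'DDEDM2018890' + (' ' * 65) + '19960730015000010000'
    'XXXTPL015000' + (' ' * 65) + '20150515015005010000'   # hypothetical non-DDE record, not matched
)

# Select-String emits one MatchInfo per matching line; -ExpandProperty unwraps its Match objects.
$found = $lines | Select-String -Pattern $pattern | Select-Object -ExpandProperty Matches

# Read each named group from the Groups collection of the Match.
$summary = foreach ($m in $found) {
    '{0} {1} {2} {3}' -f $m.Groups['Database'].Value,
                         $m.Groups['CategoryCode'].Value,
                         $m.Groups['CategoryId'].Value,
                         $m.Groups['Length'].Value
}
$summary
```

Unlike $matches from the -match operator, a Match object is not a hashtable, so casting it to [PSCustomObject] would not turn the group names into properties.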

See also: Powershell Select-Object vs ForEach on Select-String results

Vincent De Smet asked May 23 '26 18:05



1 Answer

Here is one way (I'm not so proud of it):

Select-String -Path "C:\Path\To\File.txt" -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" |
    ForEach-Object {
        #Groups 1-4 are the named groups in pattern order; .Value yields the captured text
        $g = $_.Matches[0].Groups
        New-Object -TypeName PSObject -Property @{Database=$g[1].Value; CategoryCode=$g[2].Value; CategoryId=$g[3].Value; Length=$g[4].Value}
    } | Export-Csv "C:\Path\To\File.csv" -NoTypeInformation
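
A variation that also produces the Category column from the question might look like the sketch below. It is not part of the original answer: the $categoryLetter lookup table is a hypothetical replacement for the question's switch, the sample file is written to a temporary path so the example is self-contained, and ConvertTo-Csv stands in for Export-Csv so the result can be inspected directly.

```powershell
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# Hypothetical lookup table mirroring the switch statement in the question.
$categoryLetter = @{ '50' = 'A'; '54' = 'C'; '60' = 'F'; '66' = 'I'; '74' = 'M'; '88' = 'T' }

# Write two sample records to a temporary file (padding width is an assumption).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'fixed-width-sample.txt'
Set-Content -Path $path -Value @(
    'DDEDM2018890' + (' ' * 65) + '19960730015000010000'
    'DDETPL015000' + (' ' * 65) + '20150515015005010000'
)

# Select-String reads and filters the file; the named groups are read by name.
$rows = Select-String -Path $path -Pattern $pattern | ForEach-Object {
    $g = $_.Matches[0].Groups
    [PSCustomObject]@{
        Database = $g['Database'].Value
        Category = $categoryLetter[$g['CategoryCode'].Value] + $g['CategoryId'].Value
        Length   = $g['Length'].Value
    }
}

$rows | ConvertTo-Csv -NoTypeInformation
Remove-Item $path
```

Building the object with [PSCustomObject]@{...} keeps the columns in the order written, which a plain hashtable passed to New-Object -Property does not guarantee.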
JPBlanc answered May 25 '26 09:05



