I'm trying to write a PowerShell script to do some analysis on production log files.
I need to filter roughly 2.5 million lines of text against roughly 50-100 values, which is potentially 250,000,000 comparisons if the approach is O(nm).
I've tried Get-Content | Select-String, but this seems incredibly slow.
Is there any way to approach this without iterating over every line once for each value?
So, the log files look a bit like this (datetime : process_id : log_level : message):
2016-01-30 14:01:22.349 [ 27] INFO XXX YYY XXXFX
2016-01-30 14:01:28.146 [ 16] INFO XXXD YY Z YYY XXXX
2016-01-30 14:01:28.162 [ 16] DEBUG YY XXXXX YY XX P YYY
2016-01-30 14:01:28.165 [ 16] DEBUG YY XXXXX YY XX YYY
2016-01-30 14:01:28.167 [ 16] DEBUG YY XXXXX YY XX YYY
2016-01-30 14:01:28.912 [ 27] INFO XXX YY XXGXXX YYYYYY YY XX
and I may be looking for the values D, F, G and Z.
The values could be strings of binary digits, hexadecimal digits, combinations of the two, strings of regular text and punctuation, or pipe-delimited values.
Rules of thumb:
StreamReader is faster than Get-Content, which in turn is faster than Import-Csv.
You probably want something like this if it's sufficient to check whether your log lines contain any of the given strings:
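$reader = [IO.StreamReader]'C:\path\to\your.log'
$filters = 'foo', 'bar', ...
# Peek() returns -1 once the end of the file is reached
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    # Emit the line if it contains at least one filter value
    # (String.Contains is case-sensitive)
    if ($filters | Where-Object {$line.Contains($_)}) {
        $line
    }
}
$reader.Close()
$reader.Dispose()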
If you want to use a StreamWriter instead of just echoing the output, simply adjust the code like this:
$reader = [IO.StreamReader]'C:\path\to\your.log'
$writer = [IO.StreamWriter]'C:\path\to\output.txt'
$filters = 'foo', 'bar', ...
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    # Write matching lines to the output file instead of the pipeline
    if ($filters | Where-Object {$line.Contains($_)}) {
        $writer.WriteLine($line)
    }
}
# Closing the writer flushes any buffered output to disk
$reader.Close(); $reader.Dispose()
$writer.Close(); $writer.Dispose()
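Both loops above run every filter value through Where-Object for each line, even after one value has already matched. A minimal sketch of a variation that stops at the first hit is shown here; it uses a couple of sample lines from the question held in memory instead of a file, and the filter values 'XXXFX' and 'XXGXXX' are made up for illustration. Whether this is noticeably faster on your data is something to measure rather than assume.

```powershell
# Hypothetical filter values, chosen so that only the first sample line matches
$filters = 'XXXFX', 'XXGXXX'

# Sample lines from the question, standing in for the real log file
$lines = @(
    '2016-01-30 14:01:22.349 [ 27] INFO XXX YYY XXXFX'
    '2016-01-30 14:01:28.165 [ 16] DEBUG YY XXXXX YY XX YYY'
)

$matched = foreach ($line in $lines) {
    foreach ($f in $filters) {
        if ($line.Contains($f)) {
            $line    # emit the line once
            break    # leave the inner loop; remaining filters are skipped
        }
    }
}
$matched
```

In a real script the outer foreach would be replaced by the StreamReader loop, with the inner foreach taking the place of the Where-Object test.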
Depending on the structure of your log lines, the filter values, and how they need to be applied, the filter logic may need adjusting. You would need to show the actual log format and some filter examples for that, though.
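As an illustration of such an adjustment: with the sample format from the question, searching the whole line for D, F, G or Z would match every line, because INFO contains F and DEBUG contains D and G. A rough sketch that tests only the message part follows. The layout regex and the group name msg are assumptions derived from the six sample lines, not a confirmed description of the real log format.

```powershell
$filters = 'D', 'F', 'G', 'Z'

# One alternation pattern built from the escaped filter values
$pattern = ($filters | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Assumed layout: "<date> <time> [<pid>] <LEVEL> <message>"
$layout = [regex]'^\S+ \S+ \[\s*\d+\] \w+ (?<msg>.*)$'

$lines = @(
    '2016-01-30 14:01:22.349 [ 27] INFO XXX YYY XXXFX'
    '2016-01-30 14:01:28.146 [ 16] INFO XXXD YY Z YYY XXXX'
    '2016-01-30 14:01:28.162 [ 16] DEBUG YY XXXXX YY XX P YYY'
    '2016-01-30 14:01:28.165 [ 16] DEBUG YY XXXXX YY XX YYY'
    '2016-01-30 14:01:28.167 [ 16] DEBUG YY XXXXX YY XX YYY'
    '2016-01-30 14:01:28.912 [ 27] INFO XXX YY XXGXXX YYYYYY YY XX'
)

$hits = foreach ($line in $lines) {
    $m = $layout.Match($line)
    # -cmatch is the case-sensitive variant of -match
    if ($m.Success -and $m.Groups['msg'].Value -cmatch $pattern) { $line }
}
$hits
```

With these samples, only the three INFO lines whose message contains one of the values are returned; the DEBUG lines are no longer matched by their log level.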
I would try a StreamReader + StreamWriter to speed up reading and writing, as Get-Content is slow for big files. I would also try to build a single regex (word OR word OR word, etc.) so each line is tested once instead of once per value. For example:
$words = "foo","bar","donkey"
# Create regex pattern (usually faster to match); Escape() makes special characters literal
$regex = ($words | ForEach-Object { [regex]::Escape($_) }) -join '|'
$reader = New-Object System.IO.StreamReader -ArgumentList "c:\myinputfile.txt"
$writer = New-Object System.IO.StreamWriter -ArgumentList "c:\myOUTputfile.txt"
# ReadLine() returns $null at the end of the file
while ($null -ne ($line = $reader.ReadLine())) {
    # -match is case-insensitive; use -cmatch for a case-sensitive test
    if ($line -match $regex) { $writer.WriteLine($line) }
}
# Close writer
$writer.Close()
$writer.Dispose()
# Close reader
$reader.Close()
$reader.Dispose()
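Since the question mentions pipe-delimited values and punctuation, here is a small sketch of why the [regex]::Escape() step matters, and of building a reusable Regex object instead of passing a string to -match. The sample values and test strings are invented for illustration, and the Compiled option is an optional tweak that may or may not help on your data.

```powershell
# Hypothetical filter values containing regex metacharacters
$values = 'foo|bar', '1.0', '0xFF'

# Escaping turns 'foo|bar' into 'foo\|bar' and '1.0' into '1\.0'
$pattern = ($values | ForEach-Object { [regex]::Escape($_) }) -join '|'

# A .NET Regex object is case-sensitive by default, unlike the -match operator
$options = [System.Text.RegularExpressions.RegexOptions]::Compiled
$regex = New-Object System.Text.RegularExpressions.Regex -ArgumentList $pattern, $options

$a = $regex.IsMatch('INFO value=foo|bar')   # literal 'foo|bar' is present
$b = $regex.IsMatch('INFO value=foo')       # 'foo' alone is not one of the values
$c = $regex.IsMatch('INFO version 100')     # '1.0' no longer matches '100'
$a, $b, $c
```

Without the escaping, the first value would be read as "foo OR bar" and the dot in the second would match any character, so the second and third test strings would match as well.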