Need help improving performance of PowerShell delimited-text parsing script

I need to parse a large pipe-delimited file and count how many records have a 5th column value that meets my criteria and how many do not.

PS C:\temp> gc .\items.txt -readcount 1000 |
  ? { $_ -notlike "HEAD" } |
  % { foreach ($s in $_) { $s.split("|")[4] } } |
  group -property {$_ -ge 256} -noelement |
  ft -autosize

This command does what I want, returning output like this:

  Count Name
  ----- ----
1129339 True
2013703 False

However, for a 500 MB test file, this command takes about 5.5 minutes to run, as measured by Measure-Command. A typical file is over 2 GB, and waiting 20+ minutes for it is undesirably long.

Do you see a way to improve the performance of this command?

For example, is there a way to determine an optimal value for Get-Content's ReadCount? Without that parameter, the same file takes 8.8 minutes to process.
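A rough way to probe this is to time the same read over a range of candidate values with Measure-Command; a minimal sketch, where the candidate list and file name are just illustrative:

  foreach ($rc in 100, 500, 1000, 5000, 10000) {
      $t = Measure-Command {
          gc .\items.txt -readcount $rc |
            % { foreach ($s in $_) { $s.split("|")[4] } } |
            Out-Null
      }
      "{0,6}  {1:n1} s" -f $rc, $t.TotalSeconds
  }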

asked Jan 17 '12 at 21:01 by neontapir

1 Answer

Have you tried StreamReader? I think that Get-Content loads the whole file into memory before it does anything with it.

StreamReader class
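A minimal sketch of what that might look like, assuming the items.txt path from the question and a numeric 5th column (the [int] cast makes the comparison numeric, whereas the original pipeline compares strings):

  $reader = New-Object System.IO.StreamReader "C:\temp\items.txt"
  $ge = 0   # records whose 5th column is >= 256
  $lt = 0   # records whose 5th column is <  256
  try {
      while (($line = $reader.ReadLine()) -ne $null) {
          if ($line -notlike "HEAD") {   # same header test as the question's pipeline
              if ([int]$line.Split("|")[4] -ge 256) { $ge++ } else { $lt++ }
          }
      }
  }
  finally {
      $reader.Dispose()
  }
  "{0} True`n{1} False" -f $ge, $lt

Reading line by line this way keeps memory use flat and avoids the per-object pipeline overhead, which is usually where most of the time goes on large files.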

answered Sep 28 '22 at 04:09 by Gisli