I have an extremely large text file (250 GB) that is given to us by a vendor. They also give us a control file that is supposed to contain the number of lines in the large file. Sometimes there is a mismatch. How do I count lines in PowerShell? I tried the commands below; they ran for more than half an hour and were still not done.
Get-Content C:\test.txt | Measure-Object -Line
(gc C:\test.txt | Measure-object | select count).count
Any help is appreciated. Thanks, MR
To count the total number of lines in a file in PowerShell, you can retrieve the content with the Get-Content cmdlet and read the Length property of the resulting array (Length is a property, not a method). This collects every line in memory before counting, so it is a poor fit for very large files.
You can use Measure-Object to count objects or count objects with a specified Property. You can also use Measure-Object to calculate the Minimum, Maximum, Sum, StandardDeviation and Average of numeric values. For String objects, you can also use Measure-Object to count the number of lines, words, and characters.
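As a minimal sketch of the approaches described above, the snippet below counts the lines of a small sample file three ways. The file name linecount-sample.txt and the variable names are hypothetical, chosen only for illustration. As far as I know, -Line counts the text lines inside each string, so empty lines may not be included in its total.

```powershell
# Hypothetical sample file in the temp directory; adjust the path as needed.
$path = Join-Path ([IO.Path]::GetTempPath()) 'linecount-sample.txt'
Set-Content -Path $path -Value 'alpha', '', 'gamma'   # three lines, the middle one empty

# Array length: every line is collected in memory first, so avoid this for huge files.
# @(...) guards against a one-line file, where .Length would be the string's length.
$viaLength = @(Get-Content -Path $path).Length

# Object count: the pipeline streams one string per line and Measure-Object counts them.
$viaCount = (Get-Content -Path $path | Measure-Object).Count

# -Line counts text lines within each string; the empty line may be left out here.
$viaLine = (Get-Content -Path $path | Measure-Object -Line).Lines

"Length: $viaLength  Count: $viaCount  Lines: $viaLine"
```

For a vendor file that can contain blank lines, the plain object count is probably the safer of the two Measure-Object variants to compare against a control file.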
To create a dummy file for testing, you can enter either of two forms of the same command in the Command Prompt: fsutil file createnew filename size, or fsutil file createnew path\filename size, where size is the file length in bytes.
The wc command is used to find the number of lines, characters, words, and bytes of a file. To find the number of lines using wc, we add the -l option. This will give us the total number of lines and the name of the file.
If performance matters, avoid the use of cmdlets and the pipeline; use switch -File:
$count = 0
switch -File C:\test.txt { default { ++$count } }
switch -File enumerates the lines of the specified file; the default condition matches any line.
To give a sense of the performance difference:
# Create a sample file with 100,000 lines.
1..1e5 > tmp.txt
# Warm up the file cache
foreach ($line in [IO.File]::ReadLines("$pwd/tmp.txt")) { }
(Measure-Command { (Get-Content tmp.txt | Measure-Object).Count }).TotalSeconds
(Measure-Command { $count = 0; switch -File tmp.txt { default { ++$count } } }).TotalSeconds
Sample results from my Windows 10 / PSv5.1 machine:
1.3081307 # Get-Content + Measure-Object
0.1097513 # switch -File
That is, on my machine the switch -File command was about 12 times faster.
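The [IO.File]::ReadLines call used for the warm-up above can also do the counting itself, since it enumerates the file lazily, one line at a time. The following is only a sketch: it was not part of the timing comparison, and the sample file name readlines-sample.txt is hypothetical.

```powershell
# Hypothetical sample file with 1,000 lines in the temp directory.
$path = Join-Path ([IO.Path]::GetTempPath()) 'readlines-sample.txt'
1..1000 | Set-Content -Path $path

# .NET methods resolve relative paths against the process working directory,
# which can differ from PowerShell's current location, so pass a full path.
$count = 0
foreach ($line in [IO.File]::ReadLines($path)) { ++$count }

$count
```

Only one line is held in memory at a time, so memory use should stay flat regardless of file size; how its speed compares with switch -File on a 250 GB file would need to be measured.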
For such a huge file I'd rather go with a utility written in C. Install Git Bash; it should include the wc command:
wc -l yourfile.txt
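I tested it on a 5 GB file with 50 million lines (on an HDD); it took about 40 seconds. The best PowerShell solution took about 2 minutes. You may also want to inspect your file: it might have an auto-incrementing index column or a constant row size, either of which could let you work out the line count without reading the whole file.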
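If the rows really do have a constant size, the line count follows from the file size alone, with no need to read the data. The sketch below assumes every row occupies exactly the same number of bytes including its line terminator; the 12-byte row width and the file name fixed-width-sample.txt are hypothetical values for the demonstration, not properties of the vendor file.

```powershell
# Assumption: each row is 10 ASCII digits plus CRLF, i.e. 12 bytes per row.
$rowBytes = 12
$path = Join-Path ([IO.Path]::GetTempPath()) 'fixed-width-sample.txt'

# Build a hypothetical fixed-width sample file with 500 rows.
$rows = 1..500 | ForEach-Object { '{0:D10}' -f $_ }
[IO.File]::WriteAllText($path, (($rows -join "`r`n") + "`r`n"), [Text.Encoding]::ASCII)

# The file size comes from file system metadata, so this is fast even for huge files.
$fileBytes = (Get-Item -LiteralPath $path).Length
$remainder = $fileBytes % $rowBytes
$lineCount = ($fileBytes - $remainder) / $rowBytes

if ($remainder -ne 0) {
    # A non-zero remainder means the constant-row-size assumption does not hold.
    Write-Warning "File size is not a multiple of $rowBytes bytes."
}
$lineCount
```

The same number can then be compared with the value in the control file; a non-zero remainder is a hint that the file is truncated or that the rows are not fixed-width after all.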