PowerShell: Count lines in an extremely large file

I have an extremely large text file (250 GB) that a vendor gives us. They also give us a control file that is supposed to contain the number of lines in the large file, and sometimes the two don't match. How do I count the lines in PowerShell? I tried these commands, and they ran for more than half an hour without finishing:

Get-Content C:\test.txt | Measure-Object -Line

(gc C:\test.txt | Measure-object | select count).count

Any help is appreciated. Thanks, MR

asked Feb 26 '19 at 20:02 by user2726975

2 Answers

If performance matters, avoid the use of cmdlets and the pipeline; use switch -File:

$count = 0
switch -File C:\test.txt { default { ++$count } }

switch -File enumerates the lines of the specified file, and the default branch matches every line, so $count is incremented once per line.
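The same technique can be wrapped in a reusable function. This is a minimal sketch; the name Get-LineCount is hypothetical. It starts the counter as [long] because a 250 GB file can exceed [int]::MaxValue (2,147,483,647) lines if its lines average less than about 116 bytes. Note that, like Get-Content, switch -File does not report an extra empty line when the file ends with a newline, whereas wc -l counts newline characters, so for a file whose last line has no terminator wc -l reports one fewer. That difference is worth keeping in mind when comparing against the vendor's control file.

```powershell
# Hypothetical helper name; wraps the switch -File technique from this answer.
function Get-LineCount {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # Start as [long] so the count stays an integer type past [int]::MaxValue lines.
    $count = 0L
    # switch -File streams the file line by line; the default branch matches every line.
    # switch action blocks run in the function's own scope, so ++$count updates this variable.
    switch -File $Path { default { ++$count } }
    $count
}
```

Usage: `Get-LineCount -Path C:\test.txt`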


To give a sense of the performance difference:

# Create a sample file with 100,000 lines.
1..1e5 > tmp.txt
# Warm up the file cache
foreach ($line in [IO.File]::ReadLines("$pwd/tmp.txt")) { }

(Measure-Command { (Get-Content tmp.txt | Measure-Object).Count }).TotalSeconds

(Measure-Command { $count = 0; switch -File tmp.txt { default { ++$count } } }).TotalSeconds

Sample results from my Windows 10 / PSv5.1 machine:

1.3081307  # Get-Content + Measure-Object
0.1097513  # switch -File

That is, on my machine the switch -File command was about 12 times faster.
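If you prefer calling .NET directly, a similar cmdlet-free sketch streams the lines with [IO.File]::ReadLines, the same method the benchmark above uses to warm the file cache. The function name Get-LineCountDotNet is hypothetical, and I have not benchmarked it against switch -File; time it with Measure-Command as above before relying on it. The Convert-Path call is there for the same reason the benchmark uses "$pwd/tmp.txt": .NET resolves relative paths against the process's working directory, which can differ from PowerShell's current location.

```powershell
# Hypothetical helper name; counts lines via .NET's lazy System.IO.File.ReadLines.
function Get-LineCountDotNet {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # Resolve against PowerShell's current location, since .NET would use the process working directory.
    $fullPath = Convert-Path -LiteralPath $Path
    $count = 0L
    # ReadLines enumerates lazily, so the whole file is never loaded into memory.
    foreach ($line in [IO.File]::ReadLines($fullPath)) { ++$count }
    $count
}
```

Usage: `Get-LineCountDotNet -Path C:\test.txt`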

answered Oct 01 '22 at 00:10 by mklement0


For such a huge file I'd rather use a utility written in C. Install Git Bash; it should include the wc command:

wc -l yourfile.txt

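I tested it on a 5 GB file with 50M lines (on an HDD), and it took about 40 s. The best PowerShell solution took about 2 minutes. You may also want to inspect the file itself: it might have an auto-incrementing index column or a constant row size, either of which would let you derive the line count without reading every line.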
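If the rows do turn out to be fixed-width, the count follows from the file size alone: lines = file size in bytes / bytes per line. For example, a 1,020-byte file of 100-character ASCII rows with CRLF terminators (102 bytes per row) holds 1,020 / 102 = 10 lines. A minimal sketch, assuming every line, including the last, has the same byte length and ends with the same terminator; Get-FixedWidthLineCount and its parameters are hypothetical names. Count bytes, not characters: a UTF-16 file uses two bytes per character.

```powershell
# Hypothetical helper: only valid if every line has exactly the same length in bytes.
function Get-FixedWidthLineCount {
    param(
        [Parameter(Mandatory)]
        [string] $Path,
        # Bytes per record INCLUDING the line terminator (e.g. 100 ASCII chars + CRLF = 102).
        [Parameter(Mandatory)]
        [long] $BytesPerLine
    )
    # Reads only the file's metadata, not its contents.
    $size = (Get-Item -LiteralPath $Path).Length
    # A remainder proves the fixed-width assumption is wrong (a zero remainder does not prove it right).
    if ($size % $BytesPerLine -ne 0) {
        throw "File size $size is not a multiple of $BytesPerLine bytes; rows are not fixed-width."
    }
    [long] ($size / $BytesPerLine)
}
```

Usage: `Get-FixedWidthLineCount -Path C:\test.txt -BytesPerLine 102`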

answered Oct 01 '22 at 00:10 by Mike Twc