Our daily feed files average 2 GB in size. These files get archived to a single zip file at the end of each month and stored on a network share. From time to time, I need to search for certain records in those files. I do this by connecting to the shared server by remote desktop, unzipping the files to a temp folder, running a grep (or PowerShell) search, and then deleting the temp folder. Now, because our server is running low on disk space, it is no longer recommended to unzip them all to a temp folder. What is an efficient way to do a regex search on those zipped files with minimal impact on disk or network resources?
Unfortunately, grep doesn't work on compressed files. The usual advice is to uncompress the file(s) first, grep for your text, and then re-compress the file(s)… but you don't need to uncompress them in the first place. You can run zgrep directly on compressed or gzipped files. Note that zgrep targets gzip-style compression; for a multi-file .zip archive, zipgrep (shipped with Info-ZIP's unzip) plays the same role.
The PowerShell Community Extensions (PSCX) include Read-Archive and Expand-Archive cmdlets, but don't (yet?) include a navigation provider, which would make what you want very simple. That said, you could use Read-Archive and Expand-Archive. Something like this untested bit:
    Read-Archive -Path foo.zip -Format Zip |
        Where-Object { $_.Name -like "*.txt" } |
        Expand-Archive -PassThru |
        Select-String "myRegex"
would let you search without extracting the entire archive.
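If Python happens to be available on the server, the same "search without extracting" idea can be sketched with the standard-library zipfile module. This is only a sketch under a few assumptions: the function name grep_zip, the ".txt" suffix filter, and the UTF-8 encoding are illustrative choices, not anything taken from the setup described in the question. Each archive member is decompressed on the fly and scanned line by line, so nothing is written to a temp folder.

```python
import io
import re
import zipfile
from typing import Iterator, Tuple


def grep_zip(zip_path: str, pattern: str, name_suffix: str = ".txt",
             encoding: str = "utf-8") -> Iterator[Tuple[str, int, str]]:
    """Yield (member name, line number, line) for every matching line.

    grep_zip, name_suffix and encoding are hypothetical names and defaults
    chosen for this sketch; adjust them to the real feed files.
    """
    regex = re.compile(pattern)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and members that do not match the filter.
            if info.is_dir() or not info.filename.endswith(name_suffix):
                continue
            # ZipFile.open returns a binary stream that decompresses as it is
            # read, so the member is never extracted to disk.
            with archive.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding, errors="replace")
                for lineno, line in enumerate(text, start=1):
                    if regex.search(line):
                        yield info.filename, lineno, line.rstrip("\r\n")


# Example call (the archive name and pattern are placeholders):
# for name, lineno, line in grep_zip("feeds.zip", r"ORDER-\d+"):
#     print(f"{name}:{lineno}:{line}")
```

Running this on the server itself keeps the archive bytes off the network; running it from a client against the share still reads the compressed data over the wire, but avoids the temp folder.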
Use zgrep on Linux. If you're on Windows, you can download GnuWin, which contains a Windows port of zgrep.