Extract lines matching a pattern from all text files in a folder to a single output file

I am trying to extract each line starting with "%%" from all files in a folder and copy those lines to a separate text file. I am currently using this PowerShell code, but I am not getting any results.

$files = Get-ChildItem "folder" -Filter *.txt
foreach ($file in $files)
{
if ($_ -like "*%%*")
{
Set-Content "Output.txt" 
}  
}
Jabir Jamal asked Dec 09 '16 03:12


2 Answers

I think that mklement0's suggestion to use Select-String is the way to go. Adding to his answer, you can pipe the output of Get-ChildItem into Select-String so that the entire process becomes a PowerShell one-liner.

Something like this:

Get-ChildItem "folder" -Filter *.txt | Select-String -Pattern '^%%' | Select-Object -ExpandProperty Line | Set-Content "Output.txt"
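
For a recursive search, the same pipeline might look like the sketch below. It is only an illustration: the temporary folder, the file names, and the variables $root, $sub and $out are invented for the demo, and the output file is deliberately kept outside the folder being searched.

```powershell
# Build a throwaway folder with sample files (all names here are hypothetical).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("pct-demo-" + [guid]::NewGuid())
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub | Out-Null
Set-Content -Path (Join-Path $root 'a.txt') -Value '%% first', 'plain', 'x %% mid-line'
Set-Content -Path (Join-Path $sub 'b.txt')  -Value 'other', '%% second'

# Keep the output file outside the searched folder so it is never read as input.
$out = "$root-Output.txt"

# -Recurse also visits subfolders; Select-String keeps lines that start with %%.
Get-ChildItem -Path $root -Filter *.txt -Recurse |
    Select-String -Pattern '^%%' |
    Select-Object -ExpandProperty Line |
    Set-Content -Path $out

# Show what was collected.
Get-Content -Path $out
```

The line 'x %% mid-line' is skipped because the pattern is anchored with ^.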
Dave Sexton answered Oct 21 '22 23:10


The Select-String cmdlet offers a much simpler solution (PSv3+ syntax):

(Select-String -Path folder\*.txt -Pattern '^%%').Line | Set-Content Output.txt
  • Select-String accepts a filename/path pattern via its -Path parameter, so, in this simple case, there is no need for Get-ChildItem.

    • If, by contrast, your input file selection is recursive or uses more complex criteria, you can pipe Get-ChildItem's output to Select-String, as demonstrated in Dave Sexton's helpful answer.
    • Note that, according to the docs, Select-String by default assumes that the input files are UTF-8-encoded, but you can change that with the -Encoding parameter; also consider the output encoding discussed below.
  • Select-String's -Pattern parameter expects a regular expression rather than a wildcard expression.
    ^%% only matches literal %% at the start (^) of a line.

  • Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] objects that contain information about each match; each object's .Line property contains the full text of an input line that matched.

  • Set-Content Output.txt sends all matching lines to the single output file Output.txt.

    • In Windows PowerShell, Set-Content uses the system's legacy Windows codepage (an 8-bit single-byte encoding - even though the documentation mistakenly claims that ASCII files are produced); PowerShell (Core) 6 and later default to BOM-less UTF-8 instead.
      If you want to control the output encoding explicitly, use the -Encoding parameter; e.g., ... | Set-Content Output.txt -Encoding Utf8.
    • By contrast, in Windows PowerShell, > (the output redirection operator) always creates UTF-16LE files (an encoding PowerShell calls Unicode), as does Out-File by default (which can be changed with -Encoding).
      Also note that > / Out-File apply PowerShell's default formatting to the input objects to obtain the string representation to write to the output file, whereas Set-Content treats the input as strings (calls .ToString() on input objects, if necessary). In the case at hand, since all input objects are already strings, there is no difference (except for the character encoding, potentially).
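
The points above can be traced in a short sketch. It is a demo under assumptions rather than a recommendation: the temporary folder, the file names, and the variables $dir, $found, $viaSetContent and $viaOutFile are all made up, and the two encodings are chosen only to make the difference visible.

```powershell
# Sample input in a throwaway folder (file names are invented for this demo).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("enc-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'in.txt') -Value '%% alpha', 'skip me', '%% beta'

# Each match is a MatchInfo object; .Line holds the full text of the matching line.
$found = Select-String -Path (Join-Path $dir '*.txt') -Pattern '^%%'
$found | Select-Object Filename, LineNumber, Line

# Write the same lines twice, each time with an explicitly chosen encoding.
$viaSetContent = Join-Path $dir 'set-content.out'
$viaOutFile    = Join-Path $dir 'out-file.out'
$found.Line | Set-Content -Path $viaSetContent -Encoding utf8
$found.Line | Out-File -FilePath $viaOutFile -Encoding unicode

# Compare file sizes and leading bytes to see how the encodings differ on disk.
foreach ($path in $viaSetContent, $viaOutFile) {
    $bytes = [System.IO.File]::ReadAllBytes($path)
    $head  = ($bytes[0..3] | ForEach-Object { $_.ToString('X2') }) -join ' '
    '{0}: {1} bytes, starts with {2}' -f (Split-Path $path -Leaf), $bytes.Length, $head
}
```

Passing -Encoding explicitly on both the writing and the reading side avoids depending on defaults that differ between PowerShell versions.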

As for what you've tried:

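  • $_ is not set by a foreach ($file in $files) statement: the current file (a [System.IO.FileInfo] object) is in $file, and $_ is only populated in pipeline contexts such as ForEach-Object or Where-Object. Your -like test therefore runs against what is typically $null and never succeeds. Even with $file in place of $_, you would be evaluating your wildcard expression *%%* against the input file's name rather than its contents.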

  • Aside from that, wildcard pattern *%%* will match %% anywhere in the input string, not just at its start (you'd have to use %%* instead).

  • The Set-Content "Output.txt" call is missing input, because it is not part of a pipeline and, in the absence of pipeline input, no -Value argument was passed.

    • Even if you did provide input, however, output file Output.txt would get rewritten as a whole in each iteration of your foreach loop.
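
If you would rather keep the loop, one way to address the points above is sketched here. It is only an illustration: the temporary folder, the file names, and the variables $folder, $outFile and $matching are invented for the demo. The loop variable $file is used for the file, $_ appears only inside Where-Object (where it is a line of text), and the output file is written once, after the loop.

```powershell
# Sample input in a throwaway folder (all names here are hypothetical).
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("loop-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $folder | Out-Null
Set-Content -Path (Join-Path $folder 'a.txt') -Value '%% keep 1', 'no', 'x %% not at start'
Set-Content -Path (Join-Path $folder 'b.txt') -Value '%% keep 2'
$outFile = "$folder-Output.txt"

$files = Get-ChildItem -Path $folder -Filter *.txt

# Collect the matching lines from every file; foreach output is captured in $matching.
$matching = foreach ($file in $files) {
    # Read the file's contents and keep the lines that start with %%.
    # The wildcard %%* anchors the match at the start; *%%* would match anywhere.
    Get-Content -Path $file.FullName | Where-Object { $_ -like '%%*' }
}

# Write once, after the loop, so earlier results are not overwritten.
$matching | Set-Content -Path $outFile

Get-Content -Path $outFile
```

Calling Add-Content inside the loop would also work, but it appends to whatever the file already holds from a previous run.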
mklement0 answered Oct 21 '22 23:10