
Better Search for a string in all files using C# [closed]

Tags: c#, .net, file-io

After referring to many blogs and articles, I arrived at the following code for searching for a string in all files inside a folder. It works fine in my tests.

QUESTIONS

  1. Is there a faster approach for this (using C#)?
  2. Is there any scenario in which this code will fail?

Note: I tested with only a small number of very small files.

CODE

    static void Main()
    {
        string sourceFolder = @"C:\Test";
        string searchWord = ".class1";

        List<string> allFiles = new List<string>();
        AddFileNamesToList(sourceFolder, allFiles);
        foreach (string fileName in allFiles)
        {
            string contents = File.ReadAllText(fileName);
            if (contents.Contains(searchWord))
            {
                Console.WriteLine(fileName);
            }
        }

        Console.WriteLine(" ");
        System.Console.ReadKey();
    }

    public static void AddFileNamesToList(string sourceDir, List<string> allFiles)
    {
        string[] fileEntries = Directory.GetFiles(sourceDir);
        foreach (string fileName in fileEntries)
        {
            allFiles.Add(fileName);
        }

        // Recursion
        string[] subdirectoryEntries = Directory.GetDirectories(sourceDir);
        foreach (string item in subdirectoryEntries)
        {
            // Avoid "reparse points"
            if ((File.GetAttributes(item) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
            {
                AddFileNamesToList(item, allFiles);
            }
        }
    }

REFERENCE

  1. Using StreamReader to check if a file contains a string
  2. Splitting a String with two criteria
  3. C# detect folder junctions in a path
  4. Detect Symbolic Links, Junction Points, Mount Points and Hard Links
  5. FolderBrowserDialog SelectedPath with reparse points
  6. C# - High Quality Byte Array Conversion of Images
LCJ asked Dec 21 '12 16:12



2 Answers

Instead of File.ReadAllText(), it is better to use

File.ReadLines(@"C:\file.txt"); 

It returns a lazily evaluated IEnumerable<string>, so you do not have to read the whole file if your string is found before the last line of the text file is reached.
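As a minimal sketch of that idea (the class and method names here are hypothetical, not from the answer), the lazy sequence can be combined with LINQ's Any so that reading stops at the first matching line. Note one assumption: because the check is made line by line, a search string that spans a line break would not be found.

```csharp
using System;
using System.IO;
using System.Linq;

static class LineSearch
{
    // Hypothetical helper: returns true as soon as one line contains searchWord.
    // Any() stops enumerating at the first hit, so later lines are never read.
    public static bool FileContains(string path, string searchWord)
    {
        return File.ReadLines(path)
                   .Any(line => line.Contains(searchWord));
    }
}
```

In the question's loop, the `File.ReadAllText` / `Contains` pair could then be replaced by a call such as `LineSearch.FileContains(fileName, searchWord)`.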

VladL answered Sep 19 '22 19:09


I wrote something very similar. Here are a few changes I would recommend.

  1. Use Directory.EnumerateDirectories instead of GetDirectories. It returns immediately with an IEnumerable, so you don't need to wait for it to finish reading all of the directories before you start processing.
  2. Use ReadLines instead of ReadAllText. This loads only one line at a time into memory, which will be a big deal if you hit a large file.
  3. If you are using a new enough version of .NET (4.0 or later), use Parallel.ForEach. This will allow you to search multiple files at once.
  4. You may not be able to open a file. You need to check for read permissions, or add to the manifest that your program requires administrative privileges (you should still check, though).
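The four recommendations above could be combined roughly as follows for a text search. This is only a sketch under stated assumptions: `FolderSearch` and `FindFiles` are hypothetical names, the permission check is done by catching the exception when a file is opened rather than testing in advance, and the sorted result list is just a convenience.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class FolderSearch
{
    // Hypothetical helper: returns the paths of files under root that contain searchWord.
    public static List<string> FindFiles(string root, string searchWord)
    {
        var matches = new ConcurrentBag<string>();

        // EnumerateFiles yields paths lazily; Parallel.ForEach searches several files at once.
        Parallel.ForEach(
            Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories),
            path =>
            {
                try
                {
                    // ReadLines keeps one line in memory at a time and stops at the first hit.
                    if (File.ReadLines(path).Any(line => line.Contains(searchWord)))
                        matches.Add(path);
                }
                catch (UnauthorizedAccessException) { /* no read permission: skip this file */ }
                catch (IOException) { /* file locked or removed meanwhile: skip this file */ }
            });

        return matches.OrderBy(p => p, StringComparer.Ordinal).ToList();
    }
}
```

Two caveats with this sketch: the try/catch only covers opening and reading individual files, so an inaccessible subdirectory can still make the enumeration itself throw, and `SearchOption.AllDirectories` does not skip reparse points the way the question's recursive method does.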

I was creating a binary search tool; here are some snippets of what I wrote to give you a hand.

    private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
    {
        Parallel.ForEach(Directory.EnumerateFiles(_folder, _filter, SearchOption.AllDirectories), Search);
    }

    // _array contains the binary pattern I am searching for.
    private void Search(string filePath)
    {
        if (Contains(filePath, _array))
        {
            // filePath points at a match.
        }
    }

    private static bool Contains(string path, byte[] search)
    {
        // I am doing ReadAllBytes because this is a binary search, not a text search.
        // There are no "lines" to separate on.
        var file = File.ReadAllBytes(path);
        // The upper bound of Parallel.For is exclusive, so add 1 to also test
        // a match that sits at the very end of the file.
        var result = Parallel.For(0, file.Length - search.Length + 1, (i, loopState) =>
            {
                if (file[i] == search[0])
                {
                    byte[] localCache = new byte[search.Length];
                    Array.Copy(file, i, localCache, 0, search.Length);
                    if (Enumerable.SequenceEqual(localCache, search))
                        loopState.Stop();
                }
            });
        // Stop() is only called on a match, which leaves IsCompleted false.
        return result.IsCompleted == false;
    }

This uses two nested parallel loops. The design is terribly inefficient and could be greatly improved by using the Boyer-Moore search algorithm, but I could not find a binary implementation, and I did not have the time to implement it myself when I originally wrote this.
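For illustration only, here is a sketch of what a byte-array search along those lines might look like. It is not the author's code, and it uses the Boyer-Moore-Horspool simplification (bad-character table only) rather than full Boyer-Moore; `ByteSearch` and `IndexOf` are hypothetical names.

```csharp
using System;

static class ByteSearch
{
    // Returns the index of the first occurrence of pattern in data, or -1 if absent.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;
        if (pattern.Length > data.Length) return -1;

        // Bad-character table: how far the window may slide, keyed by the
        // data byte currently aligned with the pattern's last position.
        var shift = new int[256];
        for (int i = 0; i < shift.Length; i++) shift[i] = pattern.Length;
        for (int i = 0; i < pattern.Length - 1; i++)
            shift[pattern[i]] = pattern.Length - 1 - i;

        int last = pattern.Length - 1;
        int pos = 0;
        while (pos <= data.Length - pattern.Length)
        {
            // Compare the window against the pattern from right to left.
            int j = last;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            pos += shift[data[pos + last]];
        }
        return -1;
    }
}
```

With a helper like this, the body of `Contains` could reduce to `return ByteSearch.IndexOf(File.ReadAllBytes(path), search) >= 0;`, leaving the outer Parallel.ForEach over files as the only parallel loop.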

Scott Chamberlain answered Sep 22 '22 19:09