Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# - Search Binary File for a Pattern

What is the best way to search a large binary file for a certain substring in C#?

To provide some specifics, I'm trying to extract the DWARF information from an executable, so I only care about certain parts of the binary file (namely the sections starting with the strings .debug_info, .debug_abbrev, etc.)

I don't see anything obvious in Stream, FileStream, or BinaryReader, so it looks like I'll have to read chunks in and search through the data for the strings myself.

Is there a better way?

like image 484
Clayton Hughes Avatar asked Apr 10 '09 18:04

Clayton Hughes


People also ask

What C is used for?

C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...

What is the full name of C?

In the real sense it has no meaning or full form. It was developed by Dennis Ritchie and Ken Thompson at AT&T bell Lab. First, they used to call it as B language then later they made some improvement into it and renamed it as C and its superscript as C++ which was invented by Dr.

Is C language easy?

C is a general-purpose language that most programmers learn before moving on to more complex languages. From Unix and Windows to Tic Tac Toe and Photoshop, several of the most commonly used applications today have been built on C. It is easy to learn because: A simple syntax with only 32 keywords.

Is C programming hard?

C is more difficult to learn than JavaScript, but it's a valuable skill to have because most programming languages are actually implemented in C. This is because C is a “machine-level” language. So learning it will teach you how a computer works and will actually make learning new languages in the future easier.


1 Answers

There's nothing built into .NET that will do the search for you, so you're going to need to read in the file chunk by chunk and scan for what you want to find.

You can speed up the search in two ways.

Firstly, use bufferred IO and transfer large chunks at a time - don't read byte by byte, read 64KB, 256KB or 1MB chunks.

Secondly, don't do a linear scan for the piece you want - check out the Boyer-Moore (wikipedia link) algorithm for string searches - you can apply this to searching for the DWARF information you want.

like image 97
Bevan Avatar answered Sep 28 '22 04:09

Bevan