
Finding a pattern in a large binary file using C or C++?

I have a ~700 MB binary file (non-text data). What I would like to do is search for a specific pattern of bytes that occurs at random locations throughout the file, e.g. 0x? 0x? 0x55 0x? 0x? 0x55 0x? 0x? 0x55 0x? 0x? 0x55 and so on for 50 or so bytes in sequence. The pattern I'd be searching for is a sequence of two arbitrary bytes followed by 0x55, repeated, so that 0x55 occurs after every two bytes.

That is, I want to find tables stored in the file, with 0x55 as the delimiter, and then save the data contained in the tables or otherwise manipulate it.
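Once a table's start offset and group count are known, saving its contents amounts to keeping the two data bytes of each group and dropping the 0x55. A minimal sketch, assuming the data (or a chunk of it) is already in a std::vector<std::uint8_t>; extract_rows and Row are hypothetical names, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// One row of a table: the two data bytes that precede each 0x55 delimiter.
using Row = std::pair<std::uint8_t, std::uint8_t>;

// Hypothetical helper: copy the payload out of a table that starts at
// `offset` and contains `groups` [byte][byte][delim] groups.
std::vector<Row> extract_rows(const std::vector<std::uint8_t>& data,
                              std::size_t offset, std::size_t groups,
                              std::uint8_t delim = 0x55) {
    // Written this way to avoid overflow in offset + 3 * groups.
    if (offset > data.size() || (data.size() - offset) / 3 < groups) {
        throw std::out_of_range("table runs past the end of the buffer");
    }
    std::vector<Row> rows;
    rows.reserve(groups);
    for (std::size_t g = 0; g < groups; ++g) {
        std::size_t base = offset + 3 * g;
        if (data[base + 2] != delim) {
            throw std::invalid_argument("missing delimiter inside table");
        }
        rows.emplace_back(data[base], data[base + 1]);  // keep the data bytes, drop the delimiter
    }
    assert(rows.size() == groups);  // every group produced exactly one row
    return rows;
}
```

It throws rather than guessing when the requested table runs off the end of the buffer or a delimiter is missing.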

Would the best option simply be to go through the file one byte at a time, look ahead two bytes to see whether that byte is 0x55, and if it is, keep looking ahead to confirm that a table exists at that location?
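That look-ahead approach can be written directly. A sketch, assuming the whole file fits in memory as a std::vector<std::uint8_t> and that a "table" means at least min_groups consecutive groups (for ~50 bytes that is about 17); find_tables_naive and TableMatch are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TableMatch {
    std::size_t offset;  // first byte of the first [byte][byte][delim] group
    std::size_t groups;  // number of consecutive groups found
};

// At every offset, look ahead one group (3 bytes) at a time for as long as
// the third byte of the group is the delimiter.
std::vector<TableMatch> find_tables_naive(const std::vector<std::uint8_t>& data,
                                          std::size_t min_groups,
                                          std::uint8_t delim = 0x55) {
    assert(min_groups > 0);
    std::vector<TableMatch> out;
    std::size_t pos = 0;
    while (pos + 3 <= data.size()) {
        std::size_t groups = 0;
        // The bounds check guarantees index pos + 3 * groups + 2 is valid.
        while (pos + 3 * groups + 3 <= data.size() &&
               data[pos + 3 * groups + 2] == delim) {
            ++groups;
        }
        if (groups >= min_groups) {
            out.push_back({pos, groups});
            pos += 3 * groups;  // skip past the table instead of rescanning it
        } else {
            ++pos;  // no table starts here; try the next byte
        }
    }
    return out;
}
```

After a hit it jumps past the table, so each table is reported once; on a miss it restarts at the next byte, so a stretch of near-misses can be rescanned up to min_groups times, which is usually tolerable for runs this short. If the bytes just before a real table happen to fit the pattern, the reported start moves earlier by a group or more; only knowledge of the file format can settle the true boundary.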

Load the whole thing? fseek? Buffer chunks, searching those one byte at a time?
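Buffering chunks avoids loading all 700 MB at once. One way to make chunk boundaries harmless, sketched here under the assumption that a table is at least min_groups consecutive groups, is to keep the scan state outside the buffer: for each of the three byte phases (position % 3), count how many delimiters in a row have appeared. The scan is then a single pass with constant state. TableScanner, scan_file and the 1 MiB default chunk size are hypothetical choices:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

struct TableMatch {
    std::uint64_t offset;  // absolute file offset of the first group
    std::uint64_t groups;  // number of consecutive [byte][byte][delim] groups
};

// All state lives in the object, so chunks of any size can be fed in and a
// table that straddles a chunk boundary is still found.
class TableScanner {
public:
    explicit TableScanner(std::uint64_t min_groups, std::uint8_t delim = 0x55)
        : min_groups_(min_groups), delim_(delim) {
        assert(min_groups > 0);
    }

    void feed(const std::uint8_t* buf, std::size_t n) {
        for (std::size_t k = 0; k < n; ++k, ++pos_) {
            std::size_t r = static_cast<std::size_t>(pos_ % 3);
            // A delimiter at position 0 or 1 cannot end a full group.
            if (pos_ >= 2 && buf[k] == delim_) {
                ++run_[r];
                last_[r] = pos_;
            } else {
                flush(r);
            }
        }
    }

    // Call once at end of input so tables that run up to EOF are reported.
    std::vector<TableMatch> finish() {
        for (std::size_t r = 0; r < 3; ++r) flush(r);
        return std::move(matches_);
    }

private:
    // The run in phase r just ended: record it if it is long enough.
    void flush(std::size_t r) {
        if (run_[r] >= min_groups_) {
            matches_.push_back({last_[r] - 3 * (run_[r] - 1) - 2, run_[r]});
        }
        run_[r] = 0;
    }

    std::uint64_t min_groups_;
    std::uint8_t delim_;
    std::uint64_t pos_ = 0;                // absolute offset of the next byte
    std::array<std::uint64_t, 3> run_{};   // consecutive delimiters per phase
    std::array<std::uint64_t, 3> last_{};  // latest delimiter position per phase
    std::vector<TableMatch> matches_;
};

// Read the file in fixed-size chunks; returns false if it cannot be opened
// or a read error occurs.
bool scan_file(const std::string& path, std::uint64_t min_groups,
               std::vector<TableMatch>& out, std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    TableScanner scanner(min_groups);
    std::vector<char> buf(chunk_size);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();  // may be short on the last chunk
        if (got <= 0) break;
        scanner.feed(reinterpret_cast<const std::uint8_t*>(buf.data()),
                     static_cast<std::size_t>(got));
    }
    out = scanner.finish();
    return !in.bad();
}
```

Because all delimiters of one table share a phase, one counter per phase is enough and the file is read exactly once. Matches come out in the order the tables end, and tables in different phases can overlap when the data bytes themselves contain 0x55.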

What would be the best way of looking through this large file, and finding the pattern, using C or C++?

asked Oct 11 '22 at 16:10 by Kyle Lowry



1 Answer

This sounds like a great job for a regular expression matcher or a deterministic finite automaton. These are high-power tools designed to do just what you're asking, and if you have them at your disposal you shouldn't have much trouble doing this sort of search. In C++, consider looking into the Boost.Regex libraries, which should have all the functionality you need to knock this problem down.
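For illustration, the same kind of pattern can be written with the standard library's std::regex (used here instead of Boost.Regex so the sketch has no dependencies); 0x55 is the ASCII character 'U'. regex_find_tables and RegexMatch are hypothetical names, and the bytes are assumed to be loaded into a std::string, which can hold zero bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct RegexMatch {
    std::size_t offset;  // byte offset of the match
    std::size_t length;  // match length in bytes (a multiple of 3)
};

// Find non-overlapping runs of at least `min_groups` groups of
// [any byte][any byte]['U' == 0x55].
std::vector<RegexMatch> regex_find_tables(const std::string& bytes,
                                          std::size_t min_groups) {
    assert(min_groups > 0);  // {0,} would match empty strings everywhere
    // [\s\S] matches any byte; '.' would skip line terminators.
    const std::regex table("(?:[\\s\\S][\\s\\S]U){" + std::to_string(min_groups) + ",}");
    std::vector<RegexMatch> out;
    for (std::sregex_iterator it(bytes.begin(), bytes.end(), table), end; it != end; ++it) {
        out.push_back({static_cast<std::size_t>(it->position(0)),
                       static_cast<std::size_t>(it->length(0))});
    }
    return out;
}
```

Standard-library regex engines are often much slower than a hand-written scan, and some can run out of stack on very long inputs, so this form suits small buffers better than a direct pass over the full 700 MB file.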

answered Oct 15 '22 at 10:10 by templatetypedef
