
Design pattern for parsing binary file data and storing in a database

Does anybody recommend a design pattern for taking a binary data file, parsing parts of it into objects and storing the resultant data into a database?

I think a similar pattern could be used to take an XML or tab-delimited file and parse it into its representative objects.

A common data structure would include:

(Header) (DataElement1) (DataElement1SubData1) (DataElement1SubData2) (DataElement2) (DataElement2SubData1) (DataElement2SubData2) (EOF)

I think a good design would include a way to swap the parsing definition based on the file type or on metadata defined in the header, so a Factory pattern would be part of the overall design for the parser.
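To make the idea concrete, here is a minimal sketch of a factory that picks a parser from header metadata. The header layout (a 4-byte magic, a 1-byte version, a 2-byte element count), the `DEMO` magic value, and the two format versions are all made up for illustration; they are not part of any real file format.

```python
import struct
from typing import Callable, Dict, List

# Hypothetical header: 4-byte magic, 1-byte format version,
# 2-byte element count, all big-endian with no padding.
HEADER = struct.Struct(">4sBH")


def parse_v1(body: bytes, count: int) -> List[int]:
    """Assumed version 1 layout: each element is an unsigned 16-bit integer."""
    return list(struct.unpack(f">{count}H", body[: 2 * count]))


def parse_v2(body: bytes, count: int) -> List[int]:
    """Assumed version 2 layout: each element is an unsigned 32-bit integer."""
    return list(struct.unpack(f">{count}I", body[: 4 * count]))


# The "factory": a registry mapping the version found in the header to a parser.
PARSERS: Dict[int, Callable[[bytes, int], List[int]]] = {
    1: parse_v1,
    2: parse_v2,
}


def parser_for(version: int) -> Callable[[bytes, int], List[int]]:
    """Return the parser registered for this version, or fail loudly."""
    try:
        return PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported format version {version}") from None


def parse_file(data: bytes) -> List[int]:
    """Read the header, then delegate the body to the matching parser."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != b"DEMO":
        raise ValueError("bad magic")
    return parser_for(version)(data[HEADER.size:], count)
```

Supporting a new file type then means registering one more entry in `PARSERS`; the code that reads the header does not need to change.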

Keith Sirmons asked Aug 14 '08 20:08




2 Answers

  1. Just write your file parser, using whatever techniques come to mind
  2. Write lots of unit tests for it to make sure all your edge cases are covered

Once you've done this, you will actually have a reasonable idea of the problem/solution.

Right now you just have theories floating around in your head, most of which will turn out to be misguided.

  3. Refactor mercilessly. Your aim should be to delete about half of your code.

You'll find that your code at the end will either resemble an existing design pattern, or you'll have created a new one. You'll then be qualified to answer this question :-)

Orion Edwards answered Oct 02 '22 15:10


I fully agree with Orion Edwards, and it is usually the way I approach the problem; but lately I've been starting to see some patterns(!) to the madness.

For more complex tasks I usually use something like an interpreter (or a strategy) that uses some builder (or factory) to create each part of the data.

For streaming data, the entire parser would look something like an adapter, adapting from a stream object to an object stream (which usually is just a queue).
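A rough sketch of that adapter idea, with a generator standing in for the queue: it takes a byte stream and yields one parsed object at a time. The framing (a 1-byte kind, a 2-byte length, then the payload) and the `Record` type are assumptions for illustration, and the code assumes a buffered stream such as a regular file or `io.BytesIO`, where `read(n)` only returns fewer than `n` bytes at end of stream.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple


class Record(NamedTuple):
    """One parsed object; the fields are illustrative."""
    kind: int
    payload: bytes


# Hypothetical framing: 1-byte kind, 2-byte big-endian payload length.
FRAME = struct.Struct(">BH")


def records(stream: BinaryIO) -> Iterator[Record]:
    """Adapt a stream of bytes into a stream of Record objects."""
    while True:
        head = stream.read(FRAME.size)
        if not head:
            return  # clean end of stream
        if len(head) < FRAME.size:
            raise ValueError("truncated frame header")
        kind, length = FRAME.unpack(head)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated payload")
        yield Record(kind, payload)
```

The consumer just iterates, for example `for record in records(open(path, "rb")): ...`, and never sees the byte-level details.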

For your example there would probably be one builder for the complete data structure (from head to EOF) which internally uses builders for the internal data elements (fed by the interpreter). Once the EOF is encountered an object would be emitted.
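Something along these lines, as a sketch only: an outer builder for the whole structure that delegates to a per-element builder and emits the finished object when it sees EOF. The token kinds (`"header"`, `"element"`, `"sub"`, `"eof"`) and all class names are hypothetical, and the interpreter that turns raw bytes into these tokens is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    name: str
    sub_data: Tuple[str, ...]


@dataclass(frozen=True)
class Document:
    header: str
    elements: Tuple[Element, ...]


class ElementBuilder:
    """Collects the sub-data of a single data element."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sub_data: List[str] = []

    def add_sub_data(self, value: str) -> None:
        self.sub_data.append(value)

    def build(self) -> Element:
        return Element(self.name, tuple(self.sub_data))


class DocumentBuilder:
    """Builds the whole structure, using ElementBuilder internally."""

    def __init__(self) -> None:
        self.header: Optional[str] = None
        self.elements: List[Element] = []
        self.current: Optional[ElementBuilder] = None

    def _close_current(self) -> None:
        # Finish the element in progress, if any, and store the result.
        if self.current is not None:
            self.elements.append(self.current.build())
            self.current = None

    def feed(self, kind: str, value: str) -> Optional[Document]:
        """Consume one token; return a Document only when EOF is reached."""
        if kind == "header":
            self.header = value
        elif kind == "element":
            self._close_current()
            self.current = ElementBuilder(value)
        elif kind == "sub":
            if self.current is None:
                raise ValueError("sub-data before any element")
            self.current.add_sub_data(value)
        elif kind == "eof":
            self._close_current()
            if self.header is None:
                raise ValueError("missing header")
            return Document(self.header, tuple(self.elements))
        else:
            raise ValueError(f"unknown token kind {kind!r}")
        return None
```

The caller feeds tokens one by one and ignores the `None` results until the final token produces the `Document`.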

However, creating objects in a switch statement inside a factory function is probably the simplest approach for many smaller tasks. Also, I like keeping my data objects immutable, as you never know when someone will shove concurrency down your throat :)
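For the simple case, a sketch might look like the following. Python has no classic `switch`, so a plain `if` chain plays that role here, and frozen dataclasses give the immutable data objects. The tag values and the `Temperature` and `Label` types are invented for the example.

```python
import struct
from dataclasses import FrozenInstanceError, dataclass
from typing import Union


@dataclass(frozen=True)
class Temperature:
    celsius: float


@dataclass(frozen=True)
class Label:
    text: str


def make_object(tag: int, payload: bytes) -> Union[Temperature, Label]:
    """Factory function: pick the object type from a (hypothetical) tag byte."""
    if tag == 0x01:
        # Assumed layout: one big-endian 32-bit float.
        (value,) = struct.unpack(">f", payload)
        return Temperature(value)
    if tag == 0x02:
        # Assumed layout: UTF-8 encoded text.
        return Label(payload.decode("utf-8"))
    raise ValueError(f"unknown tag 0x{tag:02x}")
```

Because the dataclasses are frozen, assigning to a field after construction raises `FrozenInstanceError`, so parsed objects can be shared between threads without defensive copying.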

Henrik Gustafsson answered Oct 02 '22 16:10