
Binary file format with 1000s of records in C#

I would like to serialize an array of model objects to a binary stream. The model class will mainly have string and integer properties.

I believe that I can mark the class as [Serializable] and use the BinaryFormatter. However, I'd be interested to know whether you think this is the best way, bearing in mind that my priority is to have as small a file as possible for transfer over a low-bandwidth connection (I can zip/unzip the file too).

The file could have 1000s of records, so ideally I'd like to be able to append to disk and read from disk record by record, without ever having to have the entire file in memory at once.

So my priorities are: small file size and efficient memory use.

Maybe there is a pre-written framework for this? It seems easy to do with XML and CSV files! Hopefully it is with a custom binary format too.

thanks

asked Mar 18 '11 15:03 by krisdyson




2 Answers

I suggest protobuf-net, which is very efficient.

Having said that, this will not be able to handle serialising/deserialising individual objects in your collection. That part you need to implement yourself.

  • One solution is to store the objects as individual files in a folder. Each file name contains a reference, so you can find the object you need by name.

  • Another is to have one data file plus an index file that lists every object and its position in the data file. This is a lot more complicated: when you save an object that sits in the middle of the file, you have to update the addresses of all the objects after it, so a B-tree may be more effective.
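The second option can be sketched roughly as below, using only BinaryWriter and BinaryReader from the framework rather than protobuf-net. The Person type, the RecordFile class, and the file layout are assumptions made for illustration. The sketch is append-only, so it sidesteps the problem of shifting addresses; updating a record in the middle of the file is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record type: the question only says "string and integer properties".
public sealed class Person
{
    public int Id;
    public string Name = "";
}

public static class RecordFile
{
    // Appends one record to the data file and its start offset to the index file.
    public static void Append(string dataPath, string indexPath, Person p)
    {
        using (var data = new FileStream(dataPath, FileMode.Append, FileAccess.Write))
        using (var index = new FileStream(indexPath, FileMode.Append, FileAccess.Write))
        using (var dw = new BinaryWriter(data, Encoding.UTF8))
        using (var iw = new BinaryWriter(index))
        {
            iw.Write(data.Position); // 8-byte offset where this record starts
            dw.Write(p.Id);          // 4-byte integer
            dw.Write(p.Name);        // length-prefixed UTF-8 string
        }
    }

    // Random access: look up the offset in the index, then read a single record.
    public static Person ReadAt(string dataPath, string indexPath, int recordNumber)
    {
        long offset;
        using (var ir = new BinaryReader(File.OpenRead(indexPath)))
        {
            ir.BaseStream.Seek((long)recordNumber * sizeof(long), SeekOrigin.Begin);
            offset = ir.ReadInt64();
        }
        using (var dr = new BinaryReader(File.OpenRead(dataPath), Encoding.UTF8))
        {
            dr.BaseStream.Seek(offset, SeekOrigin.Begin);
            return new Person { Id = dr.ReadInt32(), Name = dr.ReadString() };
        }
    }

    // Sequential access: yields one record at a time, never the whole file.
    public static IEnumerable<Person> ReadAll(string dataPath)
    {
        using (var dr = new BinaryReader(File.OpenRead(dataPath), Encoding.UTF8))
        {
            while (dr.BaseStream.Position < dr.BaseStream.Length)
                yield return new Person { Id = dr.ReadInt32(), Name = dr.ReadString() };
        }
    }
}
```

If you only ever read the file front to back, ReadAll is enough and the index file can be dropped entirely.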

answered Oct 02 '22 03:10 by Aliostad


Another option is to just serialize to a fixed-width text file format and let ZIP handle the compression. Because every record has the same width, record N starts at a known offset, so you can easily use a MemoryMappedFile to walk through the records without loading the entire file into memory.
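A minimal sketch of that idea follows. The field widths, the ASCII encoding, and the FixedWidthFile name are assumptions; a single-byte encoding is what keeps "fixed width in characters" equal to "fixed width in bytes", so non-ASCII names would need a different layout.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

public static class FixedWidthFile
{
    // Assumed layout: 11-character id (fits any int) + 20-character name, no line terminator.
    public const int IdWidth = 11;
    public const int NameWidth = 20;
    public const int RecordWidth = IdWidth + NameWidth;

    // Appends one padded record to the end of the file.
    public static void Append(string path, int id, string name)
    {
        if (name.Length > NameWidth) throw new ArgumentException("name too long");
        string record = id.ToString(CultureInfo.InvariantCulture).PadLeft(IdWidth)
                      + name.PadRight(NameWidth);
        byte[] bytes = Encoding.ASCII.GetBytes(record);
        using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            fs.Write(bytes, 0, bytes.Length);
        }
    }

    // Maps only the bytes of the requested record and parses them.
    public static (int Id, string Name) Read(string path, long recordNumber)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(recordNumber * RecordWidth, RecordWidth))
        {
            var buffer = new byte[RecordWidth];
            view.ReadArray(0, buffer, 0, RecordWidth);
            string text = Encoding.ASCII.GetString(buffer);
            int id = int.Parse(text.Substring(0, IdWidth), CultureInfo.InvariantCulture);
            return (id, text.Substring(IdWidth).TrimEnd());
        }
    }
}
```

To walk many records you would normally open the mapping once and keep it open rather than re-creating it per call as this sketch does. The padding makes the raw file larger, but runs of spaces compress well.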

answered Oct 02 '22 02:10 by Chris Haas