 

Why are fixed-width file formats still in use?

Are there any advantages to a fixed-width file format over something like XML? I realize XML would likely take up more disk space to store the same amount of data, but the file could also be compressed. I guess you could also, in theory, read a specific piece of data based on where it sits in the file (just grab those bytes). But other than that, what else?

Josh M. asked Oct 05 '11 19:10


3 Answers

I had the same question until I saw the power of fixed width first-hand. We have a table with millions of records. Extracting them into a JSON file swelled the output to 15GB and took more than 2 hours, while fixed width brought that down to 6.5GB and 15 minutes.

Extracting and writing fixed width is faster than JSON.

I tried CSV too, and even there fixed width came out ahead.
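To give a rough feel for where the size difference comes from, here is a minimal sketch that serializes one row both ways. The column names, widths, and values are made up for illustration and are not from the table described above; the point is only that JSON repeats every key and its punctuation on every row, while fixed width writes nothing but the values.

```python
import json

# Hypothetical row; the column names and widths below are assumptions.
row = {"account_id": 1234567, "status": "A", "balance": 1050.25}


def as_json_line(r):
    """One JSON object per line: keys and punctuation repeat on every row."""
    return json.dumps(r) + "\n"


def as_fixed_line(r):
    """10-digit id, 1-char status, 12-char balance with 2 decimals, newline."""
    return f"{r['account_id']:010d}{r['status']:1}{r['balance']:012.2f}\n"


if __name__ == "__main__":
    print(len(as_json_line(row)), "bytes as JSON")
    print(len(as_fixed_line(row)), "bytes as fixed width")
```

The actual ratio depends entirely on the data: long key names and short values favor fixed width, while wide, mostly empty columns can erase the advantage.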

aadhar sharma answered Sep 25 '22 01:09


When the data is large (gigabytes or terabytes), fixed-width files can be MUCH more efficient.

Since every record and field has a fixed size, you can simply seek to (for example) the n-millionth row and read a couple of records from there. You can also memory-map the whole file and get efficient, easy random access to everything.
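As a minimal sketch of both ideas, assuming a made-up 16-byte record (10-byte name, 5-digit number, newline): the byte offset of record n is just n times the record size, so there is nothing to scan or parse on the way there. The layout and function names are hypothetical.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # assumed layout: 10-byte name + 5-digit number + newline


def write_sample(path, count):
    """Write `count` fixed-width records so there is something to read."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(f"{'row' + str(i):<10}{i:05d}\n".encode("ascii"))


def read_record_seek(path, n):
    """Jump straight to record n using its computed byte offset."""
    with open(path, "rb") as f:
        f.seek(n * RECORD_SIZE)
        return f.read(RECORD_SIZE)


def read_record_mmap(path, n):
    """Same lookup through a memory map, sliced like a bytes object."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = n * RECORD_SIZE
            return mm[start:start + RECORD_SIZE]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "rows.dat")
        write_sample(sample, 1000)
        print(read_record_seek(sample, 750))
        print(read_record_mmap(sample, 750))
```

With XML or JSON the same lookup generally means reading from the start of the file, or maintaining a separate index, because record boundaries are not predictable.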

XML files aren't a good fit in these cases.

Tommy answered Sep 25 '22 01:09


I know this is old, but I deal with both Fixed Width and XML daily. You can pretty much sum it up as:

XML = Readability

Fixed Width = Speed and Low Resource Consumption

XML is largely for readability by a human. I don't care what anyone says about structure and validation. If you're running a system that really doesn't need, and shouldn't have, humans reading the files you're passing back and forth, then you're just adding overhead: to the time it takes to process the file and to the size of the file, which in turn affects how long the file takes to transfer. All of this also increases memory usage on the system consuming the XML file.

There are advantages to XML, however. You can define your structure more loosely. Sometimes it's easier if the file and the code don't both have to require a field to be 255 characters long, and only your code has to enforce that limit. Another advantage is that XML can (and should) come with an XML Schema that defines the requirements for the XML contents. This helps when multiple systems consume a single API. If you can provide your schema to a developer, they can pretty quickly generate typed objects that serialize into properly formatted and structured XML.

Fixed Width is for speed and minimal resource consumption. It can be more tedious to set up than XML, because you have to ensure that all systems know the exact positions of the "columns" in the Fixed Width file. Often not all systems use the same columns, or all of them, so you end up with only a single system that fully understands the Fixed Width contents. This can make it challenging to grow an API or system built on the transferred file. However, because there are no field labels, no tags, nothing but raw data, you can often send a smaller package across the wire. That's not always true: in some cases you may have a large number of text fields that commonly hold small amounts of data but must keep a large column width for the one-off cases where a paragraph was entered. Now you've got a bunch of white space holding positions in your Fixed Width file, and XML may actually reduce your overall package size.

Generally speaking though, XML is for readability. You typically can't just pick up a Fixed Width file, or even a CSV file, and immediately start grasping what the data means. With a well-labeled XML file, you can.

There are a number of advantages and disadvantages that I've not gone into, but this is where I see the real meat and potatoes of the differences.
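To make the "columns" point and the padding trade-off concrete, here is a rough sketch. The layout, field names, and the tiny XML writer are all assumptions for illustration, not any real system's format. Every reader and writer has to agree on the same offsets and widths, and a wide column reserved for rare long values pads every short value out to full size.

```python
from xml.sax.saxutils import escape

# Hypothetical shared layout: (field name, start offset, width).
LAYOUT = [("id", 0, 6), ("name", 6, 12), ("comment", 18, 40)]
RECORD_WIDTH = sum(width for _name, _start, width in LAYOUT)


def format_record(values):
    """Pad each field to its column width; refuse values that do not fit."""
    parts = []
    for name, _start, width in LAYOUT:
        text = str(values[name])
        if len(text) > width:
            raise ValueError(f"{name!r} exceeds {width} characters")
        parts.append(text.ljust(width))
    return "".join(parts)


def parse_record(line):
    """Slice each field out by position and strip the padding."""
    return {name: line[start:start + width].rstrip()
            for name, start, width in LAYOUT}


def to_xml(values):
    """Minimal hand-rolled XML for size comparison only."""
    fields = "".join(f"<{name}>{escape(str(values[name]))}</{name}>"
                     for name, _start, _width in LAYOUT)
    return f"<r>{fields}</r>"


if __name__ == "__main__":
    short = {"id": 42, "name": "Ada", "comment": "ok"}
    print(len(format_record(short)), "chars fixed width")
    print(len(to_xml(short)), "chars XML")
```

With short values like these, the 40-character comment column is almost entirely spaces, so the tagged version can come out smaller. Fill the comment column and the balance tips back toward fixed width.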

Rob K. answered Sep 23 '22 01:09