Is there a standard format for describing a flat file?

Tags: csv, transformation, flat-file

Is there a standard or open format that can be used to describe the formatting of a flat file? My company integrates many different customer file formats. With an XML file, it's easy to get or create an XSD that describes the XML file format. I'm looking for something similar to describe a flat file format (fixed width, delimited, etc.). Stylus Studio uses a proprietary .conv format to do this; that .conv format can be used at runtime to transform an arbitrary flat file into an XML file. I was wondering whether there is a more open or standards-based method for doing the same thing.

I'm looking for one method of describing a variety of flat file formats whether they are fixed width or delimited, so CSV is not an answer to this question.

asked Oct 14 '09 18:10

Stimy

4 Answers

XFlat: http://www.infoloom.com/gcaconfs/WEB/philadelphia99/lyons.HTM#N29 and http://www.unidex.com/overview.htm

For complex cases (e.g. log files) you may consider a lexical parser.
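A lexer splits each line into typed tokens before any higher-level parsing happens. Below is a minimal sketch using only Python's standard `re` module (the "master pattern" tokenizer idiom shown in the `re` documentation). The log layout and the token names are invented for illustration; they are not part of any standard.

```python
import re
from typing import Iterator, NamedTuple

# Hypothetical token set for an illustrative log line such as:
#   2009-10-14 18:10:00 ERROR [db] "connection lost" code=42
TOKEN_SPEC = [
    ("TIMESTAMP", r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"),
    ("LEVEL", r"DEBUG|INFO|WARN|ERROR"),
    ("SOURCE", r"\[[^\]]*\]"),
    ("STRING", r'"(?:[^"\\]|\\.)*"'),
    ("PAIR", r"[A-Za-z_]\w*=\S+"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
# One alternation of named groups; order matters (earlier rules win).
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


class Token(NamedTuple):
    kind: str
    text: str


def tokenize(line: str) -> Iterator[Token]:
    """Yield (kind, text) tokens, skipping whitespace; fail on anything unknown."""
    for match in MASTER.finditer(line):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r} at {match.start()}")
        yield Token(kind, match.group())
```

A parser (hand-written or generated) would then consume this token stream instead of raw characters.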

answered Oct 09 '22 02:10

queen3

About selecting existing flat file formats: there is the comma-separated values (CSV) format or, more generally, DSV (delimiter-separated values). But these are not fixed-width, since a delimiter character (such as a comma) separates individual cells. Note that although CSV is standardized, not everybody adheres to the standard. Also, CSV may be too simple for your purposes, since it doesn't allow a rich document structure.
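To illustrate that DSV is CSV with a different delimiter: with Python's standard `csv` module, the only thing that changes between the two is the dialect parameters passed to the reader. The helper name `read_dsv` and the sample data are made up for this sketch.

```python
import csv
import io


def read_dsv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Parse delimiter-separated values; quoted fields may contain the delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    return list(reader)


# The same function reads CSV and pipe-delimited data; only the delimiter differs.
pipe_data = 'id|name|note\n1|Alice|"likes | pipes"\n'
print(read_dsv(pipe_data, delimiter="|"))
```

Note that nothing here describes field types or widths; the delimiter is the whole "format description".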

In that respect, JSON and YAML, which are standardized and only slightly more complex (and thereby more useful), are a better choice. Both are supported out of the box by plenty of languages.

Your best bet is to have a look at all languages listed as non-binary in this overview and then determine which works best for you.

About describing flat file formats: this can be very easy or quite difficult, depending on the format. Although easier solutions exist in most cases, one approach that works in general is to view the file format as a formal grammar and write a lexer/parser for it. But I admit that's quite^† heavy machinery.
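To make "view the file format as a formal grammar" concrete, here is a tiny grammar for a single CSV-like line (in the docstring, EBNF-style) and a hand-written recursive-descent parser that follows it rule by rule. It is an illustrative sketch only: it handles one line at a time, ignores line breaks inside quoted fields, and is not a complete RFC 4180 implementation. A parser generator would build the equivalent parser from the grammar rules instead.

```python
class RecordParser:
    """Recursive-descent parser for one line of the grammar:

        record = field { "," field }
        field  = quoted | bare
        quoted = '"' { any char except '"' | '""' } '"'
        bare   = { any char except ',' and '"' }
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def parse_record(self) -> list[str]:
        fields = [self.parse_field()]
        while self.peek() == ",":
            self.pos += 1
            fields.append(self.parse_field())
        if self.pos != len(self.text):
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
        return fields

    def parse_field(self) -> str:
        return self.parse_quoted() if self.peek() == '"' else self.parse_bare()

    def parse_quoted(self) -> str:
        self.pos += 1  # skip the opening quote
        chars = []
        while True:
            ch = self.peek()
            if ch == "":
                raise SyntaxError("unterminated quoted field")
            self.pos += 1
            if ch == '"':
                if self.peek() == '"':  # a doubled quote is an escaped quote
                    chars.append('"')
                    self.pos += 1
                else:
                    return "".join(chars)  # closing quote
            else:
                chars.append(ch)

    def parse_bare(self) -> str:
        start = self.pos
        while self.peek() not in ("", ",", '"'):
            self.pos += 1
        return self.text[start:self.pos]
```

For example, `RecordParser('a,"b ""c""",').parse_record()` yields three fields: `a`, `b "c"` and an empty string.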

If you're lucky, a couple of advanced regular expressions may do the trick. Most formats will not lend themselves to that, however.^‡ If you plan on writing a lexer/parser yourself, I can recommend PLY (Python Lex-Yacc). But many other solutions exist, in many different languages, and a lot of them are more convenient than the old-school Lex & Yacc. For more, see What parser generator do you recommend?
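For simple fixed-width records, a single regular expression with named groups can serve as the whole format description. The layout below (5-digit id, 10-character name, 8-digit date) is hypothetical, and input lines are assumed to have their trailing newline already stripped.

```python
import re

# Hypothetical fixed-width layout: 5-digit id, 10-char name, 8-digit date (YYYYMMDD).
RECORD = re.compile(
    r"(?P<id>\d{5})"
    r"(?P<name>.{10})"
    r"(?P<date>\d{8})"
)


def parse_line(line: str) -> dict[str, str]:
    match = RECORD.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match the record layout: {line!r}")
    # Strip the space padding that fixed-width fields typically carry.
    return {key: value.strip() for key, value in match.groupdict().items()}
```

This breaks down as soon as the format has nesting, optional sections or repetition that depends on earlier fields, which is where a real lexer/parser pays off.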

^†: Yes, that may be an understatement.
^‡: Even properly describing the email address format is not trivial.

answered Oct 09 '22 02:10

Stephan202

COBOL (whether you like it or not) has a standard format for describing fixed-width record formats in files.
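In COBOL, that description lives in a copybook, where each elementary item's PIC clause fixes its width. The sketch below reads a deliberately tiny subset of copybook syntax (unsigned display `PIC 9(n)` and `PIC X(n)` items only) and uses the resulting widths to slice records. The copybook is a made-up example; real copybooks also use signs, `COMP`, implied decimals (`V`), `OCCURS`, `REDEFINES` and more, none of which this handles.

```python
import re

# Hypothetical copybook; only the elementary PIC X(n) / PIC 9(n) lines matter here.
COPYBOOK = """
       01  CUSTOMER-REC.
           05  CUST-ID      PIC 9(5).
           05  CUST-NAME    PIC X(10).
           05  CUST-BAL     PIC 9(7).
"""

# level number, data name, picture type (X or 9), repeat count
PIC_LINE = re.compile(r"^\s*\d{2}\s+([A-Z0-9-]+)\s+PIC\s+([X9])\((\d+)\)\.", re.MULTILINE)


def layout_from_copybook(text: str) -> list[tuple[str, str, int]]:
    """Return (field name, picture type, width) for each supported elementary item."""
    return [(name, pic, int(width)) for name, pic, width in PIC_LINE.findall(text)]


def parse_record(line: str, layout: list[tuple[str, str, int]]) -> dict[str, object]:
    record: dict[str, object] = {}
    pos = 0
    for name, pic, width in layout:
        raw = line[pos:pos + width]
        pos += width
        # PIC 9 fields become integers; PIC X fields lose their space padding.
        record[name] = int(raw) if pic == "9" else raw.rstrip()
    return record
```

With this layout, the line `00042ALICE     0001500` becomes `{"CUST-ID": 42, "CUST-NAME": "ALICE", "CUST-BAL": 1500}`.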

Other file formats, however, are simpler to describe. A CSV file, for example, is just rows of strings. Often the first row of a CSV file holds the column names; that's the description.
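Python's standard `csv.DictReader` works exactly this way: it treats the first row as column names and returns each following row keyed by them. The helper name below is made up. Note the limits of this kind of description: the header supplies names, not types or widths.

```python
import csv
import io


def describe_and_read(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Use the header row as the file's description and read the rest by name."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    # fieldnames is populated from the first row (None if the input is empty).
    return list(reader.fieldnames or []), rows
```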

There are examples of using JSON to formulate metadata for text files. This can be applied to JSON files, CSV files and fixed-format files.

Look at http://www.projectzero.org/sMash/1.1.x/docs/zero.devguide.doc/zero.resource/declaration.html

This is IBM's sMash (Project Zero) using JSON to encode metadata. You can easily apply this to flat files.
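As a rough illustration of the idea (not sMash's actual metadata schema, which this sketch does not reproduce), one JSON description format can cover both delimited and fixed-width files. The keys `kind`, `delimiter`, `fields`, `name` and `width` are hypothetical.

```python
import csv
import io
import json

# Hypothetical JSON descriptions; the keys are invented for this sketch.
FIXED_DESC = json.loads(
    '{"kind": "fixed", "fields": [{"name": "id", "width": 5}, {"name": "name", "width": 10}]}'
)
DELIMITED_DESC = json.loads(
    '{"kind": "delimited", "delimiter": ";", "fields": [{"name": "id"}, {"name": "name"}]}'
)


def read_records(text: str, desc: dict) -> list[dict[str, str]]:
    """Read a flat file into a list of dicts according to a JSON description."""
    names = [field["name"] for field in desc["fields"]]
    if desc["kind"] == "delimited":
        rows = csv.reader(io.StringIO(text), delimiter=desc["delimiter"])
        return [dict(zip(names, row)) for row in rows if row]  # skip blank lines
    if desc["kind"] == "fixed":
        records = []
        for line in text.splitlines():
            if not line:
                continue
            pos, record = 0, {}
            for field in desc["fields"]:
                record[field["name"]] = line[pos:pos + field["width"]].rstrip()
                pos += field["width"]
            records.append(record)
        return records
    raise ValueError(f"unknown kind: {desc['kind']!r}")
```

Because the description is plain JSON, it can be stored next to each customer's files and loaded at runtime, which is roughly the role the .conv file plays in the question.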

answered Oct 09 '22 03:10

S.Lott

At the end of the day, you will probably have to define your own file standard that caters specifically to your storage needs. I suggest using XML, YAML, or JSON as the internal container for all of the file types you receive. On top of this, you will have to implement some extra validation logic and maintain metadata such as the column widths of the fixed-width files (for importing from and exporting to fixed width). Alternatively, you can store or link a set of metadata to each file you convert to the internal format.
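One way to follow this advice, sketched with Python's standard `xml.etree.ElementTree`: convert fixed-width records into an XML container and carry each column's width as an attribute, so the data can be exported back to fixed width later. The `records`/`record` element names, the `width` attribute and the layout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout metadata: (column name, width in characters).
LAYOUT = [("id", 5), ("name", 10)]


def fixed_width_to_xml(lines: list[str], layout: list[tuple[str, int]]) -> ET.Element:
    """Build <records><record><id width="5">...</id>...</record></records>."""
    root = ET.Element("records")
    for line in lines:
        rec = ET.SubElement(root, "record")
        pos = 0
        for name, width in layout:
            field = ET.SubElement(rec, name, width=str(width))
            field.text = line[pos:pos + width].rstrip()
            pos += width
    return root


def xml_to_fixed_width(root: ET.Element) -> list[str]:
    """Re-pad each field to its stored width to recreate the fixed-width lines."""
    out = []
    for rec in root.iter("record"):
        out.append("".join((f.text or "").ljust(int(f.get("width"))) for f in rec))
    return out
```

`ET.tostring(root, encoding="unicode")` serializes the container; keeping the width on each element is the "store the metadata with the data" option, while the alternative is to keep `LAYOUT` in a separate description file.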

There may be a standard out there, but it's very hard to create a one-size-fits-all solution for these problems. There are data integration (ETL) tools, such as Talend, that make creating these mappings easier, but you will still need to spend a lot of time maintaining file format definitions and rules.

As for enforcing column width, XML might be the best solution, as you can describe the formats using XML Schema (with the length restrictions). For YAML or JSON, you may have to write your own logic for this, although I'm sure someone else has come up with a solution.
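In XML Schema, such limits are expressed with the `xs:length` / `xs:maxLength` facets. For JSON or YAML, the "own logic" can be as small as the sketch below; the `MAX_LENGTHS` table and the function name are hypothetical.

```python
# Hypothetical per-field limits, playing the role that xs:maxLength plays in an XSD.
MAX_LENGTHS = {"id": 5, "name": 10}


def length_errors(record: dict[str, str], max_lengths: dict[str, int]) -> list[str]:
    """Return one message for every field whose value is longer than allowed."""
    errors = []
    for field, limit in max_lengths.items():
        value = record.get(field, "")
        if len(value) > limit:
            errors.append(f"{field}: {len(value)} characters exceeds limit of {limit}")
    return errors
```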

See XML vs comma delimited text files for further reference.

answered Oct 09 '22 02:10

Dana the Sane
