Parsing a multi-line fixed format text file

Tags: c#, powershell, parsing

I'm trying to parse data in a fixed-format text file where each "record" is spread over several lines, like so:

 MAILBOX: 10013      Created: 01/20/09  4:39 pm
    MSGS: 0         UNPLAYED: 0           URGENT: 0          RECEIPT: 0
  LCOS: RBC Standard    : 20            FCOS: RBC Standard      : 20 
  GCOS: Default GCOS 1  : 1             NCOS: Default           : 1 
  TCOS: Default TCOS 1  : 1             RCOS:                   : 1 
BAD LOGS: 0         LAST LOG: NEVER                             MINS:      0.0
  PASSWD: Y            TUTOR: N              DAY: M            NIGHT: M       
    NAME:                                   CODE: 
   EXTEN: 10013                            INDEX: 0
ATTEN DN:                                  INDEX: 0         
DISTRIBUTION LISTS WITH CHANGE RIGHTS:
    all
DISTRIBUTION LISTS WITH REVIEW RIGHTS:
    all

I've used FileHelpers before for single-line records, and it's been very useful. Its documentation shows a MultiRecordEngine feature, but using it would mean:

a class for each line ... not a problem
calculating the exact size of each fixed format field ... painful and open to error
logic to check each line

A further wrinkle is that the fixed format isn't actually fixed: the set of lines differs depending on the target record, so some records have 21 lines, some 22, 23, 24, and so on.
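One way to make the varying record length a non-issue, assuming (as in the sample above) that every record begins with a MAILBOX: line, is to cut the input at that marker first and only then parse the lines of each record. A minimal sketch in Python; split_records and its start_marker parameter are hypothetical names, and the same loop should port easily to C# or PowerShell:

```python
from typing import Iterable, Iterator, List


def split_records(lines: Iterable[str], start_marker: str = "MAILBOX:") -> Iterator[List[str]]:
    """Yield one list of lines per record.

    A new record starts at every line whose first non-blank text is
    start_marker, so a record may have any number of lines. Blank lines,
    and anything before the first marker, are skipped.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")          # accept lines with or without line endings
        if line.lstrip().startswith(start_marker):
            if current is not None:
                yield current              # the previous record is complete
            current = [line]
        elif current is not None and line.strip():
            current.append(line)
    if current is not None:
        yield current                      # the last record has no following marker
```

An open file object can be passed straight in, since it iterates line by line; each yielded list can then be handed to whatever per-line parsing you use, regardless of whether it has 21 or 24 lines.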

I've found a Java flat-file parsing library, FFP, but I'm a .NET (C#, PowerShell) coder.

Are there better ways of handling this sort of parsing?


asked Jan 30 '12 08:01

SteveC

2 Answers

What you need is a lexer. Your record is too big to parse with a single regex, so you have to write one regex per line and a state machine to validate that the lines follow in the right order.
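A minimal sketch of the one-regex-per-line plus state-machine idea, written in Python for brevity (a C# version would use System.Text.RegularExpressions in the same way). The pattern names, the transition table and the reduced layout with only five line types are assumptions for illustration, not the full record format; a real version needs a pattern for every line type in the file:

```python
import re

# One regex per line type. This is a reduced, hypothetical subset of the real
# layout; the real file needs a pattern for every line type (LCOS, GCOS, ...).
LINE_PATTERNS = {
    "mailbox": re.compile(r"^\s*MAILBOX:\s*(?P<mailbox>\d+)\s+Created:\s*(?P<created>.+?)\s*$"),
    "msgs": re.compile(r"^\s*MSGS:\s*(?P<msgs>\d+)\s+UNPLAYED:\s*(?P<unplayed>\d+)"
                       r"\s+URGENT:\s*(?P<urgent>\d+)\s+RECEIPT:\s*(?P<receipt>\d+)\s*$"),
    "exten": re.compile(r"^\s*EXTEN:\s*(?P<exten>\d+)\s+INDEX:\s*(?P<index>\d+)\s*$"),
    "dl_header": re.compile(r"^DISTRIBUTION LISTS WITH (?P<rights>CHANGE|REVIEW) RIGHTS:\s*$"),
    "dl_item": re.compile(r"^\s+(?P<item>\S.*?)\s*$"),
}

# State machine: the state is the type of the last line read; each state lists
# the line types allowed next, tried in order (MAILBOX before the catch-all item).
TRANSITIONS = {
    "start": ["mailbox"],
    "mailbox": ["msgs"],
    "msgs": ["exten"],
    "exten": ["dl_header"],
    "dl_header": ["dl_item"],
    "dl_item": ["mailbox", "dl_header", "dl_item"],
}
ACCEPTING = {"start", "dl_item"}  # states in which the input may legally end


def parse_records(lines):
    """Validate line order and build one dict per record; raise ValueError on bad input."""
    records, current, current_list = [], None, None
    state = "start"
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip()
        if not line:
            continue  # blank lines carry no data
        for kind in TRANSITIONS[state]:
            match = LINE_PATTERNS[kind].match(line)
            if match:
                break
        else:
            raise ValueError(f"line {lineno}: unexpected {line.strip()!r} after {state!r}")
        if kind == "mailbox":
            current = {"lists": {}}  # a MAILBOX line opens a new record
            records.append(current)
        if kind == "dl_header":
            current_list = current["lists"].setdefault(match["rights"].lower(), [])
        elif kind == "dl_item":
            current_list.append(match["item"])
        else:
            current.update(match.groupdict())  # copy the named fields into the record
        state = kind
    if state not in ACCEPTING:
        raise ValueError(f"input ended in the middle of a record (state {state!r})")
    return records


SAMPLE = """\
 MAILBOX: 10013      Created: 01/20/09  4:39 pm
    MSGS: 0         UNPLAYED: 0           URGENT: 0          RECEIPT: 0
   EXTEN: 10013                            INDEX: 0
DISTRIBUTION LISTS WITH CHANGE RIGHTS:
    all
DISTRIBUTION LISTS WITH REVIEW RIGHTS:
    all
"""

if __name__ == "__main__":
    print(parse_records(SAMPLE.splitlines()))
```

Because each state only permits specific line types next, an out-of-order line (say, a MSGS line before its MAILBOX line) fails immediately with its line number instead of producing a half-filled record.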

Alternatively, you can use a general-purpose lexer/parser generator to produce the code for you. Wikipedia has a long list of them. The GOLD parser looks like a good candidate.

I would not try to do the lexing/parsing in PowerShell. I would rather write the code in C# or F# and use the resulting assembly from PowerShell.

Edit: I've just looked at the FileHelpers library. You could create a MultiRecordEngine with a .NET type that matches each line in your source record. All you then have to do is check the resulting array for valid order and create your objects.
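That post-processing step might look like this, again sketched in Python for brevity. As I understand it, MultiRecordEngine hands back one flat array of objects whose runtime type tells you which line layout matched; the MailboxLine, MsgsLine, ExtenLine, ListLine and Mailbox classes here are hypothetical stand-ins for those record classes and for the final object, and the rule "fixed header lines, then any number of list lines" is an assumed simplification of the real format:

```python
from dataclasses import dataclass, field
from typing import List, Sequence


# Hypothetical per-line record types (one per line layout).
@dataclass
class MailboxLine:
    mailbox: int
    created: str


@dataclass
class MsgsLine:
    msgs: int
    unplayed: int


@dataclass
class ExtenLine:
    exten: int


@dataclass
class ListLine:
    name: str


# The object built from one complete multi-line record.
@dataclass
class Mailbox:
    mailbox: int
    created: str
    msgs: int
    unplayed: int
    exten: int
    lists: List[str] = field(default_factory=list)


HEADER = (MailboxLine, MsgsLine, ExtenLine)  # lines every record must start with


def assemble(items: Sequence[object]) -> List[Mailbox]:
    """Group a flat array of typed lines into Mailbox objects, checking line order."""
    groups: List[List[object]] = []
    for item in items:
        if isinstance(item, MailboxLine):
            groups.append([item])  # a MailboxLine always starts a new record
        elif groups:
            groups[-1].append(item)
        else:
            raise ValueError(f"{type(item).__name__} appears before the first MailboxLine")
    result = []
    for group in groups:
        head, extras = group[:len(HEADER)], group[len(HEADER):]
        if [type(x) for x in head] != list(HEADER):
            found = [type(x).__name__ for x in head]
            raise ValueError(f"record starts with {found}, expected {[t.__name__ for t in HEADER]}")
        if not all(isinstance(x, ListLine) for x in extras):
            raise ValueError("only ListLine items may follow the header lines")
        box, msgs, exten = head
        result.append(Mailbox(box.mailbox, box.created, msgs.msgs, msgs.unplayed,
                              exten.exten, [x.name for x in extras]))
    return result
```

Splitting at the first line type rather than at a fixed count keeps this working when records have different numbers of lines.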

answered Oct 11 '22 14:10

Huusom

I've done something similar in PowerShell, and found that using a regex in a here-string is much easier to work with:

http://mjolinor.wordpress.com/2012/01/05/powershell-multiline-regex-matching/
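A hedged Python sketch of the same idea (not the code from the linked post): a triple-quoted pattern compiled with re.VERBOSE plays the role of the here-string, so the regex can be laid out line by line like the record itself, and named groups pull out the fields. Only a few fields are captured, and the lazy skip assumes every record contains an EXTEN: line; if one were missing, the match could run on into the next record:

```python
import re

# Triple-quoted pattern with re.VERBOSE: layout whitespace and # comments are
# ignored, but spaces inside [...] still count. One pattern line per record line.
RECORD = re.compile(r"""
    ^[ \t]*MAILBOX:[ \t]*(?P<mailbox>\d+)[ \t]+Created:[ \t]*(?P<created>[^\n]*?)[ \t\r]*\n
    [ \t]*MSGS:[ \t]*(?P<msgs>\d+)[ \t]+UNPLAYED:[ \t]*(?P<unplayed>\d+)[^\n]*\n
    (?:[^\n]*\n)*?                 # lazily skip the lines we don't capture
    [ \t]*EXTEN:[ \t]*(?P<exten>\d+)
""", re.VERBOSE | re.MULTILINE)


def find_records(text):
    """Return one dict of captured fields per record found in text."""
    return [m.groupdict() for m in RECORD.finditer(text)]
```

Running find_records over the whole file's text returns one dictionary per MAILBOX block; the [ \t\r] class lets the Created: value come out clean even with Windows line endings.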

answered Oct 11 '22 13:10

mjolinor
