I'm trying to parse some data in a fixed-format text file where each "record" is spread over several lines, like this:
MAILBOX: 10013 Created: 01/20/09 4:39 pm
MSGS: 0 UNPLAYED: 0 URGENT: 0 RECEIPT: 0
LCOS: RBC Standard : 20 FCOS: RBC Standard : 20
GCOS: Default GCOS 1 : 1 NCOS: Default : 1
TCOS: Default TCOS 1 : 1 RCOS: : 1
BAD LOGS: 0 LAST LOG: NEVER MINS: 0.0
PASSWD: Y TUTOR: N DAY: M NIGHT: M
NAME: CODE:
EXTEN: 10013 INDEX: 0
ATTEN DN: INDEX: 0
DISTRIBUTION LISTS WITH CHANGE RIGHTS:
all
DISTRIBUTION LISTS WITH REVIEW RIGHTS:
all
I've used FileHelpers before for single-line records, and it's been very useful. Its documentation shows it does have a MultiRecordEngine feature, but this is going to mean ...
A further wrinkle: the "fixed" format is not actually fixed. The set of lines differs depending on the target record, so some records have 21 lines, some 22, 23, 24, and so on.
I have found a Java flat-file parsing library, FFP, but I'm a .NET (C#, PowerShell) coder.
Are there better ways of handling this sort of parsing?
File parsing, in computing, means giving meaning to the characters of a text file according to a formal grammar.
Fixed-length format files use ordinal positions, which are offsets to identify where fields are within the record. There are no field delimiters. An end-of-record delimiter is required, even for the last record.
Data in a fixed-width text file is arranged in rows and columns, with one entry per row. Each column has a fixed width, specified in characters, which determines the maximum amount of data it can contain.
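To make the ordinal-position idea concrete, here is a minimal Python sketch of slicing one fixed-width line into fields by offset. The column layout (`LAYOUT`, with fields `account`, `date`, `amount`) is hypothetical and is not taken from the mailbox file above, whose lines, as shown here, are label-based rather than strictly positional.

```python
# Hypothetical column layout: (field name, start offset, width) in characters.
# These offsets are illustrative only; they do not come from the mailbox file.
LAYOUT = [
    ("account", 0, 8),
    ("date", 8, 8),
    ("amount", 16, 10),
]


def parse_fixed_width(line: str, layout=LAYOUT) -> dict:
    """Slice one fixed-width line into fields using ordinal positions."""
    record = {}
    for name, start, width in layout:
        # Each field is located purely by its offset; there are no delimiters.
        record[name] = line[start:start + width].strip()
    return record


if __name__ == "__main__":
    # Build a sample row with padding so every field sits at its offset.
    row = "10013".ljust(8) + "01/20/09" + "42.50".rjust(10)
    print(parse_fixed_width(row))
```

Because fields are found only by position, a single wrong width shifts every later field, so the layout table is the one place to get right.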
What you need is a lexer. Your record is too big to parse with a single regex, so you have to write one regex for each line, plus a state machine to validate that the lines follow in the right order.
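A minimal Python sketch of the one-regex-per-line plus state-machine approach might look like this. The line kinds (`mailbox`, `msgs`, `dl_header`, `field`, `item`), the patterns, and the transition table `ALLOWED_NEXT` are assumptions inferred from the single sample record in the question; a real file will likely need more or stricter patterns. Repeating `field` lines let records of varying length (21, 22, 23 ... lines) validate.

```python
import re

# One regex per kind of line, tried in order (most specific first).
# These patterns are guesses based on the sample record in the question.
LINE_TYPES = [
    ("mailbox", re.compile(r"^MAILBOX:\s*(?P<mailbox>\d+)\s+Created:\s*(?P<created>.+)$")),
    ("msgs", re.compile(r"^MSGS:\s*(?P<msgs>\d+)\s+UNPLAYED:\s*(?P<unplayed>\d+)")),
    ("dl_header", re.compile(r"^DISTRIBUTION LISTS WITH (?P<right>CHANGE|REVIEW) RIGHTS:$")),
    ("field", re.compile(r"^[A-Z][A-Z ]*:")),   # any other "LABEL: ..." line
    ("item", re.compile(r"^(?P<item>\S.*)$")),  # a distribution-list entry
]

# State machine: which line kinds may follow the previous kind.
# "field" may repeat, so records with a varying number of lines still validate.
ALLOWED_NEXT = {
    None: {"mailbox"},
    "mailbox": {"msgs"},
    "msgs": {"field", "dl_header", "mailbox"},
    "field": {"field", "dl_header", "mailbox"},
    "dl_header": {"item"},
    "item": {"item", "dl_header", "mailbox"},
}
# States in which the input may legitimately end.
ACCEPTING = {None, "msgs", "field", "item"}


def classify(line):
    """Return (kind, match) for the first pattern that matches the line."""
    for kind, pattern in LINE_TYPES:
        m = pattern.match(line)
        if m:
            return kind, m
    raise ValueError(f"unrecognised line: {line!r}")


def parse_records(lines):
    records, current, state, right = [], None, None, None
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        kind, m = classify(line)
        if kind not in ALLOWED_NEXT[state]:
            raise ValueError(f"line {lineno}: {kind!r} cannot follow {state!r}")
        if kind == "mailbox":
            current = {"mailbox": m["mailbox"], "created": m["created"],
                       "fields": [], "lists": {}}
            records.append(current)
        elif kind == "msgs":
            current["msgs"] = int(m["msgs"])
            current["unplayed"] = int(m["unplayed"])
        elif kind == "field":
            current["fields"].append(line)  # kept raw in this sketch
        elif kind == "dl_header":
            right = m["right"].lower()
            current["lists"][right] = []
        else:  # "item"
            current["lists"][right].append(m["item"])
        state = kind
    if state not in ACCEPTING:
        raise ValueError(f"input ended in the middle of a record (after {state!r})")
    return records
```

Anything out of order, for example a `MSGS:` line with no preceding `MAILBOX:` line, raises a `ValueError` naming the offending line number, which is the validation step described above.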
Or you can use a general-purpose lexer/parser generator to generate the code for you. Wikipedia has a long list of them. The GOLD parser looks like a good candidate.
I would not try to do the lexing/parsing in PowerShell. I would rather write the code in C# or F#, compile it into an assembly, and use that assembly from PowerShell.
Edit: I've just looked at the FileHelpers library. You could create a MultiRecordEngine with a .NET type that matches each kind of line in your source record. All you have to do then is check the resulting array for valid order and create your objects.
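This sketch does not use FileHelpers; it only imitates the idea in Python. A selector function maps each line to a small typed object (the hypothetical `MailboxLine`, `ExtenLine` and `OtherLine` classes), producing a flat list, which is then split into records at each `MAILBOX:` line and checked against an example ordering rule (exactly one `EXTEN:` line per record, an assumption for illustration). The `:=` operator requires Python 3.8 or later.

```python
import re
from dataclasses import dataclass


# Hypothetical typed lines, one class per kind of line (analogous to the
# per-line .NET types you would register with a multi-record engine).
@dataclass
class MailboxLine:
    mailbox: str
    created: str


@dataclass
class ExtenLine:
    exten: str
    index: int


@dataclass
class OtherLine:
    text: str


def select(line: str):
    """Selector: decide which type a single line maps to."""
    if m := re.match(r"MAILBOX:\s*(\d+)\s+Created:\s*(.+)", line):
        return MailboxLine(m[1], m[2].strip())
    if m := re.match(r"EXTEN:\s*(\d+)\s+INDEX:\s*(\d+)", line):
        return ExtenLine(m[1], int(m[2]))
    return OtherLine(line)


def group_records(lines):
    """Turn the flat list of typed lines into one list per record."""
    records = []
    for item in (select(line.strip()) for line in lines if line.strip()):
        if isinstance(item, MailboxLine):
            records.append([item])  # a MAILBOX line starts a new record
        elif not records:
            raise ValueError(f"line before first MAILBOX: {item}")
        else:
            records[-1].append(item)
    # Example validation rule (assumed): exactly one EXTEN line per record.
    for rec in records:
        if sum(isinstance(x, ExtenLine) for x in rec) != 1:
            raise ValueError(f"record {rec[0].mailbox} needs exactly one EXTEN line")
    return records
```

Splitting on the header line is what makes the varying record length harmless: a record simply ends wherever the next `MAILBOX:` line begins.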
I've done something similar in PowerShell, and found that using a regex in a here-string is much easier to work with:
http://mjolinor.wordpress.com/2012/01/05/powershell-multiline-regex-matching/
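The linked post is about PowerShell; as a rough Python analogue of the same idea (not the post's own code), a triple-quoted string plays the role of the here-string, and `re.VERBOSE` lets one regex span several lines of a record. The pattern captures only a few fields (`mailbox`, `created`, `msgs`, `unplayed`, `exten`) chosen for illustration.

```python
import re

# One regex spanning several lines of a record, written as a triple-quoted
# string (the Python counterpart of a PowerShell here-string).
# re.VERBOSE ignores layout whitespace and allows comments, so literal spaces
# must be written as \s; re.MULTILINE makes ^ match at the start of every line.
RECORD_RE = re.compile(r"""
    ^MAILBOX:\s*(?P<mailbox>\d+)\s+Created:\s*(?P<created>[^\n]+)\n
    ^MSGS:\s*(?P<msgs>\d+)\s+UNPLAYED:\s*(?P<unplayed>\d+)[^\n]*\n
    (?:^(?!EXTEN:|MAILBOX:)[^\n]*\n)*   # skip other lines, but never into the next record
    ^EXTEN:\s*(?P<exten>\d+)
    """, re.MULTILINE | re.VERBOSE)


def find_records(text: str):
    """Return one dict of captured fields per record found in the text."""
    return [m.groupdict() for m in RECORD_RE.finditer(text)]
```

The negative lookahead on the skipped lines keeps a record that lacks an `EXTEN:` line from silently swallowing the next record.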