Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing a very hectic space delimited file

I'm trying to help my dad out -- he gave me an export from a scheduling application at his work. We are trying to see if we can import it into a mysql database so he/co-workers can collaborate online with it.

I've tried a number of different methods but none seem to work right -- and this is not my area of specialties.

Export can be seen here: http://roikingon.com/export.txt

Any help / advice on how to go about parsing this would be greatly appreciated!

Thanks !!

like image 565
Roi Avatar asked Dec 29 '11 05:12

Roi


2 Answers

I've made an attempt to write a (somewhat dynamic) fixed-with-column parser. Take a look: http://codepad.org/oAiKD0e7 (it's too long for SO, but it's mostly just "data").

What I've noticed

  • Text-Data is left aligned with padding on the right like "hello___" (_ = space)
  • Numerical data is right aligned with padding on the left "___42"

If you want to use my code there's yet stuff to do:

  • The record types 12.x have variable column count (after some static columns), you'd have to implement another "handler" for it
  • Some of my width's are most probably wrong. I think there is a system (like numbers are 4 characters long and text 8 characters long, with some variations for special cases). Someone with domain knowledge and more than one sample file could figure out the columns.
  • Getting the raw-data out is only the first step, you have to map the raw-data to some useful model and write that model to the database.
like image 94
vstm Avatar answered Oct 13 '22 12:10

vstm


With that file structure you're basically in need of reverse engineering a proprietary format. Yes, it is space delimited but the format does not follow any kind of standard like CSV, YAML etc. It is completely proprietary with what seems to be a header and separate section with headers of their own.

I think your best bet is to try and see if there's some other type of export that can be done such as Excel or XML and working from there. If there isn't then see if there's an html output of some kind that can be screen scraped, and pasted into Excel and seeing what you get.

Due to everything I mentioned above it will be VERY difficult to massage the file in its current form into something that can be sensibly imported into a database. (Note that from the file structure a number of tables would be needed.)

like image 28
Paul Sasik Avatar answered Oct 13 '22 10:10

Paul Sasik