Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse this OFX file?

this is an original ofx file as it comes from m bank (no worries, theres nothing sensitive, i cut out the middle part with all the transactions)

Open Financial Exchange (OFX) is a data-stream format for exchanging financial information that evolved from Microsoft's Open Financial Connectivity (OFC) and Intuit's Open Exchange file formats.

now i need to parse this. i already saw that question, but this is not a dup because i am interested in how to do this.

i am sure i could figure out some clever regexps that would do the job, but that is ugly and error vulnerable (if the format is changed, some fields may be missing, the formatting/white spaces are different etc etc...)

OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
    <SIGNONMSGSRSV1>
        <SONRS>
            <STATUS>
                <CODE>0
                <SEVERITY>INFO
            </STATUS>
            <DTSERVER>20110420000000[+1:CET]
            <LANGUAGE>ENG
        </SONRS>
    </SIGNONMSGSRSV1>
    <BANKMSGSRSV1>
        <STMTTRNRS>
            <TRNUID>1
            <STATUS>
                <CODE>0
                <SEVERITY>INFO
            </STATUS>
            <STMTRS>
                <CURDEF>EUR
                <BANKACCTFROM>
                    <BANKID>20404
                    <ACCTID>02608983629
                    <ACCTTYPE>CHECKING
                </BANKACCTFROM>
                    <BANKTRANLIST>
                    <DTSTART>20110207
                    <DTEND>20110419
                    <STMTTRN>
                        <TRNTYPE>XFER
                        <DTPOSTED>20110205000000[+1:CET]
                        <TRNAMT>-6.12
                        <FITID>C74BD430D5FF2521
                        <NAME>unbekannt
                        <MEMO>BILLA DANKT  1265P K2 05.02.UM 17.49 
                    </STMTTRN>
                    <STMTTRN>
                        <TRNTYPE>XFER
                        <DTPOSTED>20110207000000[+1:CET]
                        <TRNAMT>-10.00
                        <FITID>C74BE0F90A657901
                        <NAME>unbekannt
                        <MEMO>AUTOMAT  13177 KARTE2 07.02.UM 10:22 
                    </STMTTRN>
............................. goes on like this ........................
                    <STMTTRN>
                        <TRNTYPE>XFER
                        <DTPOSTED>20110418000000[+1:CET]
                        <TRNAMT>-9.45
                        <FITID>C7A5071492D14D29
                        <NAME>unbekannt
                        <MEMO>HOFER DANKT  0408P K2 18.04.UM 18.47 
                    </STMTTRN>
                </BANKTRANLIST>
                <LEDGERBAL>
                    <BALAMT>1992.29
                    <DTASOF>20110420000000[+1:CET]
                </LEDGERBAL>
            </STMTRS>
        </STMTTRNRS>
    </BANKMSGSRSV1>
</OFX>

i currently use this code which gives me the desired result:

<?

$files = array();
$files[] = '***_2011001.ofx';
$files[] = '***_2011002.ofx';
$files[] = '***_2011003.ofx';

system('touch file.csv && chmod 777 file.csv');
$fp = fopen('file.csv', 'w');

foreach($files as $file) {
    echo $file."...\n";
    $content = file_get_contents($file);

    $content = str_replace("\n","",$content);
    $content = str_replace(" ","",$content);

    $regex = '|<STMTTRN><TRNTYPE>(.+?)<DTPOSTED>(.+?)<TRNAMT>(.+?)<FITID>(.+?)<NAME>(.+?)<MEMO>(.+?)</STMTTRN>|';


    echo preg_match_all($regex,$content,$matches,PREG_SET_ORDER)." matches... \n";


    foreach($matches as $match) {
        echo ".";
        array_shift($match);
        fputcsv($fp, $match);
    }
    echo "\n";
}
echo "done.\n";
fclose($fp);

this is really ugly and if this was a valid xml file i would personally kill myself for that, but how to do it better?

like image 342
The Surrican Avatar asked Apr 20 '11 13:04

The Surrican


People also ask

What is an OFX file?

Open Financial Exchange (OFX) is a data-stream format for exchanging financial information that evolved from Microsoft's Open Financial Connectivity (OFC) and Intuit's Open Exchange file formats. Open Financial Exchange. Filename extension.


1 Answers

Your code seems fine, considering that the file isn't XML or even SGML. The only thing you could do is try to make a more generic SAX-like parser. That is, you simply go through the input stream one block at a time (where block can be anything, e.g. a line or simply a set amount of characters). Then, call a callback function every time you encounter an <ELEMENT>. You can even go as fanciful as building a parser class where you can register callback functions that listen to specific elements.

It will be more generic and less "ugly" (for some definition of "ugly") but it will be more code to maintain. Nice to do and nice to have if you need to parse this file format a lot (or in a lot of different variations). If your posted code is the only place you do this then just KISS.

like image 79
Sander Marechal Avatar answered Oct 09 '22 16:10

Sander Marechal