Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse a OFX (Version 1.0.2) file in PHP?

I have a OFX file downloaded from Citibank, this file has a DTD defined at http://www.ofx.net/DownloadPage/Files/ofx102spec.zip (file OFXBANK.DTD), the OFX file appear to be SGML valid. I'm trying with DomDocument of PHP 5.4.13, but I get several warning and file is not parsed. My Code is:

$file = "source/ACCT_013.OFX";
$dtd = "source/ofx102spec/OFXBANK.DTD";
$doc = new DomDocument();
$doc->loadHTMLFile($file);
$doc->schemaValidate($dtd);
$dom->validateOnParse = true;

The OFX file start as:

OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE

<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>20130331073401
<LANGUAGE>SPA
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>COP
<BANKACCTFROM> ...

I'm open to install and use any program in Server (Centos) for call from PHP.

PD: This class http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html don't work for me.

like image 885
Jose Nobile Avatar asked Mar 31 '13 22:03

Jose Nobile


People also ask

How do I view a OFX file?

For Mac users, OFX files can be opened by using GnuCash, Intuit Quicken, Reilly Technologies Moneydance and Apple Numbers. For Microsoft Windows users, they can be opened using GnuCash, Sage Accpac, Microsoft Money, Intuit Quicken and Reilly Technologies Moneydance.

What is download OFX file?

An OFX file is a financial data file created in the Open Financial Exchange (OFX) format, an open format for transferring data between vendors, consumers, and financial systems. It contains transactions, statements, and other financial information.

What is a Microsoft OFX file?

Open Financial Exchange (OFX) is a data-stream format for exchanging financial information that evolved from Microsoft's Open Financial Connectivity (OFC) and Intuit's Open Exchange file formats. Open Financial Exchange. Filename extension. .ofx.


1 Answers

Well first of all even XML is a subset of SGML a valid SGML file must not be a well-formed XML file. XML is more strict and does not use all features that SGML offers.

As DOMDocument is XML (and not SGML) based, this is not really compatible.

Next to that problem, please see 2.2 Open Financial Exchange Headers in Ofexfin1.doc it explains you that

The contents of an Open Financial Exchange file consist of a simple set of headers followed by contents defined by that header

and further on:

A blank line follows the last header. Then (for type OFXSGML), the SGML-readable data begins with the <OFX> tag.

So locate the first blank line and strip everyhing until there. Then load the SGML part into DOMDocument by converting the SGML into XML first:

$source = fopen('file.ofx', 'r');
if (!$source) {
    throw new Exception('Unable to open OFX file.');
}

// skip headers of OFX file
$headers = array();
$charsets = array(
    1252 => 'WINDOWS-1251',
);
while(!feof($source)) {
    $line = trim(fgets($source));
    if ($line === '') {
        break;
    }
    list($header, $value) = explode(':', $line, 2);
    $headers[$header] = $value;
}

$buffer = '';

// dead-cheap SGML to XML conversion
// see as well http://www.hanselman.com/blog/PostprocessingAutoClosedSGMLTagsWithTheSGMLReader.aspx
while(!feof($source)) {

    $line = trim(fgets($source));
    if ($line === '') continue;

    $line = iconv($charsets[$headers['CHARSET']], 'UTF-8', $line);
    if (substr($line, -1, 1) !== '>') {
        list($tag) = explode('>', $line, 2);
        $line .= '</' . substr($tag, 1) . '>';
    }
    $buffer .= $line ."\n";
}

// use DOMDocument with non-standard recover mode
$doc = new DOMDocument();
$doc->recover = true;
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$save = libxml_use_internal_errors(true);
$doc->loadXML($buffer);
libxml_use_internal_errors($save);

echo $doc->saveXML();

This code-example then outputs the following (re-formatted) XML which also shows that DOMDocument loaded the data properly:

<?xml version="1.0"?>
<OFX>
  <SIGNONMSGSRSV1>
    <SONRS>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
      </STATUS>
      <DTSERVER>20130331073401</DTSERVER>
      <LANGUAGE>SPA</LANGUAGE>
    </SONRS>
  </SIGNONMSGSRSV1>
  <BANKMSGSRSV1>
    <STMTTRNRS>
      <TRNUID>0</TRNUID>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
      </STATUS>
      <STMTRS><CURDEF>COP</CURDEF><BANKACCTFROM> ...</BANKACCTFROM>
</STMTRS>
    </STMTTRNRS>
  </BANKMSGSRSV1>
</OFX>

I do not know whether or not this can be validated against the DTD then. Maybe this works. Additionally if the SGML is not written with the values that are of a tag on the same line (and only a single element on each line is required), then this fragile conversion will break.

like image 103
hakre Avatar answered Oct 03 '22 16:10

hakre