
XPath query filtering by date

I have some sample XML where I am querying for nodes based on a date.

Sample XML document:

<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<NewDataSet>
    <Table>
        <EmployeeBankGUID>dc396ebe-c8a4-4a7f-85b5-b43c1890d6bc</EmployeeBankGUID>
        <ValidFromDate>2012-02-01T00:00:00-05:00</ValidFromDate>
    </Table>
    <Table>
        <EmployeeBankGUID>2406a5aa-0246-4cd7-bba5-bb17a993042b</EmployeeBankGUID>
        <ValidFromDate>2013-02-01T00:00:00-05:00</ValidFromDate>
    </Table>
    <Table>
        <EmployeeBankGUID>2af49699-579e-4beb-9ab0-a58b4bee3158</EmployeeBankGUID>
        <ValidFromDate>2014-02-01T00:00:00-05:00</ValidFromDate>
    </Table>
</NewDataSet>

So there are basically three dates:

  • 2/1/2012
  • 2/1/2013
  • 2/1/2014

Using MSXML I can query and filter by these dates using an XPath query:

/NewDataSet/Table[ValidFromDate>"2013-02-12"]

And this works, and returns an IXMLDOMNodeList containing one item:

<Table>
    <EmployeeBankGUID>2af49699-579e-4beb-9ab0-a58b4bee3158</EmployeeBankGUID>
    <ValidFromDate>2014-02-01T00:00:00-05:00</ValidFromDate>
</Table>

Except it doesn't work anymore

That XPath query was run with the original MSXML, the variant Microsoft created in the late 1990s, before the W3C standardized on a completely different form of XPath:

DOMDocument doc = new DOMDocument();
//...load the xml...
IXMLDOMNodeList nodes = doc.selectNodes('/NewDataSet/Table[ValidFromDate>"2013-02-12"]');

But that version of MSXML is not "standards compliant" (it was created before there were standards). Since 2005 the recommended version, the one that follows the standards and the only one with the features I require, has been MSXML 6.

It's a simple change: just instantiate a DOMDocument60 class rather than a DOMDocument class:

DOMDocument doc = new DOMDocument60();
//...load the xml...
IXMLDOMNodeList nodes = doc.selectNodes('/NewDataSet/Table[ValidFromDate>"2013-02-12"]');

Except the same XPath query returns nothing.

What is the "standards compliant" way to filter a value by date?

Pretend it's a string, you say

You might be thinking that I might be thinking that XML treats 2013-02-01T00:00:00-05:00 as some sort of special date, when in reality it's just a string. So maybe I should think of it as a string comparison.

Which would work, except that it doesn't work. No string comparison works:

  • /NewDataSet/Table[ValidFromDate<"a"] returns no nodes
  • /NewDataSet/Table[ValidFromDate>"a"] returns no nodes
  • /NewDataSet/Table[ValidFromDate!="a"] returns all nodes
  • /NewDataSet/Table[ValidFromDate>"2014-02-12T00:00:00-05:00"] returns no nodes
  • /NewDataSet/Table[ValidFromDate<"2014-02-12T00:00:00-05:00"] returns no nodes
  • /NewDataSet/Table[ValidFromDate!="2014-02-12T00:00:00-05:00"] returns no nodes

So, there we have it

What is the "standards compliant" way to achieve what used to work?

What is the "correct" way to XPath query for date strings?

Or, better yet, why are my XPath queries not working?

Or, better better yet, why does the query that used to work no longer work? What was the decision that deemed the syntax bad? What edge cases were they solving by "breaking" the query syntax?

MSXML6 compatible version

Here's the final functional code, nearly in the language I use:

DOMDocument60 GetXml(String url)
{
   XmlHttpRequest xml = CoServerXMLHTTP60.Create();
   xml.Open('GET', url, False, '', '');
   xml.Send(EmptyParam);

   DOMDocument60 doc = xml.responseXML AS DOMDocument60;

   //MSXML6 removed all kinds of features originally present (thanks W3C)
   //Need to use Microsoft's proprietary extensions to get some of it back (thanks W3C)
   doc.setProperty('SelectionNamespaces', 'xmlns:ms="urn:schemas-microsoft-com:xslt"');

   return doc;
}


DOMDocument60 doc = GetXml('http://example.com/GetBanks.ashx?employeeID=12345');

//Finds future banks. 

//Only works in MSXML3; intentionally broken in MSXML6 (thanks W3C):
//String qry = '/NewDataSet/Table[ValidFromDate > "2014-02-12"]';

//MSXML6 compatible version of doing the above (send complaints to W3C);
String qry = '/NewDataSet/Table[ms:string-compare(ValidFromDate, "2014-02-12") >= 0]';

IXMLDOMNodeList nodes = doc.selectNodes(qry);
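If the filtering does not have to happen inside XPath, another option (not from this thread) is to select the Table nodes with a plain path and compare real date values in the host language. A minimal sketch using only Python's standard library: xml.etree.ElementTree supports just a small XPath subset with no comparison operators, so the date test uses datetime.fromisoformat (Python 3.7+), which keeps the -05:00 offset. The function name future_banks is hypothetical, and the 2013-02-12 cutoff is borrowed from the original query.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Sample document from the question; the XML declaration is omitted because
# the text is already a Python str.
SAMPLE_XML = """<NewDataSet>
  <Table>
    <EmployeeBankGUID>dc396ebe-c8a4-4a7f-85b5-b43c1890d6bc</EmployeeBankGUID>
    <ValidFromDate>2012-02-01T00:00:00-05:00</ValidFromDate>
  </Table>
  <Table>
    <EmployeeBankGUID>2406a5aa-0246-4cd7-bba5-bb17a993042b</EmployeeBankGUID>
    <ValidFromDate>2013-02-01T00:00:00-05:00</ValidFromDate>
  </Table>
  <Table>
    <EmployeeBankGUID>2af49699-579e-4beb-9ab0-a58b4bee3158</EmployeeBankGUID>
    <ValidFromDate>2014-02-01T00:00:00-05:00</ValidFromDate>
  </Table>
</NewDataSet>"""


def future_banks(xml_text, cutoff):
    """Return the EmployeeBankGUID of every Table whose ValidFromDate is after cutoff."""
    root = ET.fromstring(xml_text)
    guids = []
    for table in root.findall("Table"):
        # fromisoformat keeps the UTC offset, so this compares actual instants.
        valid_from = datetime.fromisoformat(table.findtext("ValidFromDate"))
        if valid_from > cutoff:
            guids.append(table.findtext("EmployeeBankGUID"))
    return guids


# The cutoff needs an offset too: naive and aware datetimes cannot be compared.
cutoff = datetime(2013, 2, 12, tzinfo=timezone(timedelta(hours=-5)))
print(future_banks(SAMPLE_XML, cutoff))
```

With the sample data only the 2014 row is returned, the same result the original MSXML 3 query produced.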
Asked by Ian Boyd, Oct 01 '22 15:10


1 Answer

XPath is not date-aware

What is the "correct" way to XPath query for date strings?

XPath 1.0 has no date type, so there is no correct way to compare date strings; just think of time zone support. Comparing them as strings fails as soon as the time zone offsets differ.
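A concrete case of the time zone problem, sketched in Python with two made-up timestamps (not from the question). Compared character by character, the first looks earlier; as instants it is later, because its -05:00 offset puts it at 04:30 UTC. datetime.fromisoformat (Python 3.7+) parses the offset, so its comparison is between instants.

```python
from datetime import datetime

# Hypothetical values: two instants written with different UTC offsets.
a = "2014-02-01T23:30:00-05:00"  # 2014-02-02 04:30 UTC
b = "2014-02-02T01:00:00+00:00"  # 2014-02-02 01:00 UTC

# Plain string comparison: decided by the first differing character.
string_order = a < b
# Offset-aware comparison: decided by the actual instants.
instant_order = datetime.fromisoformat(a) < datetime.fromisoformat(b)

print(string_order)   # True
print(instant_order)  # False
```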

Comparing strings

Or, better yet, why are my XPath queries not working?

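XPath 1.0 compares strings as strings only with the equality operators (= and !=). For <, <=, > and >=, both values are first converted to numbers. A string such as 2014-02-01T00:00:00-05:00 (or "a", or "2013-02-12") is not a valid number, so it converts to NaN, and every relational comparison involving NaN is false.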
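A rough Python model of those two rules, assuming a simplified number() that accepts only the XPath 1.0 Number form (optional minus sign, ASCII digits, optional fraction, surrounding whitespace). The helper names xpath_number and matching are hypothetical. It shows why a date string becomes NaN, why > and < against "a" or a date literal match nothing, why != "a" matches every row, and that numeric strings really are compared as numbers.

```python
import math
import operator
import re

# XPath 1.0 Number: optional whitespace, optional '-', digits with an optional
# fraction (or a fraction alone), optional whitespace. Anything else is NaN.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")

_OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def xpath_number(s):
    """Simplified model of XPath 1.0 number() applied to a string."""
    return float(s) if _NUMBER.fullmatch(s) else math.nan


def matching(values, op, literal):
    """Values v for which the predicate [v op literal] holds (one node per row)."""
    if op in ("=", "!="):
        # Equality operators compare the two strings as strings.
        return [v for v in values if _OPS[op](v, literal)]
    # Relational operators convert both sides to numbers; NaN compares false.
    rhs = xpath_number(literal)
    return [v for v in values if _OPS[op](xpath_number(v), rhs)]


dates = [
    "2012-02-01T00:00:00-05:00",
    "2013-02-01T00:00:00-05:00",
    "2014-02-01T00:00:00-05:00",
]

print(xpath_number(dates[0]))              # nan
print(matching(dates, ">", "2013-02-12"))  # []
print(matching(dates, ">", "a"))           # []
print(matching(dates, "!=", "a"))          # all three dates
print(matching(["5", "12"], ">", "7"))     # ['12']
```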

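Use ms:string-compare, a Microsoft extension function introduced in MSXML 4.0. The ms prefix has to be bound to urn:schemas-microsoft-com:xslt through the SelectionNamespaces property, as in the code above: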

/NewDataSet/Table[
  ms:string-compare(ValidFromDate, "2014-02-12T00:00:00-05:00") > 0
]
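ms:string-compare returns -1, 0 or 1, so the predicate keeps the nodes whose value sorts after the literal. The Python sketch below models that three-way idiom with plain ordinal comparison; that is an assumption, since MSXML's function can be language- and option-sensitive, so it models the idea for fixed-layout date strings rather than the function itself. The helper name string_compare is hypothetical, and the 2013 cutoff is chosen only so that one row matches.

```python
def string_compare(a, b):
    """Three-way compare returning -1, 0 or 1.

    Assumption: ordinal (code point) order stands in for ms:string-compare,
    which is only safe for strings sharing one fixed layout, like these dates.
    """
    return (a > b) - (a < b)


dates = [
    "2012-02-01T00:00:00-05:00",
    "2013-02-01T00:00:00-05:00",
    "2014-02-01T00:00:00-05:00",
]
cutoff = "2013-02-12T00:00:00-05:00"

# Mirrors the predicate [ms:string-compare(ValidFromDate, cutoff) > 0]
matches = [d for d in dates if string_compare(d, cutoff) > 0]
print(matches)  # ['2014-02-01T00:00:00-05:00']
```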

For the rest of the (XML) world

What is the "standards compliant" way to achieve what used to work?

An alternative that also works in other XPath implementations (I tested it using xmllint, which uses libxml2) might be to translate away all non-digit characters, so the string can be parsed as a number:

/NewDataSet/Table[
  translate(ValidFromDate, "-:T", "") < translate("2014-02-12T00:00:00-05:00", "-:T", "")
]
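A Python model of the translate() trick, with a hypothetical helper translated_number standing in for number(translate(s, "-:T", "")). It also shows two limits: the offset digits are simply appended to the number, so values with different offsets are not compared as instants, and a '+' offset or a trailing 'Z' is not removed by "-:T", which leaves number() with NaN. The resulting 18-digit values exceed a double's exact range, so the last digit or two may be rounded, but the date and time digits still decide the order here.

```python
import math
import re

_STRIP = str.maketrans("", "", "-:T")  # like translate(s, "-:T", "")


def translated_number(s):
    """Model of number(translate(s, "-:T", "")) for date strings."""
    digits = s.translate(_STRIP)
    # Simplification: after stripping, only a plain run of ASCII digits is treated
    # as a number; any leftover character (such as '+' or 'Z') gives NaN.
    return float(digits) if re.fullmatch(r"[0-9]+", digits) else math.nan


dates = [
    "2012-02-01T00:00:00-05:00",
    "2013-02-01T00:00:00-05:00",
    "2014-02-01T00:00:00-05:00",
]
cutoff = translated_number("2014-02-12T00:00:00-05:00")

# Mirrors [translate(ValidFromDate, "-:T", "") < translate(cutoff, "-:T", "")]
earlier = [d for d in dates if translated_number(d) < cutoff]
print(earlier)                                         # all three dates
print(translated_number("2014-02-01T00:00:00+01:00"))  # nan: '+' survives
print(translated_number("2014-02-01T00:00:00Z"))       # nan: 'Z' survives
```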
Answered by Jens Erat, Oct 21 '22 20:10