I'm looking through some of the triples contained within the Freebase data dump, and some of the date times look like this:
"T12:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
Which is ingestible by some triplestores, but not by others.
So, is this a valid dateTime? And if so, why is it valid?
It's not a valid xsd:dateTime. It is a syntactically valid RDF literal term, but a semantically inconsistent one.
First, let's see why T12:00 isn't in the lexical space of xsd:dateTime. The XML Schema definition of xsd:dateTime says:
The lexical space of dateTime consists of finite-length sequences of characters of the form:
'-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?
T12:00 matches part of that, but it lacks the year, month, day, and seconds parts.
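As a rough sketch, the production quoted above can be turned into a regular expression. The Python below is only an illustration: the names DATETIME_SHAPE and has_datetime_shape are made up for this example, and the pattern checks only the shape of the string, not whether the month, day, or hour values are in range.

```python
import re

# Shape-only pattern built from the quoted production.
# Assumption: field ranges (month 01-12, hour 00-24, ...) are NOT checked here.
DATETIME_SHAPE = re.compile(
    r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}"   # '-'? yyyy '-' mm '-' dd
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}"     # 'T' hh ':' mm ':' ss
    r"(\.[0-9]+)?"                     # ('.' s+)?
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"      # (zzzzzz)?
)


def has_datetime_shape(lexical_form: str) -> bool:
    """Return True if the whole string has the shape of an xsd:dateTime."""
    return DATETIME_SHAPE.fullmatch(lexical_form) is not None


print(has_datetime_shape("2013-05-01T12:00:00"))  # True
print(has_datetime_shape("T12:00"))               # False: no date, no seconds
```

The use of fullmatch matters here: T12:00 does resemble a fragment of the pattern, but the whole string has to match for it to be in the lexical space.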
However, as RobV pointed out, an RDF literal term is still syntactically valid even if the lexical form isn't in the lexical space of the datatype. RDF 1.1 Concepts and Abstract Syntax has this to say (note item 2.b):
3.3 Literals
A literal in an RDF graph consists of two or three elements:
- a lexical form, being a Unicode string, which SHOULD be in Normal Form C,
- a datatype IRI, being an IRI identifying a datatype that determines how the lexical form maps to a literal value, and
- if and only if the datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, a non-empty language tag as defined by [BCP47]. The language tag MUST be well-formed according to section 2.2.9 of [BCP47].
…
The literal value associated with a literal is:
1. If the literal is a language-tagged string, then the literal value is a pair consisting of its lexical form and its language tag, in that order.
2. If the literal's datatype IRI is in the set of recognized datatype IRIs, let d be the referent of the datatype IRI.
   a. If the literal's lexical form is in the lexical space of d, then the literal value is the result of applying the lexical-to-value mapping of d to the lexical form.
   b. Otherwise, the literal is ill-typed and no literal value can be associated with the literal. Such a case produces a semantic inconsistency but is not syntactically ill-formed. Implementations MUST accept ill-typed literals and produce RDF graphs from them. Implementations MAY produce warnings when encountering ill-typed literals.
3. If the literal's datatype IRI is not in the set of recognized datatype IRIs, then the literal value is not defined by this specification.
Thus, "T12:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> is an RDF literal term, but a semantically inconsistent one. This alone doesn't make the Freebase dump invalid RDF. An implementation must accept it and produce an RDF graph from it, but may warn about it. That means that an RDF parser has to be able to process it. I'm not sure whether a triple store counts as "an implementation" or not. If it does, then it should store the resulting graph. If it doesn't, then I guess it's OK for it to store only RDF graphs whose literals are all semantically consistent.
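The case analysis in the quoted section can be sketched in a few lines of Python. This is a simplification under stated assumptions: RECOGNIZED and classify_literal are hypothetical names, the "recognized datatypes" are just two shape-only checks, and language-tagged strings are left out.

```python
import re
from typing import Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical set of "recognized" datatype IRIs. Each entry is a
# shape-only membership test for that datatype's lexical space.
RECOGNIZED: Dict[str, Callable[[str], bool]] = {
    XSD + "dateTime": lambda s: re.fullmatch(
        r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
        r"(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?", s) is not None,
    XSD + "integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
}


def classify_literal(lexical_form: str, datatype_iri: str) -> str:
    """Classify a typed literal along the lines of the quoted rules."""
    in_lexical_space = RECOGNIZED.get(datatype_iri)
    if in_lexical_space is None:
        # Datatype not recognized: the value is not defined by the spec.
        return "unrecognized datatype"
    if in_lexical_space(lexical_form):
        # Case 2.a: a literal value exists.
        return "well-typed"
    # Case 2.b: ill-typed, but still accepted as a literal term.
    return "ill-typed"


print(classify_literal("T12:00", XSD + "dateTime"))  # ill-typed
```

Note that the function never raises for the ill-typed case: the literal is classified, not rejected, which mirrors the "MUST accept" wording.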
As Joshua says, it is not a valid xsd:dateTime; however, it is still a valid RDF literal.
An RDF literal consists of a lexical form (here, T12:00) and an optional datatype or language specifier. In your case it has the datatype xsd:dateTime.
So the difference in behaviour you see between stores comes down to whether they enforce datatype restrictions on the lexical form of the literal, i.e. whether they require that the lexical forms for xsd: datatypes match the rules laid out in XML Schema Part 2: Datatypes.
Stores which enforce this will only allow valid values, while those that do not will allow mixtures of valid and invalid values. Some of the strict stores may have options to allow invalid values; check with your vendor/community as to whether this is the case.
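To make the strict/lenient distinction concrete, here is a toy Python sketch. ToyStore and its strict flag are hypothetical and do not correspond to the API or configuration of any real triplestore; the check is shape-only and covers xsd:dateTime alone.

```python
import re

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

# Shape-only test for the xsd:dateTime lexical space (field ranges unchecked).
_DATETIME_SHAPE = re.compile(
    r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
    r"(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?"
)


class ToyStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict
        self.literals = []   # accepted (lexical form, datatype IRI) pairs
        self.warnings = []   # ill-typed lexical forms kept in lenient mode

    def add_literal(self, lexical_form: str, datatype_iri: str) -> None:
        ill_typed = (
            datatype_iri == XSD_DATETIME
            and _DATETIME_SHAPE.fullmatch(lexical_form) is None
        )
        if ill_typed and self.strict:
            # Strict store: refuse the literal outright.
            raise ValueError(f"ill-typed literal rejected: {lexical_form!r}")
        if ill_typed:
            # Lenient store: keep the literal, but record a warning.
            self.warnings.append(lexical_form)
        self.literals.append((lexical_form, datatype_iri))
```

With strict=False, adding "T12:00" succeeds and leaves an entry in warnings; with strict=True the same call raises ValueError, which is roughly the difference you would see between the two kinds of store when loading the dump.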