
Modelling an equivalent of database NULL in RDF

I would like to know if there is a standard or generally accepted way of representing an equivalent of the database NULL in RDF data.

More specifically, I'm interested in a way to distinguish the following cases for a value o of a property p (p is the predicate, o the object of an RDF triple):

  1. The value is not applicable, i.e. property p does not exist or does not make sense in the context.
  2. The value is unknown, i.e. it should be there but we don't know it.
  3. The value doesn't exist, i.e. the property doesn't have a value (e.g. year of death for a person alive).
  4. The value is withheld, e.g. when the data consumer is not allowed to access it.
Mifeet asked Jun 01 '13 13:06



3 Answers

I don't know of a standard way of doing this, but one of the advantages of working in RDF is that you have a lot of flexibility in how you decide to do it. RDF, per se, cannot express negation (i.e., there is no convenient way to say that a triple s p o does not hold), but OWL can. As to the four cases you described, here are some approaches that you might take:

1. The value is not applicable, i.e. property p does not exist or does not make sense in the context.

If it does not make much sense for a property p to have a value for a subject s, then it's probably acceptable to just not write any triples of the form s p o. Since RDF makes an open world assumption, it is often the case that, in data retrieval, one only queries for the data that one is interested in, and does not make much of an effort to check whether there are unexpected things. If you do want to do some sanity checking, then you can declare RDFS domains and ranges for properties. For instance, you might have:

hasBirthDate rdfs:domain AnimateObject .
hasConstructionDate rdfs:domain InanimateObject .

According to the semantics, if you then have

object82 hasBirthDate "2013-04-01" ;
         hasConstructionDate "2013-04-02" .

then you'll also infer that

object82 a AnimateObject, InanimateObject .

and you might run a sanity check that looks for things that are both AnimateObjects and InanimateObjects. If anything is both, you probably have a problem that you should look into. If you use OWL, then you can actually declare that AnimateObject and InanimateObject are disjoint and check for logical consistency. Alternatively, in OWL, you can add assertions such as

object82 hasConstructionDate max 0 

which says that object82 should have no values for the property hasConstructionDate.

In any case, add rdfs:comments to your properties explaining what the property should be used for and what it should not be used for. When appropriate, add rdfs:comments to individuals to explain why they should not have a value for a given property, if they should not have such a value.
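The RDFS domain sanity check described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: triples are modelled as plain tuples instead of going through an RDF library or reasoner, the helper names infer_types and find_conflicts are made up for this sketch, and object83 is a hypothetical extra individual added to show a non-conflicting case.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# A real application would more likely use an RDF library and a reasoner.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

schema = {
    ("hasBirthDate", RDFS_DOMAIN, "AnimateObject"),
    ("hasConstructionDate", RDFS_DOMAIN, "InanimateObject"),
}
data = {
    ("object82", "hasBirthDate", "2013-04-01"),
    ("object82", "hasConstructionDate", "2013-04-02"),
    ("object83", "hasBirthDate", "1999-12-31"),  # hypothetical, no conflict
}

def infer_types(schema, data):
    """Apply the rdfs:domain rule: (p domain C) and (s p o) entail (s type C)."""
    domains = {}
    for prop, pred, cls in schema:
        if pred == RDFS_DOMAIN:
            domains.setdefault(prop, set()).add(cls)
    return {(s, RDF_TYPE, cls)
            for s, p, _ in data
            for cls in domains.get(p, ())}

def find_conflicts(typed, disjoint_pairs):
    """Return the sorted subjects typed with both classes of a disjoint pair."""
    classes = {}
    for s, _, cls in typed:
        classes.setdefault(s, set()).add(cls)
    return sorted(s for s, cs in classes.items()
                  if any(a in cs and b in cs for a, b in disjoint_pairs))

inferred = infer_types(schema, data)
conflicts = find_conflicts(inferred, [("AnimateObject", "InanimateObject")])
print(conflicts)
```

Anything reported in conflicts is a candidate for a property that was used where it is not applicable.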

2. The value is unknown, i.e., it should be there but we don't know it.

In this case, it is important to pin down what exactly “should” means. In OWL, for instance, you can say that

Person SubClassOf (hasName min 1 String)

to assert that every person is related to at least one String by the property hasName; that is, every person has at least one name. That is one way of saying that there is some value, but we might not know what it is in a particular case. If you cannot work with OWL, but only with RDF, then you should probably add an rdfs:comment to the property hasName along the lines of “each NamedEntity should have at least one value for this property.”
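If you want to find the individuals whose value is currently unknown, one option is a closed-world check over the data you actually hold. Note that this is not what the OWL axiom does: under the open world assumption a missing name is not an inconsistency. The sketch below assumes triples as plain tuples, and the individuals ex:alice and ex:bob and the helper missing_required are hypothetical.

```python
data = {
    ("ex:alice", "rdf:type", "Person"),
    ("ex:alice", "hasName", "Alice"),
    ("ex:bob", "rdf:type", "Person"),  # no hasName triple: name unknown
}

def missing_required(data, cls, prop):
    """List members of cls that have no value for prop in this data set.

    This is a closed-world check over the stored triples, not OWL reasoning.
    """
    members = {s for s, p, o in data if p == "rdf:type" and o == cls}
    with_value = {s for s, p, _ in data if p == prop}
    return sorted(members - with_value)

unknown_names = missing_required(data, "Person", "hasName")
print(unknown_names)
```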

3. The value doesn't exist, i.e., the property doesn't have a value (e.g. year of death for a person alive).

This is an interesting case, because RDF has no built-in notion of time (in the sense that some triple holds until a given time, after which some other triple holds). If you are simply using an RDF graph as a database-like store that you can update (by both removing and inserting triples), you could probably use some special reserved value for “I'm not dead yet!”. Having an open-ended data model, as we do in RDF, makes it particularly easy to do something like this, because you really can just use some new value for it:

mp:JohnCleese hasDeathDate mp:notDeadYet .
mp:GrahamChapman hasDeathDate "1989-10-04" .

Of course, you can also be a bit more refined and use a boolean-valued property to indicate whether or not a value for the first property makes sense:

mp:JohnCleese isDeceased false .
mp:GrahamChapman isDeceased true ;
                 hasDeathDate "1989-10-04" .
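A consumer can combine the flag and the date to tell the cases apart. The following is a rough sketch, assuming triples as plain tuples with Python booleans for isDeceased; the status labels, the helper names, and the individuals ex:PersonA and ex:PersonB are invented for illustration.

```python
data = {
    ("mp:JohnCleese", "isDeceased", False),
    ("mp:GrahamChapman", "isDeceased", True),
    ("mp:GrahamChapman", "hasDeathDate", "1989-10-04"),
    ("ex:PersonA", "isDeceased", True),  # hypothetical: deceased, date not recorded
}

def objects(data, subject, prop):
    """Return all objects o such that (subject, prop, o) is in the data."""
    return [o for s, p, o in data if s == subject and p == prop]

def death_date_status(data, person):
    """Classify the death date as known, unknown, nonexistent, or undetermined."""
    dates = objects(data, person, "hasDeathDate")
    flags = objects(data, person, "isDeceased")
    if dates:
        return ("known", dates[0])
    if True in flags:
        return ("unknown", None)          # deceased, but the date is missing
    if False in flags:
        return ("does-not-exist", None)   # alive, so there is no death date
    return ("no-information", None)       # nothing asserted either way

for person in ("mp:JohnCleese", "mp:GrahamChapman", "ex:PersonA", "ex:PersonB"):
    print(person, death_date_status(data, person))
```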

4. The value is withheld, e.g., when the data consumer is not allowed to access it.

This, in my opinion, is the most interesting case, because it potentially involves the most interesting data transformation. If you have a nice dataset that people can query, and you want to indicate something about the results that they would have obtained had they been permitted to see them, you have lots of options for representing this. For instance, in the spirit of HTTP status codes, you could replace nodes in the graph with blank nodes that act as redaction markers. Suppose you have the data:

ex:JohnDoe hasSSN "000-00-0000" .
ex:JaneDoe hasSSN "000-00-0001" .

When someone asks for the data, you might respond (supposing that the first value is valid, and the second one invalid):

ex:JohnDoe hasSSN [ a ex:ValidSSN ] .
ex:JaneDoe hasSSN [ a ex:InvalidSSN ] .

In general, you could present a different view of the data to consumers than what you actually possess. I do not know of any standards for doing this sort of thing. You might be interested in the somewhat related, recent W3C Recommendation PROV-O: The PROV Ontology, a vocabulary for describing the provenance of information (e.g., what it was generated from and to what it is attributed); it could be useful in describing the sorts of resources that might not, in their full form, be available to requesters.
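One way to produce such a redacted view is to rewrite the triples before handing them out. The sketch below is only one possible approach and rests on assumptions: triples are plain tuples, the set of restricted properties and the VALID_SSNS lookup are hypothetical stand-ins for real access control and validation, and the hasName triple is added just to show that unrestricted data passes through unchanged.

```python
import itertools

full_data = [
    ("ex:JohnDoe", "hasSSN", "000-00-0000"),
    ("ex:JaneDoe", "hasSSN", "000-00-0001"),
    ("ex:JohnDoe", "hasName", "John Doe"),  # hypothetical unrestricted triple
]

RESTRICTED = {"hasSSN"}       # assumption: properties this consumer may not read
VALID_SSNS = {"000-00-0000"}  # assumption: stand-in for a real validity check

def classify_ssn(value):
    """Describe a withheld value without revealing it."""
    return "ex:ValidSSN" if value in VALID_SSNS else "ex:InvalidSSN"

def redacted_view(triples, restricted, classify):
    """Replace objects of restricted properties with typed blank nodes."""
    counter = itertools.count(1)
    view = []
    for s, p, o in triples:
        if p in restricted:
            bnode = f"_:redacted{next(counter)}"
            view.append((s, p, bnode))
            view.append((bnode, "rdf:type", classify(o)))
        else:
            view.append((s, p, o))
    return view

view = redacted_view(full_data, RESTRICTED, classify_ssn)
for triple in view:
    print(triple)
```

The consumer still learns that a value exists and what kind of value it is, but never sees the literal itself.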

Joshua Taylor answered Nov 11 '22 12:11



I do a bit of modelling in RDF. I know of no widely used vocabulary for representing the kind of information you are looking for. There is, however, a widely accepted pattern that is applicable.

In work I did about a year ago I had a similar requirement to represent properties with "nullable values". A property with a nullable value either had a value or a reason why the value wasn't present.

I represented this by introducing a b-node as the value of the property. That b-node would have either an rdf:value property linking to a value, or a reason property linking to a reason the value is not available, e.g.

:foo
   :aProp [a :nullableValue; rdf:value "value"] ;
   :bProp [a :nullableValue; :reason :notAvailable ]
.
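Reading such a nullable value back is then a matter of following the b-node. Here is a small sketch of that lookup, assuming the graph above is held as plain tuples with the b-nodes given the arbitrary labels _:b1 and _:b2; the helper names and :cProp are hypothetical.

```python
graph = {
    (":foo", ":aProp", "_:b1"),
    ("_:b1", "rdf:type", ":nullableValue"),
    ("_:b1", "rdf:value", "value"),
    (":foo", ":bProp", "_:b2"),
    ("_:b2", "rdf:type", ":nullableValue"),
    ("_:b2", ":reason", ":notAvailable"),
}

def one_object(graph, subject, prop):
    """Return one object for (subject, prop), or None if there is none."""
    matches = [o for s, p, o in graph if s == subject and p == prop]
    return matches[0] if matches else None

def read_nullable(graph, subject, prop):
    """Return (value, reason); both are None when the property is absent."""
    node = one_object(graph, subject, prop)
    if node is None:
        return (None, None)
    return (one_object(graph, node, "rdf:value"),
            one_object(graph, node, ":reason"))

print(read_nullable(graph, ":foo", ":aProp"))
print(read_nullable(graph, ":foo", ":bProp"))
print(read_nullable(graph, ":foo", ":cProp"))
```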
Brian answered Nov 11 '22 13:11



As others on the W3C mailing list have pointed out: don't create triples with the value 'NULL'. You should simply skip such data when creating triples.
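When exporting database rows, this amounts to emitting no triple at all for a NULL column. A minimal sketch, assuming rows arrive as Python dicts with None for NULL; the row data, the ex: property naming, and the helper row_to_triples are made up for illustration.

```python
rows = [
    {"id": "ex:alice", "name": "Alice", "deathYear": None},  # NULL in the database
    {"id": "ex:bob", "name": "Bob", "deathYear": 1989},
]

def row_to_triples(row, subject_key="id"):
    """Turn one row into triples, skipping columns whose value is NULL (None)."""
    subject = row[subject_key]
    return [(subject, f"ex:{column}", value)
            for column, value in row.items()
            if column != subject_key and value is not None]

triples = [t for row in rows for t in row_to_triples(row)]
for triple in triples:
    print(triple)
```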

dr0i answered Nov 11 '22 11:11
