
How should I model data accuracy/confidence in a database?

Say I have a database holding timestamps. For every timestamp attribute I might add an accuracy attribute, stating the confidence interval, so the information being stored might be, for example, "1st July 2012 12:13, +/- 3 months".

But in general, recording accuracy/confidence is not so simple. A genealogical database might need to record the fact that a person might be the father of another person.

So are there any general principles or best practices on storing information with varying levels of accuracy/confidence?

jl6 asked Jul 01 '12 11:07



1 Answer

With your father example it's easy: it's impossible to be more than 100% confident that someone is the father of someone else, and in general it's impossible to be more than 100% confident of anything. This in turn implies that for any data attribute you can simply store a percentage confidence level alongside it.
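As a rough illustration of storing a confidence level next to a relationship, here is a minimal sketch using Python's built-in sqlite3 module. The table name `paternity_claim`, its columns, the helper `likely_fathers`, and the sample names are all assumptions made up for this example; confidence is held as a fraction between 0 and 1 rather than a percentage.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical paternity_claim table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE paternity_claim (
            child      TEXT NOT NULL,
            father     TEXT NOT NULL,
            -- confidence as a fraction; the CHECK rejects anything outside 0..1
            confidence REAL NOT NULL CHECK (confidence BETWEEN 0.0 AND 1.0),
            PRIMARY KEY (child, father)
        )
        """
    )
    return conn

def likely_fathers(conn, child, threshold):
    """Return (father, confidence) rows at or above the threshold, best first."""
    rows = conn.execute(
        "SELECT father, confidence FROM paternity_claim "
        "WHERE child = ? AND confidence >= ? "
        "ORDER BY confidence DESC",
        (child, threshold),
    )
    return rows.fetchall()

claims_db = make_db()
claims_db.executemany(
    "INSERT INTO paternity_claim VALUES (?, ?, ?)",
    [("Alice", "Bob", 0.9), ("Alice", "Carl", 0.1)],  # invented sample data
)
```

Keeping one row per candidate lets several competing claims coexist, and the CHECK constraint enforces the "never more than 100%" rule at the database level.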

However, you might not want to store the confidence level as a percentage; it depends on the data-attribute itself, and the meaning of the data.

For instance, if you want to store how "accurate" a particular string is when compared to another, you might want to store the Levenshtein distance instead. In your timestamp example I would personally store the minimum and maximum values, though you could also store the number of months that you would add or subtract; either would make it quick to calculate on selection from the database.
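The minimum/maximum approach for timestamps could look something like the sketch below, again with Python's sqlite3 module. The `event` table, its `earliest` and `latest` columns, and the helpers `add_event` and `possibly_at` are hypothetical names chosen for this example. Timestamps are stored as ISO-8601 text, which compares correctly as plain strings as long as every value uses the same format (here, whole seconds and no time zone).

```python
import sqlite3
from datetime import datetime

events_db = sqlite3.connect(":memory:")
events_db.execute(
    """
    CREATE TABLE event (
        name     TEXT PRIMARY KEY,
        earliest TEXT NOT NULL,   -- lower bound of the uncertain timestamp
        latest   TEXT NOT NULL,   -- upper bound of the uncertain timestamp
        CHECK (earliest <= latest)
    )
    """
)

def add_event(conn, name, earliest, latest):
    """Store an event whose true time lies somewhere in [earliest, latest]."""
    conn.execute(
        "INSERT INTO event VALUES (?, ?, ?)",
        (name, earliest.isoformat(), latest.isoformat()),
    )

def possibly_at(conn, moment):
    """Return names of events whose interval contains the given moment."""
    stamp = moment.isoformat()
    rows = conn.execute(
        "SELECT name FROM event WHERE earliest <= ? AND latest >= ? ORDER BY name",
        (stamp, stamp),
    )
    return [row[0] for row in rows]

# "1st July 2012 12:13, +/- 3 months" stored as explicit bounds.
add_event(
    events_db,
    "birth",
    datetime(2012, 4, 1, 12, 13),
    datetime(2012, 10, 1, 12, 13),
)
```

With explicit bounds, a range query is a pair of simple comparisons, so no date arithmetic is needed at selection time.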

What I'm trying to say, possibly unclearly, is that the answer to your question doesn't depend on the database but on the data therein and the needs of your users, business, etc. As it depends on the data, each individual attribute or column needs an individual solution; there cannot be a "generic" solution.

Ben answered Sep 28 '22 00:09
