I have been trying to implement an algorithm for the semi naive evaluation of a datalog program but couldn't get a straightforward answer anywhere that explains the difference in simple words.
According to my understanding naive is a bottom up evaluation technique so is semi-naive.
In the first iteration both evaluation techniques start with an empty set.
As the iterations proceed further both end up having iterations and producing tuples until a new tuple is reached.
So the semi-naive starts from head or body of the rule?
path (X,Y):- edge(X,Y).
path (X,Y):- edge(X,Z),path (Z,Y).
Can someone please explain how the EDB and IDB gets updated at the end of each iteration for the above program. Is the tuples stored under each predicate. Like a separate column for edge and a separate column for path or they get stored as a collection.
Also what is the difference between global and local unification?
The difference between the naîve and semi-naîve evaluation in Datalog is that when you're evaluating using the naïve implementation you take all the initial dataset (existing EDBs) plus the news ones (inferred EDBs) for each iteration. For example, if you have the IDBs like this:
reachable(X,Y) :- link(X,Y).
reachable(X,Y) :- link(X,Z), reachable(Z,Y).
And a set of EDBs like this: link = {(a,b), (b,c), (c,c), (c,d)}
The procedure to execute the evaluation is:
When you're using a naîve approach in every step you'll have the follow data as input and output:
| Iteration |Input for the current iteration I_{i} | New facts inferred |
|-----------|-------------------------------------------------|------------------------------|
| 1 | {} | {(a,b), (b,c), (c,c), (c,d)} |
| 2 | {(a,b), (b,c), (c,c), (c,d)} | {(a,c),(b,c),(b,d),(c,d)} |
| 3 | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d)} | {(a,d)} |
| 4 | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d),(a,d)}| {} |
At the 4th iteration, you'll stop because the fixpoint is reached, and no new facts could be inferred. However, in the semi-naïve approach, you apply an optimization, instead of using all derived facts as input to rules at each iteration, it's possible to send to each iteration only the tuples already learned in prior iterations, for avoid duplicates tuples.
| Iteration |Input for the current iteration I_{i} | New facts inferred |
|-----------|---------------------------------------|------------------------------|
| 1 | {} | {(a,b), (b,c), (c,c), (c,d)} |
| 2 | {(a,b), (b,c), (c,c), (c,d)} | {(a,c),(b,c),(b,d),(c,d)} |
| 3 | {(a,c), (b,d)} | {(a,d)} |
| 4 | {(a,d)} | {} |
Source: Datalog and Recursive Query Processing
I am going to tell you about Prolog. I don't know if it applies equally to Datalog or not, so if I'm wrong, someone will just have to correct me.
So the semi-naive starts from head or body of the rule?
According to this handout, you can proceed with a variable-based or a tuple-based algorithm, but in both cases you start with the body and only after it succeeds do you add a tuple representing the head:
Variable-based: Consider all possible assignments to the variable of the body. If the assignment makes the body true, add the tuple for the head to the result.
Tuple-based: Consider all assignments of tuples from the nonnegated relational subgoals. If the assignment makes the body true, add the tuple for the head to the result.
This jibes well with what I know about Prolog and backward-chaining: you want to conclude the head, so you must first prove the body.
You seem also to be asking if semi-naive has something to say about whether you start from the head or the body. Based on what I have reviewed today, it looks to me like semi-naive is a twist on the naive algorithm rather than a completely new thing. It's like a tabled version of the naive approach; if there are multiple naive approaches, you'll have just as many semi-naive approaches. If there's just one naive, there will be only one semi-naive.
Can someone please explain how the EDB and IDB gets updated at the end of each iteration for the above program.
That's easy: they do not. The EDB is the set of facts in the database. The IDB is the set of rules in the database. A query is just a query, it doesn't modify the database. The tuples that the query returns are another matter.
Are the tuples stored under each predicate?
The tuples that represent facts in the EDB are already stored in the EDB as facts. The tuples derived from the rules in the IDB are computed and become part of the result set and are not stored. In neither case is the store updated as a result of performing a query.
If we were talking about Prolog here, there would be this stack of recursive invocations going on. The outermost call might say path(a, z)
; inside that call might be something like edge(a, b), path(b, z)
, which will beget a call edge(b, c), path(c, z)
. In Prolog terms, each time you enter another invocation you wind up with a fresh set of variables, some bound, some yet-to-be-bound. In your Datalog world, it seems to me that edge(a,b)
and edge(b,c)
already exist as tuples in your EDB. During the query they will be part of the tuples in the stack of tuples that represents your result. The IDB contained a rule called path/2
, and once you satisfy the recursive call, you will wind up with some new tuples like path(a,z)
in your result. But that result tuple is not a fact stored in your EDB (which only contains facts like edge/2
) nor is it going to replace the rule path/2
. It's just part of the result of your query.
Also what is the difference between global and local unification?
I wasn't able to find anything using those terms and I can't imagine what they would mean.
So, there's my guess. Let's see how wide of the mark I am.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With