edit2: Taking the hint that de Bruijn indices might be easier to work with, I've reformulated much of the internal representation of formulae to use a mixed de Bruijn representation ala Connor McBride's paper. This greatly simplified some algorithms regarding syntax that had to deal with α-equivalence, but made others more complicated. In any case, this means that free variables can be standardized apart and bound variables have canonical names represented by their binding distance. This isn't totally satisfactory, but it's at least a better start.
edit1: I realized the problem constraints aren't sufficient to guarantee standardization of variables. In particular, iteration of quantifiers means that variables in binders must be standardized apart first. So there is probably no escaping a solution that requires multiple passes over each abstract syntax tree. The suggestion to use de Bruijn indices is a fairly good solution generally, but it won't give standardization easily without breaking the notation and its utility.
original: Let V be an infinite set of variables indexed by the natural numbers, fv(φ) denote the free variables of φ, and bv(φ) denote the bound variables of φ, for any first-order formula φ. The problem I'm trying to solve is the following.
Let φ and ψ be first-order formulae. Find substitutions θ1 and θ2 such that: (a) fv(θ1(φ)), fv(θ2(φ)), bv(θ1(φ)), and bv(θ2(ψ)) are disjoint; (b) the union of fv(θ1(φ)), fv(θ2(φ)), bv(θ1(φ)), and bv(θ2(ψ)) is isomorphic to an initial segment of the natural numbers; and (c) all variables in bv(θ1(φ)) are less than all variables in bv(θ2(ψ)) are less than all variables in fv(θ1(ψ)) are less than all variables in fv(θ2(ψ)).
The difficulty is that the set of bound and free variables in a formula isn't necessarily disjoint, and quantifiers may be iterated, meaning that naive substitution produces accidental variable capture, and so forth.
An inefficient solution is the following. Given φ and ψ, first standardize apart the free variables of φ and ψ such that all free variables in the standardized terms are greater than the largest bound variable plus the number of binding operators in φ and ψ, giving φ' and ψ'. Then rename the bound variables of φ' starting from x0. Then the bound variables of ψ'. Then the free variables of φ'. Then the free variables of ψ'.
What I would like is a more efficient method of satisfying the problem constraints. That is, a solution that doesn't require the initial standardization that renames free variables. Standardizing variables apart efficiently isn't a problem. However, the additional constraint is giving me trouble.
Use De Bruijn idices for sure. Also, notice that they all positive. Then you can use the negative numbers during the unification process. If you want to use fresh indices, use a global one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With