Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Proving Floor_Log2 in Spark

New to Spark, and new to Ada, so this question may be overly broad. However, it's asked in good faith, as part of an attempt to understand Spark. Besides direct answers to the questions below, I welcome critique of style, workflow, etc.

As my first foray into Spark, I chose to try to implement (easy) and prove correctness (unsuccessful so far) the function \left \lfloor{log_2(x)}\right \rfloor.

Question: What is the proper way of implementing and proving the correctness of this function?

I started with the following util.ads:

package Util is
   function Floor_Log2(X : Positive) return Natural with
     Post => 2**Floor_Log2'Result <= X and then X < 2**(Floor_Log2'Result + 1);    
end Util;

I have no pre-condition because the ranges of the input fully expresses the only interesting pre-condition. The post-condition I wrote based on the mathematical definition; however, I have an immediate concern here. If X is Positive'Last, then 2**(Floor_Log2'Result + 1) exceeds Positive'Last and Natural'Last. Already I'm up against my limited knowledge of Ada here, so: Sub-question 1: What is the type of the sub-expression in the post condition, and is this overflow a problem? Is there a general way to resolve it? To avoid the issue in this particular case, I revised the specification to the less-intuitive but equivalent:

package Util is
   function Floor_Log2(X : Positive) return Natural with
     Post => 2**Floor_Log2'Result <= X and then X/2 < 2**Floor_Log2'Result;
end Util;

There are many ways to implement this function, and I'm not particularly concerned about performance at this point, so I'd be happy with any of them. I'd consider the "natural" implementation (given my particular C background) to be something like the following util.adb:

package body Util is
   function Floor_Log2 (X : Positive) return Natural is
      I : Natural := 0;
      Remaining : Positive := X;
   begin
      while Remaining > 1 loop
         I := I + 1;
         Remaining := Remaining / 2;
      end loop;
      return I;
   end Floor_Log2;
end Util;

Attempting to prove this with no loop invariants fails, as expected. Results (this and all results are GNATprove level 4, invoked from GPS as gnatprove -P%PP -j0 %X --ide-progress-bar -u %fp --level=4 --report=all):

util.adb:6:13: info: initialization of "Remaining" proved[#2]
util.adb:7:15: info: initialization of "I" proved[#0]
util.adb:7:17: medium: overflow check might fail[#5]
util.adb:8:23: info: initialization of "Remaining" proved[#1]
util.adb:8:33: info: range check proved[#4]
util.adb:8:33: info: division check proved[#8]
util.adb:10:14: info: initialization of "I" proved[#3]
util.ads:3:14: medium: postcondition might fail, cannot prove 2**Floor_Log2'Result <= X[#7]
util.ads:3:15: medium: overflow check might fail[#9]
util.ads:3:50: info: division check proved[#6]
util.ads:3:56: info: overflow check proved[#10]

Most of the errors here make basic sense to me. Starting with the first overflow check, GNATprove cannot prove that the loop terminates in less than Natural'Last iterations (or at all?), so it cannot prove that I := I + 1 doesn't overflow. We know that this isn't the case, because Remaining is decreasing. I tried to express this adding the loop variant pragma Loop_Variant (Decreases => Remaining), and GNATprove was able to prove that loop variant, but the potential overflow of I := I + 1 is unchanged, presumedly because proving the loop terminates at all is not equivalent to proving that it terminates in less than Positive'Last iterations. A tighter constraint would show that the loop terminates in at most Positive'Size iterations, but I'm not sure how to prove that. Instead, I "forced it" by adding a pragma Assume (I <= Remaining'Size); I know this is bad practice, the intent here was purely to let me see how far I could get with this first issue "swept under the covers." As expected, this assumption lets the prover prove all range checks in the implementation file. Sub-question 2: What is the correct way to prove that I does not overflow in this loop?

However, we've still made no progress on proving the postcondition. A loop invariant is clearly needed. One loop invariant that holds at the top of the loop is that pragma Loop_Invariant (Remaining * 2**I <= X and then X/2 < Remaining * 2**I); besides being true, this invariant has the nice property that it is clearly equivalent to the post-condition when the loop termination condition is true. However, as expected GNATprove is unable to prove this invariant: medium: loop invariant might fail after first iteration, cannot prove Remaining * 2**I <= X[#20]. This makes sense, because the inductive step here is non-obvious. With division on the real numbers one could imagine a straightforward lemma stating that for all I, X * 2**I = (X/2) * 2**(I+1), but (a) I don't expect GNATprove to know that without a lemma being provided, and (b) it's messier with integer division. So, Sub-Question 3a: Is this the appropriate loop invariant to try to use to prove this implementation? Sub-Question 3b: If so, what's the right way to prove it? Externally prove a lemma and use that? If so, what exactly does that mean?

At this point, I thought I'd explore a completely different implementation, just to see if it led anywhere different:

package body Util is
   function Floor_Log2 (X : Positive) return Natural is
   begin
      for I in 1 .. X'Size - 1 loop
         if 2**I > X then
            return I - 1;
         end if;
      end loop;
      return X'Size - 1;
   end Floor_Log2;
end Util;

This is a less intuitive implementation to me. I didn't explore this second implementation as much, but I leave it here to show what I tried; to give a potential avenue for other solutions to the main question; and to raise additional sub-questions.

The idea here was to bypass some of the proof around overflow of I and termination conditions by making termination and ranges explicit. Somewhat to my surprise, the prover first choked on overflow checking the expression 2**I. I had expected 2**(X'Size - 1) to be provably within the bounds of X -- but again, I'm up against the limits of my Ada knowledge. Sub-Question 4: Is this expression actually overflow-free in Ada, and how can that be proven?

This has turned out to be a long question... but I think the questions I'm raising, in the context of a nearly-trivial example, are relatively general and likely to be useful to others who, like me, are trying to understand if and how Spark is relevant to them.

like image 637
addaon Avatar asked Dec 12 '18 23:12

addaon


2 Answers

Preventing overflow in the post condition

Given the original function signature

function Floor_Log2 (X : Positive) return Natural with
   Post => 2**Floor_Log2'Result <= X and then X < 2**(Floor_Log2'Result + 1);  

I observe that I need to limit the domain of X in order to prevent overflow in the second term of the post condition. Given the definitions in Standard.ads, i.e.

type Integer is range -(2**31) .. +(2**31 - 1);
for Integer'Size use 32;

subtype Natural  is Integer range 0 .. Integer'Last;
subtype Positive is Integer range 1 .. Integer'Last;

I conclude that, in order to prevent overflow,

X < 2**(Floor_Log2'Result + 1) <= 2**31 - 1

and therefore X <= 2**30 - 1. Hence, I changed the function signature to:

subtype Pos is Positive range 1 .. 2**30 - 1
      
function Floor_Log2 (X : Pos) return Natural with
  Post => 2**Floor_Log2'Result <= X and then X < 2**(Floor_Log2'Result + 1);

First Approach

In principle, I could now proof the post condition as follows in GNAT CE 2019 (note that I use a different algorithm compared to the one stated in the question):

util.ads

package Util with SPARK_Mode is
   
   subtype Pos is Positive range 1 .. 2**30 - 1
      
   function Floor_Log2 (X : Pos) return Natural with
     Post => 2**Floor_Log2'Result <= X and then X < 2**(Floor_Log2'Result + 1);
   
end Util;

util.adb

package body Util with SPARK_Mode is  
  
   ----------------
   -- Floor_Log2 --
   ----------------
   
   function Floor_Log2 (X : Pos) return Natural is      
      L : Positive := 1;
      H : Positive := L * 2;
      I : Natural  := 0;
   begin
            
      while not (L <= X and then X < H) loop
         
         pragma Loop_Invariant
           (L = 2 ** I and H = 2 ** (I+1));

         pragma Loop_Invariant
           (for all J in 0 .. I =>
               not (2 ** J <= X and then X < 2 ** (J+1)));
         
         L := H;         
         H := H * 2;    
         I := I + 1;
         
      end loop;
      
      return I;
      
   end Floor_Log2;  

end Util;

Unfortunately, however, the provers have difficulties with the non-linear arithmetic (i.e. exponentiation) and all proof sessions (on my computer) end with a timeout. In fact, if I run gnatprove with effort level 0, then I can only proof the post condition when I limit the upper bound of Pos to 2**7 - 1, i.e.

subtype Pos is Positive range 1 .. 2**7 - 1;

Increasing the effort level (or timeout) allows me to proof the post condition for larger values of Pos'Last.

Second Approach

In order to work around the limitation of the provers, I applied a little trick by redefining the exponentiation function. I could then use the following code to prove the post condition for the full range of Pos when I run gnatprove with effort level 1:

spark_exp.ads

generic
   type Int is range <>;   
   Base  : Int;
   N_Max : Natural;
package SPARK_Exp with SPARK_Mode is
   
   subtype Exp_T is Natural range 0 .. N_Max;
   
   function Exp (N : Exp_T) return Int with Ghost;

private
   
   type Seq_T is array (Exp_T range <>) of Int;
   
   function Exp_Seq return Seq_T with
     Ghost,
     Post =>  (Exp_Seq'Result'First = 0)
     and then (Exp_Seq'Result'Last  = N_Max)
     and then (Exp_Seq'Result (0) = 1)
     and then (for all I in 1 .. N_Max =>
                 Exp_Seq'Result (I) = Base * Exp_Seq'Result (I - 1) and
                   Int'First < Exp_Seq'Result (I) and Exp_Seq'Result (I) < Int'Last);
      
   function Exp (N : Exp_T) return Int is (Exp_Seq (N));   
   
end SPARK_Exp;

spark_exp.adb

package body SPARK_Exp with SPARK_Mode is

   -------------
   -- Exp_Seq --
   -------------

   function Exp_Seq return Seq_T is
      S : Seq_T (Exp_T'Range) := (others => 1);
   begin

      for I in 1 .. N_Max loop

         pragma Loop_Invariant
           (for all J in 1 .. I - 1 =>
              S (J) = Base * S (J - 1) and
                (Int'First / Base) < S (J) and S (J) < (Int'Last / Base));

         S (I) := Base * S (I - 1);

      end loop;

      return S;

   end Exp_Seq;

end SPARK_Exp;

util.ads

with SPARK_Exp;

package Util with SPARK_Mode is
   
   subtype Pos is Positive range 1 .. 2**30 - 1;
   
   
   package SPARK_Exp_2 is
     new SPARK_Exp (Positive, 2, 30);
   
   function Exp2 (N : SPARK_Exp_2.Exp_T) return Positive
     renames SPARK_Exp_2.Exp;   
   
   
   function Floor_Log2 (X : Pos) return Natural with
     Post => (Exp2 (Floor_Log2'Result) <= X) and then
                (X < Exp2 (Floor_Log2'Result + 1));

end Util;

util.adb

package body Util with SPARK_Mode is
   
   ----------------
   -- Floor_Log2 --
   ----------------
   
   function Floor_Log2 (X : Pos) return Natural is      
      L : Positive := 1;
      H : Positive := L * 2;
      I : Natural  := 0;
   begin
            
      while not (L <= X and then X < H) loop
         
         pragma Loop_Invariant
           (L = Exp2 (I) and H = Exp2 (I + 1));

         pragma Loop_Invariant
           (for all J in 0 .. I =>
               not (Exp2 (J) <= X and then X < Exp2 (J + 1)));
         
         L := H;         
         H := H * 2;    
         I := I + 1;
         
      end loop;
      
      return I;
      
   end Floor_Log2;

end Util;
like image 117
DeeDee Avatar answered Nov 14 '22 00:11

DeeDee


I can't help with your SPARK questions, but I can answer some of your Sub-Questions.

Sub-Question 1: Since you're using "<" for Integer, the sub-expression will be of type Integer as well. For Positive'Last (2 ** 31 - 1 with GNAT), your function result should be 30, and the sub-expression will overflow. (This is from a SPARK point of view; compilers are allowed to use larger ranged types when evaluating expressions to obtain the mathematically/logically correct result even if a sub-expression would overflow, and GNAT will do this for some values of -gnato.)

Sub-Question 4: 2 ** (X'Size - 1) can overflow. The reason has to do with the 2 meanings of 'Size: Positive'Size is the minimum number of bits needed to store a value of subtype Positive; X'Size is the actual number of bits allocated to X. Since you're using GNAT,

Integer'Last = Positive'Last = 2 ** 31 - 1. X'Size = 32. Positive'Size = 31.

So, 2 ** (X'Size - 1) = 2 ** 31 > Positive'Last. You probably want to use Positive'Size instead of X'Size.

(Again, from the SPARK point of view; compilers are allowed to obtain the logically correct result.)

Aside: the short-circuit forms and then and or else should only be used when they're actually needed. Modern processors do all sorts of optimizations at the machine-code level that have to be turned off for short-circuit evaluation. Although they may look like optimizations, in practice they are often the opposite.

HTH.

(You might want to tag this with [ada]. I only saw it because you referenced it in c.l.ada.)

like image 37
Jeffrey R. Carter Avatar answered Nov 14 '22 00:11

Jeffrey R. Carter