Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Overriding virtual boolean pure method without LSP breaking

For example we have the following structure:

class Base
{
    [pure]
    public virtual bool IsValid(/*you can add some parameters here*/)
    {
       //body
    }
}

class Child : Base
{
    public override bool IsValid(/*you can add some parameters here*/)
    {
       //body
    }
}

Could you please fill Base::IsValid() and Child::IsValid() with different bodies but without conflicts with LSP? Let's imagine it's just method for analyzing, we cannot change instance's state. Are we able to do it? I'm interested in any example. I try to understand whether virtual (bodied) boolean methods is anti-pattern or not in general case.

like image 429
Serg046 Avatar asked Apr 26 '18 15:04

Serg046


3 Answers

First, your answer:

class Base
{
    [pure]
    public virtual bool IsValid()
    {
       return false;
    }
}

class Child : Base
{
    public override bool IsValid()
    {
       return true;
    }
}

Basically, the LSP says (it's a definition of "subtype"):

If for each object o1 of type S there is an object o2 of type T such that for all programs P defined in terms of T,the behavior of P is unchanged when o1 is substituted for o2, then S is a subtype of T. (Liskov, 1987)

"But I can't replace an o1 of type Base by any o2 of type Child, because they obviously behave differently!" To address this remark, we have to make a detour.

What is a subtype?

First, note that Liskov is not only talking of classes but also of types. Classes are implementations of types. There are good and bad implementations of types. We'll try to distinguish them, especially when it comes to subtypes.

The question behind the Liskov Substitution Principle is: what is a subtype? Usually, we assume that a subtype is a specialization of its supertype and an extension of its capabilities:

> The intuitive idea of a subtype is one whose objects provide all the behavior of objects of another type (the supertype) plus something extra (Liskov, 1987)

On the other hand, most compilers assume that a subtype is a class that has at least the same methods (same name, same signature including covariance and exceptions), either inherited or redefined (or defined for the first time) and a label (inherits, extends, ...).

But these critieria are incomplete and leads to mistakes. Here are two infamous examples:

  • SortedList is (?) a subtype of List: it represents a list that is sorted (specialization).
  • Square is (?) a subtype of Rectangle: it represents a rectangle with four equal sides (specialization).

Why a SortedList is not a List? Because of the semantic of the List type. A type is not only a collection of signatures, the methods also have a semantic. By semantic, I mean all the authorized uses of an object (remember Wittgenstein: "the meaning of a word is its use in the language"). For instance, you expected to find a element at the position where you put it. But if the list is always sorted, a newly inserted element will be moved at its "right" place. Thus, you won't find this element at the position where you put it.

Why a Square is not a Rectangle? Imagine you have a method set_width: with a square, you have to change the height too. But the semantic of set_width is that is changes the width but leaves the height unmodified.

(A square is not a rectangle? This question leads sometimes to heated discussion, hence I will elaborate on the subject. We all have learned that a square is a rectangle. But this is true in the sky of pure mathematics, where objects are immutable. If you define an ImmutableRectangle (with fixed width, height, position, angle and computed perimeter, area, ...), then ImmutableSquare will be a subtype of ImmutableRectangle according to the LSP. At first glance, such immutable classes do not seem very useful, but there's a way to deal with this: replace setters by methods that will create a new object, as you would do in any functional language. E.g. ImmutableSquare.copyWithNewHeight(h) will return a new... ImmutableRectangle whose height is h and width is the size of the square.)

We can use the LSP to avoid those mistakes.

Why do we need the LSP?

But why, in practice, do we need to care about the LSP? Because compilers do not capture the semantic of a class. You may have a subclass that is not the implementation of a subtype.

For Liskov (and Wing, 1999), a type specification includes:

  • The type's name
  • A description of the type's value space
  • A definition of the type's invariant and history properties;
  • For each of the type's method:
    • Its name;
    • Its signature (including signaled exceptions);
    • Its behavior in terms of pre-conditions and post-conditions

If a compiler was able to enforce those specification for every class, it would be able (at compile time or runtime, depending on the nature of the specification) to tell us: "hey, this is not a subtype!".

(Actually, there is a programming language that tries to capture the semantic: Eiffel. In Eiffel, the invariants, the pre-conditions and the post-conditions are essential parts of the definition of a class. Therefore, you don't have to care about the LSP: the runtime will do it for you. That would be nice, but Eiffel has also limitations. This language (any language?) wont't be enough expressive to define the full semantic of isValid(), because this semantic is not contained in a pre/post condition or an invariant.)

Now, back to the example. Here, the only indication we have on the semantic of isValid is the name of the method: it should return true if the object is valid, and false otherwise. You obviously need the context (and perhaps detailed specs or domain knowledge) to know what is and what is not valid.

Actually, I can imagine dozen of situations where any object of type Base is valid, but all objects of type Child are invalid (see the code at the top of the answer). E.g. replace Base by Passport and Child by FakePassword (assuming that a fake password is a password...).

Thus, even if the Base class says: "I'm valid", the Base type says: "Almost all of my instances are valid, but those who are invalid should say it!" That's why you have a Child class implementing the Base type (and deriving the Base class)that says: "I'm not valid".

A more interesting example

But I think the example you have chosen is not the best for checking pre/post conditions and invariants: since the function is pure, it may not, by design, break any invariant; since the return value is a boolean (2 values) there's no interesting post-condition. The only thing you can have is an interesting pre condition if you have some parameters.

Let's take a more interesting example: a collection. In pseudo-code, you have:

abstract class Collection {
    abstract iterator(); // returns a modifiable iterator
    abstract size();

    // a generic way to set a value
    set(i, x) {
        [ precondition: 
            size: 0 <= i < size() ]

        it = iterator()
        for i=0 to i:
            it.next()
        it.set(x)

        [ postcondition:
            no_size_modification: size() = old size()
            no_element_modification_except_i: for all j != i, get(j) == old get(j)
            was_set: get(i) == x ]
    }

    // a generic way to get a value
    get(i) {
        [ precondition:
            size: 0 <= i < size() ]

        it = iterator()
        for i=0 to i:
            it.next()
        return it.get()

        [ postcondition:
            no_size_modification: size() = old size()
            no_element_modification: for all j, get(j) == old get(j) ]
    }

    // other methods: remove, add, filter, ...

    [ invariant: size_positive: size() >= 0 ]
}

This collection has some abstract methods, but the set and get methods are already bodied methods. Furthermore, we can say that they are okay for a linked list, but not for a list backed by an array. Let's try to create a better implementation for a random access collection:

class RandomAccessCollection {
    // all pre/post conditions and invariants are inherited from Collection.

    // fields:
    // self.count = number of elements.
    // self.data = the array.

    iterator() { ... }
    size() { return self.count; }

    set(i, x) { self.data[i] = x }

    get(i) { return self.data[i] }

    // other methods
}

It's obvious that the semantic of get and set in RandomAccessCollection are compliant with the definitions of the Collection class. In particular, all pre/post conditions and the invariant are met. In other words, the conditions of the LSP are met and thus, the LSP is respected: we can replace, in every program, any object of type Collection by an analog object of type RandomAccesCollection without breaking the behavior of the programs.

Conclusion

As you see, it is easier to respect the LSP than to break it. But sometimes we break it (e.g. try to make a SortedRandomAccessCollection that inherits of RandomAccessCollection). The crystal clear formulation of the LSP helps us to narrow what went wrong and what should be made to correct the design.

More generally, virtual (bodied) boolean methods is not an anti-pattern if the base class has enough flesh to implement the methods. But if the base class is so abstract that every subclass wil have to redefine the methods, then leave the methods abstract.

References

There are two main original papers from Liskov: Data Abstraction and Hierarchy (1987) and Behavioral Subtyping Using Invariants and Constraints (1994, 1999, with J. M. Wing). Note that these are theoretical papers.

like image 57
jferard Avatar answered Nov 17 '22 06:11

jferard


The idea of LSP does not forbid polymorphism of child classes. Rather, it emphasizes what is allowed to be changed and what is not. In general, it means that:

  1. Any overriding function accepts and returns the same types of the overridden function; that includes possibly thrown exceptions (input types might extend those of the overridden and output types might narrow them - this will still keep that restriction).
  2. The "History Rule" - the "Base" part of the Child object must not be changed by the Child's function to a state that could never be reached using the Base class functions. Thus, a function that expects a Base object will never get unexpected results.
  3. Invariants of the Base must not be changed in the Child. That is, any general assumption about the Base class behavior must be kept by the Child.

The two first bullets are very well defined. The "invariants" are more a matter of sense. For example, if some class in a real time environment requires that all its functions run under some constant time, all the overriding functions in its subtypes must also adhere to that requirement.

In your case, IsValid() means something, and that "something" must be kept under all child types. For example, let's assume that your Base class defines a product, and IsValid() tells whether that product is valid to be sold. What exactly makes each product valid might differ. For example, it must have its price set in order to be valid for sell. But the Child product must also go through an electricity test before it might be sold.

In this example, we keep all the requirements:

  1. The input and output types of the function are not changed.
  2. The state of the Base-part of the Child object is not changed in a way that could not be expected by the Base class.
  3. The invariants of the class are kept: a Child object without a price can still not be sold; the meaning of not being valid is still the same (not being allowed for sell), it is just being calculated in a way that matches the Child.

You can get some more explanations here.

===

Edit - some additional explanations according to the notes

The whole idea of polymorphism is that the same function is being done differently by each subtype. LSP does not violates polymorphism but describes what polymorphism should take care of. In particular, LSP requires that any subtype Child can be used where the code requires Base and that any assumption done for Base holds for any of his Childs. In the above example, IsValis() does not mean "has a price". Rather, it means exactly that: Is the product valid? In some cases, having a price is enough. In other, it also requires electricity checks, and yet in others it might require some other properties as well. If the designer of the Base class does not require that by setting price a product becomes valid, but rather leaves IsValid() as a separate test, then no violation of the LSP takes place. What example would have made that violation? An example where one ask an object if it IsValid(), then calls a function of the base class that should not change the validity, and that function changes Child to not being valid anymore. This is a violation of the History Rule of LSP. The known example, provided by others here, is square as a child of rectangle. But so long as the same sequence of function calls does not require a specific behavior (again - it is not defined that setting a price makes a product valid; it just happens to be that way in some types) - the LSP is held as required.

like image 40
Mike Avatar answered Nov 17 '22 07:11

Mike


The basic idea behind LSP is not to hinder the capability to Override the method of Base class but to avoid changing internal state of Base class (changing data members of base class class) in such a way that Base class won't have.

It simply states: Any type (class) that inherits another type, must be substitutive to that type, so that if Child class inherits Base class, then anywhere in the code where an object of Base class is expected, we can provide a Child class object without changing the system behavior.

It doesn't however obstruct us from modifying the members of Child class. The famous example which violates this example is Square/Rectangle problem. You can find the details of example here.

In your case, since you are just analyzing some data in IsValid() and not modifying the internal state of Base class, there shouldn't be any violation of LSP.

like image 2
Sagar Avatar answered Nov 17 '22 06:11

Sagar