 

How to impurely modify a state associated with an object?

In Haskell, I have a container like:

data Container a = Container { length :: Int, buffer :: Unboxed.Vector (Int,a) }

This container is a flattened tree. Its accessor (!) performs a binary (O(log N)) search through the vector to find the bucket where index is stored.

(!) :: Container a -> Int -> a
container ! index = ... binary search ...
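A runnable sketch of the plain O(log N) accessor, under assumptions the question leaves open: each buffer entry is taken to be (first index covered by the bucket, payload), sorted by that start index, and Data.Array stands in for Unboxed.Vector so the example needs only GHC's bundled libraries. fromBuckets and findBucket are hypothetical helpers, and the length field is omitted.

```haskell
import qualified Data.Array as A

-- Hypothetical simplified container: entry i is (first index covered by
-- bucket i, payload), sorted ascending by that start index.
data Container a = Container { buffer :: A.Array Int (Int, a) }

-- Build a container from (start, payload) pairs that are already sorted.
fromBuckets :: [(Int, a)] -> Container a
fromBuckets xs = Container (A.listArray (0, length xs - 1) xs)

-- Binary search for the last bucket whose start is <= idx: O(log N).
-- Assumes a non-empty buffer; indices below the first start map to bucket 0.
findBucket :: Int -> Container a -> Int
findBucket idx c = go lo hi
  where
    buf      = buffer c
    (lo, hi) = A.bounds buf
    -- Invariant: the answer lies in [l, h].
    go l h
      | l >= h    = l
      | otherwise =
          let m = (l + h + 1) `div` 2
          in if fst (buf A.! m) <= idx then go m h else go l (m - 1)

-- The accessor from the question: search, then return the bucket's payload.
(!) :: Container a -> Int -> a
c ! idx = snd (buffer c A.! findBucket idx c)
```

Every call pays the full binary search, which is exactly the cost the last-bucket optimization tries to skip.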

Since consecutive accesses are likely to be in the same bucket, this could be optimized in the following way:

if `index` is in the last accessed bucket, skip the search

The tricky point is the last accessed bucket part. In JavaScript, I'd just impurely modify a hidden variable on the container object.

function read(index, object) {
    var lastBucket = object.__lastBucket;
    var bucket;

    // If the last bucket contains index, there is no need to search.
    if (contains(object, lastBucket, index)) {
        bucket = lastBucket;
    }
    // Otherwise, search for the bucket...
    else {
        bucket = searchBucket(index, object);

        // ...and impurely annotate it on the container, so the
        // next access can skip the search.
        object.__lastBucket = bucket;
    }

    return object.buffer[bucket].value;
}

Since this is just an optimization and the result is the same regardless of which branch is taken, I believe it doesn't break referential transparency. How is it possible, in Haskell, to impurely modify a state associated with a runtime value?

~

I have thought of two possible solutions:

  1. A global, mutable hashmap linking pointers to the lastBucket value, using unsafePerformIO to write to it. But I'd need a way to get the runtime pointer of an object, or at least a unique id of some sort (how?).

  2. Add an extra field to Container, lastBucket :: Int, somehow impurely modify it within (!), and consider that field internal (because it obviously breaks referential transparency).
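For solution 1, GHC's System.Mem.StableName module covers the "how?" part: makeStableName gives a runtime identity for a value, and hashStableName turns it into an Int key. A minimal sketch, assuming a Data.Map table from the containers package; lastBucketTable, objectKey, recallBucket and rememberBucket are hypothetical names. Distinct objects may share a hash and the table never forgets entries, so the stored bucket can only be treated as a hint that the lookup re-checks.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as M
import System.IO.Unsafe (unsafePerformIO)
import System.Mem.StableName (hashStableName, makeStableName)

-- Global table from an object's identity hash to its last accessed bucket.
-- NOINLINE makes sure the table is created once and shared by every caller.
{-# NOINLINE lastBucketTable #-}
lastBucketTable :: IORef (M.Map Int Int)
lastBucketTable = unsafePerformIO (newIORef M.empty)

-- Runtime identity of a value. The value is forced to WHNF first, because
-- a thunk and its evaluated result may get different StableNames.
-- Hashes of distinct objects can collide, so the key identifies only a hint.
objectKey :: a -> IO Int
objectKey x = do
  x' <- evaluate x
  hashStableName <$> makeStableName x'

-- The bucket last recorded for this object, if any.
recallBucket :: a -> IO (Maybe Int)
recallBucket x = do
  k <- objectKey x
  M.lookup k <$> readIORef lastBucketTable

-- Record the bucket just used for this object. Entries are never removed,
-- so this table leaks memory for short-lived containers.
rememberBucket :: a -> Int -> IO ()
rememberBucket x b = do
  k <- objectKey x
  atomicModifyIORef' lastBucketTable (\m -> (M.insert k b m, ()))
```

Inside (!), both calls would still have to be wrapped in unsafePerformIO, with the same precautions as any other use of it.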

MaiaVictor asked Sep 26 '22 22:09




2 Answers

Using solution (2), I arrived at the following design. First, I added a __lastAccessedBucket :: IORef Int field to my datatype, as suggested by @Xicò:

data Container a = Container { 
    length :: Int, 
    buffer :: V.Vector (Int,a), 
    __lastAccessedBucket :: IORef Int }

Then, I had to update the functions that create a new Container in order to create a new IORef using unsafePerformIO:

fromList :: [a] -> Container a
fromList list = unsafePerformIO $ do
    ref <- newIORef 0
    return $ Container (L.length list) buffer ref
    where buffer = V.fromList (prepare list)

Finally, I created two new functions: findBucketWithHint, a pure function that searches for the bucket of an index given a guess (i.e., the bucket where you think it might be), and unsafeFindBucket, which replaces the pure findBucket when performance is needed by always using the last accessed bucket as the hint:

unsafeFindBucket :: Int -> Container a -> Int
unsafeFindBucket findIdx container = unsafePerformIO $ do
    -- Use the bucket found by the previous lookup as the search hint...
    let lastBucketRef = __lastAccessedBucket container
    lastBucket       <- readIORef lastBucketRef
    let newBucket     = findBucketWithHint lastBucket findIdx container
    -- ...and remember this lookup's bucket for the next call.
    writeIORef lastBucketRef newBucket
    return newBucket
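The answer does not show findBucketWithHint. One possible shape, sketched under assumptions: each buffer entry is (first index covered by the bucket, payload), sorted by start index, with Data.Array standing in for the vector. The function checks whether the hinted bucket covers the index and falls back to binary search otherwise. bucketContains and this findBucket body are hypothetical; the argument order of findBucketWithHint matches the call in unsafeFindBucket.

```haskell
import qualified Data.Array as A

-- Hypothetical layout: entry i is (first index covered by bucket i, payload),
-- starts sorted ascending, so bucket i covers [start_i, start_{i+1}).
newtype Container a = Container { buffer :: A.Array Int (Int, a) }

-- True if bucket b exists and covers idx.
bucketContains :: Container a -> Int -> Int -> Bool
bucketContains c b idx =
  lo <= b && b <= hi
    && fst (buf A.! b) <= idx
    && (b == hi || idx < fst (buf A.! (b + 1)))
  where
    buf      = buffer c
    (lo, hi) = A.bounds buf

-- Plain O(log N) binary search: last bucket whose start is <= idx.
findBucket :: Int -> Container a -> Int
findBucket idx c = go lo hi
  where
    buf      = buffer c
    (lo, hi) = A.bounds buf
    go l h
      | l >= h    = l
      | otherwise =
          let m = (l + h + 1) `div` 2
          in if fst (buf A.! m) <= idx then go m h else go l (m - 1)

-- Pure hinted search: O(1) when the guess is right, else the binary search.
findBucketWithHint :: Int -> Int -> Container a -> Int
findBucketWithHint hint idx c
  | bucketContains c hint idx = hint
  | otherwise                 = findBucket idx c
```

Because bucketContains only verifies the guess, a stale or wrong hint (for example, one written by another thread) costs a binary search but never changes the answer.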

With this, unsafeFindBucket is technically a pure function with the same API as the original findBucket function, but it is an order of magnitude faster in some benchmarks. I have no idea how safe this is or where it could cause bugs. Threads are certainly a concern.

MaiaVictor answered Sep 29 '22 12:09


(This is more an extended comment than an answer.)

First, I'd suggest checking whether this is a case of premature optimization. After all, O(log n) isn't that bad.

If this part is indeed performance-critical, your intention is definitely valid. The usual warning for unsafePerformIO is "use it only if you know what you're doing", which you obviously do, and it can help to make things pure and fast at the same time. Be sure to follow all the precautions in the System.IO.Unsafe docs, in particular marking the function NOINLINE and setting the proper compiler flags, such as -fno-cse (you might want to use the OPTIONS_GHC pragma).
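A minimal sketch of those precautions applied to a constructor that allocates the cache cell with unsafePerformIO, in the spirit of the first answer's fromList. The Container type here is a hypothetical simplification (a plain list instead of the vector), and lastAccessedBucket is an illustrative field name.

```haskell
{-# OPTIONS_GHC -fno-cse -fno-full-laziness #-}
-- Flags named in the System.IO.Unsafe documentation: -fno-cse stops GHC from
-- merging two identical unsafePerformIO calls into one, and
-- -fno-full-laziness stops it from floating the call out of the function.

import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical simplified container: the payload plus a hidden cache cell.
data Container a = Container
  { buffer             :: [a]
  , lastAccessedBucket :: IORef Int
  }

-- NOINLINE, as the documentation recommends for any function that calls
-- unsafePerformIO, so the allocation is not duplicated or shared by inlining.
{-# NOINLINE fromList #-}
fromList :: [a] -> Container a
fromList xs = unsafePerformIO $ do
  ref <- newIORef 0
  return (Container xs ref)
```

Each container built this way should get its own IORef, which is what the per-object cache relies on.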

Also make sure that the IO operation is thread safe. The easiest way to ensure that is to use IORef together with atomicModifyIORef.
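A minimal sketch of that atomic update, using atomicModifyIORef' (the strict variant in Data.IORef). searchWithHint and cachedSearch are hypothetical names; search stands for a pure hinted search, such as findBucketWithHint partially applied to an index and a container.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Read the previous hint, run the pure hinted search with it, and store the
-- bucket it returns, all as one atomic read-modify-write on the IORef.
-- atomicModifyIORef' also forces the stored and returned values, so no
-- chain of unevaluated thunks builds up in the cache cell.
searchWithHint :: IORef Int -> (Int -> Int) -> IO Int
searchWithHint ref search =
  atomicModifyIORef' ref (\hint -> let b = search hint in (b, b))

-- Pure-looking wrapper in the style of unsafeFindBucket.
{-# NOINLINE cachedSearch #-}
cachedSearch :: IORef Int -> (Int -> Int) -> Int
cachedSearch ref search = unsafePerformIO (searchWithHint ref search)
```

With it, the body of unsafeFindBucket would reduce to cachedSearch (__lastAccessedBucket container) (\h -> findBucketWithHint h findIdx container).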

The disadvantage of internal mutable state is that the cache's performance will deteriorate if it's accessed from multiple threads that look up different elements.

One remedy would be to explicitly thread the updated state instead of using the internal mutable state. This is obviously what you want to avoid, but if your program is using monads, you could just add another monadic layer that'd internally keep the state for you and expose the lookup operation as a monadic action.
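A sketch of that explicit alternative, threading the last bucket through a State monad (from the transformers package) instead of a hidden IORef. HintedSearch, lookupBucket, toySearch and lookupMany are hypothetical; toySearch is a stand-in with fixed-size buckets of four indices so the example runs on its own.

```haskell
import Control.Monad.Trans.State.Strict (State, runState, state)

-- Hypothetical: a pure hinted search, hint -> index -> bucket, such as
-- findBucketWithHint partially applied to a fixed container.
type HintedSearch = Int -> Int -> Int

-- One lookup: read the last bucket from the state, search with it as the
-- hint, and store the bucket found as the next hint.
lookupBucket :: HintedSearch -> Int -> State Int Int
lookupBucket search idx = state (\hint -> let b = search hint idx in (b, b))

-- Toy search for illustration: buckets of 4 indices. It checks the hint
-- first and only "searches" (here: divides) when the hint misses.
toySearch :: HintedSearch
toySearch hint idx
  | hint * 4 <= idx && idx < hint * 4 + 4 = hint
  | otherwise                             = idx `div` 4

-- Consecutive lookups share one threaded hint; returns buckets and last hint.
lookupMany :: [Int] -> Int -> ([Int], Int)
lookupMany idxs = runState (mapM (lookupBucket toySearch) idxs)
```

Everything stays pure and thread-safe, at the cost of callers running their lookups inside the monad.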

Finally, you could consider using splay trees instead of the array. You'd still have (amortized) O(log n) complexity, but their big advantage is that, by design, they move frequently accessed elements near the top. So if you'll be accessing a subset of k elements, they'll soon be moved to the top, and the lookup operations will be just O(log k) (constant for a single, repeatedly accessed element). Again, they update the structure on lookups, but you could use the same approach with unsafePerformIO and atomic updates of an IORef to keep the outer interface pure.
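A compact, purely functional splay-tree lookup, sketched to show the idea: the lookup returns both the result and the restructured tree, with the accessed key (or the last key on the search path) moved to the root. Splay, splay, lookupSplay, rootKey, keys and the plain non-splaying insertBST used to build a tree are hypothetical names; a fuller version would also splay on insertion.

```haskell
data Splay k v = Leaf | Node (Splay k v) k v (Splay k v)

-- Bring the node with key k (or the last node on its search path) to the
-- root, rotating two levels at a time (zig-zig / zig-zag) on the way up.
splay :: Ord k => k -> Splay k v -> Splay k v
splay _ Leaf = Leaf
splay k t@(Node l x v r) = case compare k x of
  EQ -> t
  LT -> case l of
    Leaf -> t
    Node ll lx lv lr -> case compare k lx of
      EQ -> Node ll lx lv (Node lr x v r)                       -- zig
      LT -> case splay k ll of                                   -- zig-zig
        Leaf         -> Node ll lx lv (Node lr x v r)
        Node a y w b -> Node a y w (Node b lx lv (Node lr x v r))
      GT -> case splay k lr of                                   -- zig-zag
        Leaf         -> Node ll lx lv (Node lr x v r)
        Node a y w b -> Node (Node ll lx lv a) y w (Node b x v r)
  GT -> case r of
    Leaf -> t
    Node rl rx rv rr -> case compare k rx of
      EQ -> Node (Node l x v rl) rx rv rr                       -- zig
      GT -> case splay k rr of                                   -- zig-zig
        Leaf         -> Node (Node l x v rl) rx rv rr
        Node a y w b -> Node (Node (Node l x v rl) rx rv a) y w b
      LT -> case splay k rl of                                   -- zig-zag
        Leaf         -> Node (Node l x v rl) rx rv rr
        Node a y w b -> Node (Node l x v a) y w (Node b rx rv rr)

-- Lookup returns the value (if present) and the splayed tree.
lookupSplay :: Ord k => k -> Splay k v -> (Maybe v, Splay k v)
lookupSplay k t = case splay k t of
  t'@(Node _ x v _) | x == k -> (Just v, t')
  t'                         -> (Nothing, t')

-- Plain BST insertion, only used here to build an example tree.
insertBST :: Ord k => k -> v -> Splay k v -> Splay k v
insertBST k v Leaf = Node Leaf k v Leaf
insertBST k v (Node l x w r) = case compare k x of
  LT -> Node (insertBST k v l) x w r
  GT -> Node l x w (insertBST k v r)
  EQ -> Node l k v r

rootKey :: Splay k v -> Maybe k
rootKey Leaf           = Nothing
rootKey (Node _ k _ _) = Just k

-- In-order keys; splaying must never change this order.
keys :: Splay k v -> [k]
keys Leaf           = []
keys (Node l k _ r) = keys l ++ [k] ++ keys r
```

Keeping the outer interface pure would then mean holding the current tree in an IORef and replacing it atomically after each lookupSplay, as the answer describes.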

Petr answered Sep 29 '22 11:09