
What is the difference between the decode and decode' functions from the aeson package?

The decode and decode' functions from the aeson package are almost identical, but they have a subtle difference described in the documentation (only the relevant part is quoted here):

-- This function parses immediately, but defers conversion.  See
-- 'json' for details.
decode :: (FromJSON a) => L.ByteString -> Maybe a
decode = decodeWith jsonEOF fromJSON

-- This function parses and performs conversion immediately.  See
-- 'json'' for details.
decode' :: (FromJSON a) => L.ByteString -> Maybe a
decode' = decodeWith jsonEOF' fromJSON

I tried reading the descriptions of the json and json' functions, but I still don't understand which one I should use and when, because the documentation is not clear enough. Can anybody describe the difference between the two functions more precisely and, if possible, provide an example that explains the behavior?

UPDATE:

There are also the decodeStrict and decodeStrict' functions. I'm not asking about the difference between, say, decode' and decodeStrict (which, by the way, is an interesting question as well), but it is not at all obvious what is lazy and what is strict across all of these functions.

Shersh asked Jul 28 '17 13:07



2 Answers

The difference between these two functions is real but subtle, and explaining it takes a little setup. We can start by taking a look at the types.

The Value type

It’s important to note that the Value type that aeson provides has been strict for a very long time (specifically, since version 0.4.0.0). This means that there cannot be any thunks between a constructor of Value and its internal representation. This immediately means that Bool (and, of course, Null) must be completely evaluated once a Value is evaluated to WHNF.

Next, let’s consider String and Number. The String constructor contains a value of type strict Text, so there can’t be any laziness there, either. Similarly, the Number constructor contains a Scientific value, which is internally represented by two strict values. Both String and Number must also be completely evaluated once a Value is evaluated to WHNF.
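To make "strict fields leave no room for thunks" concrete, here is a minimal sketch using two toy types. LazyBox and StrictBox are made-up names for illustration and are not part of aeson. Forcing a constructor to WHNF also forces its strict fields, which is the property the String and Number constructors rely on.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical toy types, not part of aeson.
data LazyBox = LazyBox Int       -- lazy field: may hold an unevaluated thunk
data StrictBox = StrictBox !Int  -- strict field: forced together with the constructor

-- Returns True if forcing the value to WHNF raised an ErrorCall.
throwsOnWhnf :: a -> IO Bool
throwsOnWhnf x = classify <$> try (evaluate x)
  where
    classify :: Either ErrorCall b -> Bool
    classify = either (const True) (const False)

main :: IO ()
main = do
  lazyThrows   <- throwsOnWhnf (LazyBox (error "boom"))
  strictThrows <- throwsOnWhnf (StrictBox (error "boom"))
  -- The lazy field keeps the failing thunk hidden; the strict field exposes it.
  print (lazyThrows, strictThrows)  -- prints (False,True)
```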

We can now turn our attention to Object and Array, the only nontrivial datatypes that JSON provides. These are more interesting. Objects are represented in aeson by a lazy HashMap. Lazy HashMaps only evaluate their keys to WHNF, not their values, so the values could very well remain unevaluated thunks. Similarly, Arrays are Vectors, which are not strict in their values, either. Both of these sorts of Values can contain thunks.
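The laziness of these containers can be checked without aeson at all. The sketch below assumes the unordered-containers and vector packages are available (both are aeson dependencies); lazyMap and lazyVec are made-up names. Each container holds an element that would throw if evaluated, yet queries that only look at the container's structure succeed.

```haskell
import qualified Data.HashMap.Lazy as M
import qualified Data.Vector as V

-- A lazy HashMap whose single value is a thunk that would throw if forced.
lazyMap :: M.HashMap String Int
lazyMap = M.fromList [("value", error "never forced")]

-- A boxed Vector whose second element is such a thunk.
lazyVec :: V.Vector Int
lazyVec = V.fromList [1, error "never forced"]

main :: IO ()
main = do
  -- Both queries inspect only keys or length, so no element is evaluated.
  print (M.member "value" lazyMap)  -- prints True
  print (V.length lazyVec)          -- prints 2
```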

With this in mind, we know that, once we have a Value, the only places where decode and decode' may differ are in the production of objects and arrays.

Observational differences

The next thing we can try is to actually evaluate some things in GHCi and see what happens. We’ll start with a bunch of imports and definitions:

:seti -XOverloadedStrings -XBangPatterns

import Control.Exception
import Control.Monad
import Data.Aeson
import Data.ByteString.Lazy (ByteString)
import Data.List (foldl')
import qualified Data.HashMap.Lazy as M
import qualified Data.Vector as V

:{
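-- Evaluate a value to WHNF without letting GHCi print (and thereby fully force) it.
force :: a -> IO ()
force = void . evaluate

-- Walk a list's spine without evaluating its elements.
forceSpine :: [a] -> IO ()
forceSpine = evaluate . foldl' const ()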
:}
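As a quick check that forceSpine really leaves list elements alone, here is the same definition as a small standalone program (a sketch, outside of GHCi). Every element would throw if it were evaluated, so reaching the final line shows that only the list's cons cells were walked.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')

-- Same definition as in the GHCi session above.
forceSpine :: [a] -> IO ()
forceSpine = evaluate . foldl' const ()

main :: IO ()
main = do
  -- const () x never looks at x, so the failing elements stay unevaluated.
  forceSpine [error "element 1", error "element 2" :: Int]
  putStrLn "spine forced, elements untouched"
```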

Next, let’s actually parse some JSON:

let jsonDocument = "{ \"value\": [1, { \"value\": [2, 3] }] }" :: ByteString

let !parsed = decode jsonDocument :: Maybe Value
let !parsed' = decode' jsonDocument :: Maybe Value
force parsed
force parsed'

Now we have two bindings, parsed and parsed', one of which is parsed with decode and the other with decode'. They are forced to WHNF so we can at least see what they are, but we can use the :sprint command in GHCi to see how much of each value is actually evaluated:

ghci> :sprint parsed
parsed = Just _
ghci> :sprint parsed'
parsed' = Just
            (Object
               (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                  15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                  (Array (Data.Vector.Vector 0 2 _))))

Would you look at that! The version parsed with decode is still unevaluated, but the one parsed with decode' has some data. This leads us to our first meaningful difference between the two: decode' forces its immediate result to WHNF, but decode defers it until it is needed.

Let’s look inside these values to see if we can’t find more differences. What happens once we evaluate those outer objects?

let (Just outerObjValue) = parsed
let (Just outerObjValue') = parsed'
force outerObjValue
force outerObjValue'

ghci> :sprint outerObjValue
outerObjValue = Object
                  (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                     15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                     (Array (Data.Vector.Vector 0 2 _)))

ghci> :sprint outerObjValue'
outerObjValue' = Object
                   (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                      15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                      (Array (Data.Vector.Vector 0 2 _)))

This is pretty obvious. We explicitly forced both of the objects, so they are now both evaluated to hash maps. The real question is whether or not their elements are evaluated.

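-- Note: with aeson >= 2.0, Object holds a KeyMap rather than a HashMap, so M.! will not typecheck there.
let (Object outerObj) = outerObjValue
let (Object outerObj') = outerObjValue'
let (Array outerArr) = outerObj M.! "value"
let (Array outerArr') = outerObj' M.! "value"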
let outerArrLst = V.toList outerArr
let outerArrLst' = V.toList outerArr'

forceSpine outerArrLst
forceSpine outerArrLst'

ghci> :sprint outerArrLst
outerArrLst = [_,_]

ghci> :sprint outerArrLst'
outerArrLst' = [Number (Data.Scientific.Scientific 1 0),
                Object
                  (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                     15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                     (Array (Data.Vector.Vector 0 2 _)))]

Another difference! For the array decoded with decode, the values are not forced, but the ones decoded with decode' are. As you can see, this means decode doesn’t actually perform conversion to Haskell values until they are actually needed, which is what the documentation means when it says it “defers conversion”.

Impact

Clearly, these two functions are slightly different, and clearly, decode' is stricter than decode. What’s the meaningful difference, though? When would you prefer one over the other?

Well, it’s worth mentioning that decode never does more work than decode', so decode is probably the right default. Of course, decode' will never do significantly more work than decode, either, since the entire JSON document needs to be parsed before any value can be produced. The only significant difference is that decode avoids allocating Values if only a small part of the JSON document is actually used.
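A sketch of the "only a small part is used" case follows. Title and document are hypothetical names chosen for illustration; the instance reads one field and never inspects the large array next to it. Both functions return the same result here. Going by the behaviour observed above, the difference is only whether the unused array elements get converted to Values up front.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)

-- Hypothetical type that reads a single field and ignores the rest.
newtype Title = Title Text
  deriving (Eq, Show)

instance FromJSON Title where
  parseJSON = withObject "Title" $ \o -> Title <$> o .: "title"

-- Made-up document: one small field next to a large array we never inspect.
document :: ByteString
document = encode $ object
  [ "title"   .= ("report" :: Text)
  , "samples" .= [1 .. 10000 :: Int]
  ]

main :: IO ()
main = do
  let lazyResult   = decode  document :: Maybe Title
      strictResult = decode' document :: Maybe Title
  print lazyResult                    -- prints Just (Title "report")
  print (lazyResult == strictResult)  -- prints True
```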

Of course, laziness is not free, either. Being lazy means adding thunks, which can cost space and time. If all of the thunks are going to be evaluated, anyway, then decode is simply wasting memory and runtime adding useless indirection.

In this sense, the situations when you might want to use decode' are situations in which the whole Value structure is going to be forced, anyway, which is probably dependent on which FromJSON instance you’re using. In general, I wouldn’t worry about picking between them unless performance really matters and you’re decoding a lot of JSON or doing JSON decoding in a tight loop. In either case, you should benchmark. Choosing between decode and decode' is a very specific manual optimization, and I would not feel very confident that either would actually improve the runtime characteristics of my program without benchmarks.
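One possible shape for such a benchmark, assuming the criterion package is available. The input document and the names sampleDocument, lazyDecode and strictDecode are made up for this sketch. The whnf group approximates a caller that barely looks at the result, and the nf group approximates one that forces the whole Value; a real benchmark should use your own documents and FromJSON instances.

```haskell
import Criterion.Main (bench, bgroup, defaultMain, nf, whnf)
import Data.Aeson (Value, decode, decode', encode)
import Data.ByteString.Lazy (ByteString)

-- Made-up input: a JSON array of 10,000 integers.
-- It is built once on first use and then shared across iterations.
sampleDocument :: ByteString
sampleDocument = encode [1 .. 10000 :: Int]

lazyDecode, strictDecode :: ByteString -> Maybe Value
lazyDecode   = decode
strictDecode = decode'

main :: IO ()
main = defaultMain
  [ bgroup "result barely inspected (WHNF)"
      [ bench "decode"  (whnf lazyDecode sampleDocument)
      , bench "decode'" (whnf strictDecode sampleDocument)
      ]
  , bgroup "result fully forced (NF)"
      [ bench "decode"  (nf lazyDecode sampleDocument)
      , bench "decode'" (nf strictDecode sampleDocument)
      ]
  ]
```

No timings are given here on purpose: the numbers depend on the aeson version, the GHC version and the shape of the input.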

Alexis King answered Oct 20 '22 21:10



Haskell is a lazy language. When you call a function, it doesn't actually execute right then; instead, the information about the call is "remembered" and returned up the stack (this remembered call information is referred to as a "thunk" in the docs), and the actual call only happens if somebody up the stack actually tries to do something with the returned value.

This is the default behavior, and this is how json and decode work. But there is a way to "cheat" the laziness and tell the compiler to execute code and evaluate values right then and there. And this is what json' and decode' do.

The tradeoff is clear: decode saves computation time if you never actually do anything with the value, while decode' avoids having to "remember" the call information (the "thunk") at the cost of executing everything in place.
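The general mechanism can be seen in a few lines of plain Haskell, with no aeson involved. This is a minimal sketch; firstOf is a made-up helper. A lazily passed argument that is never used is never evaluated, while ($!) evaluates the argument before the call happens.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Ignores its second argument entirely.
firstOf :: Int -> Int -> Int
firstOf x _ = x

main :: IO ()
main = do
  -- Lazy application: the failing argument is stored as a thunk and never run.
  print (firstOf 1 (error "never evaluated"))
  -- ($!) evaluates the argument before the call, so the error surfaces.
  r <- try (evaluate (firstOf 1 $! error "evaluated eagerly"))
  putStrLn $ case r of
    Left e  -> const "strict call: the argument was evaluated and threw" (e :: ErrorCall)
    Right n -> "strict call returned " ++ show n
```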

Fyodor Soikin answered Oct 20 '22 21:10
