Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding side effects: Is allocating memory a pure operation?

I am learning about side-effects in functional programming (in Haskell) and I know that external effects are effects that are observable outside the function, whereas internal effects are not visible from the outside.

Is allocating memory for a data structure a pure operation? A side-effect has to modify some state or have an observable interaction with calling functions or the outside world. In the case of allocating a data structure, you would have to call some functions (e.g. malloc) to allocate memory for it. But if these functions were called within a function, this wouldn't be observable to the outside world. Even though the outside world is modified (as there is memory allocated for the data structure), I don't think that allocating a data structure is a side-effect since it's not observable.

However, I am not sure whether my reasoning is correct. Any insights are appreciated.

like image 628
ceno980 Avatar asked Mar 03 '23 16:03

ceno980


1 Answers

Is allocating memory for a data structure a pure operation?

In Haskell, memory is almost never allocated directly by the programmer. In this sense, the question is moot: allocating memory is neither a pure nor impure operation, because allocating memory is not an operation, but rather, it is an implementation detail. Put another way, memory allocations in Haskell are not observable to external code (or any code), but that's not because of purity, it's because the language itself abstracts over the concept of memory allocation. As far as the Haskell code itself is concerned, there's no such thing as memory, or memory allocation.

The reason this is important is because it allows the compiler to make all kinds of optimizations without changing the meaning of the code. For example, in Haskell, when you make small changes to a large data structure, you're actually making a copy of the structure instead of modifying the original, and this is extremely inefficient. However, the compiler can usually tell if old copies of the structure are needed, and if not, it will write machine code that simply modifies the original structure. As another example, small local values may be moved to CPU registers or the system stack, bypassing allocation entirely for those values. As long as it doesn't change what the code does, the optimizer can and will break the purity rule by any means necessary. The distinction is simply unimportant at that level.


In the case of allocating a data structure, you would have to call some functions (e.g. malloc) to allocate memory for it. But if these functions were called within a function, this wouldn't be observable to the outside world. Even though the outside world is modified (as there is memory allocated for the data structure), I don't think that allocating a data structure is a side-effect since it's not observable.

There are situations where a memory allocation operation may be exposed to Haskell code. For example, a C function that allocates memory with malloc might be called by a Haskell binding via the FFI. In these situations, the author of the binding needs to decide whether the function is "pure" (the type should return a pure value), or "impure" (the type should return an IO action). This is the main situation I can think of where an answer to this question is pragmatically valuable.

In this context, the important things to look at are:

  • If I run the function with the same inputs in any situation, will it always give me the same outputs?
  • If I replace the function call in the source code with its resulting value, will the program have exactly the same behavior?

If both of the answers are "yes", it's a pure function, otherwise it's impure. This is regardless of how much impurity is happening in the C code, as long as that impurity is not visible outside of the function it's fine.


So to actually answer your question: It depends.

Say malloc is called, and then the memory is freed before the function exits. Over the course of the function, zero net memory has been allocated. So the function is pure.

Say malloc is called, and a pointer into the allocated memory is returned. This is not pure, because Haskell is only aware of the pointer, not the allocated memory itself. If we run this function to allocate a 4-byte chunk, and then we run it again to allocate another 4-byte chunk, and Haskell thinks the function is pure, it may replace the second call with the result (pointer) of the first call, causing both calls to return pointers to the same 4-byte chunk (which is not what you want). Because of this, the function must be typed as an impure IO action.

like image 112
DarthFennec Avatar answered Apr 30 '23 10:04

DarthFennec