I am currently working on creating my own persistent array in java, which uses a binary search tree to store an collection of values.
I want to add a map method that takes a Function
as an argument to generate a new array. I don't want to evaluate the functions unless the particular value is requested. This is pretty simple to do as lambdas are lazy evaluated. However, I also only want a function to be evaluated only once, even if the result is requested multiple times.
I could create a node that stores the supplier and updates the result when evaluated:
class Node<T> {
private T value;
private Supplier<T> supplier;
public T get() {
if (null != value)
return value;
value = supplier.get();
return value;
}
}
...where supplier
is derived from a Function
being applied to a value in an older version of the persisted array.
However, this is no longer a functional approach, and has the potential to cause errors in a multi-threaded system*. It also doesn't yield an advantage in the case where supplier returns a null value**.
Another approach would be return an instance of Node
on a get call:
class Node<T> {
private final Optional<T> value;
private final Supplier<T> supplier;
Node(Supplier<T> supplier, T value) {
this.supplier = supplier;
this.value = Optional.ofNullable(value);
}
public Tuple<Node<T>, T> get() {
if (null != value)
return new Tuple<>(this, value.orElse(null));
T result = supplier.get();
Node<T> newNode = new Node<>(null, result);
return new Tuple<>(newNode, result);
}
}
I like this approach for staying functional; but it would require a lot of overhead in all the parent nodes going up the tree for just a get call. And it would require cumbersome unboxing in the using application's code.
Does anyone have another approach they can think of to make this work the way I'm asking? Thanks.
*This could be solved using locking mechanisms, but adds a layer of complexity I'm hoping to avoid.
**I've thought about making value
an Optional<T>
, where null
means hasn't been evaluate, and Optional.empty()
as has been evaluated and returns a null. However, this works around my issue, rather than solving it.
For anyone not familiar with a persisted array, it's a data structure that creates a new instance of itself whenever an update is performed. This allows it to be purely immutable. Using a binary tree (or more common 32-bit tree) method allows the updates to reduce duplicating data, both in speed and memory.
EDIT:
The code for the collection can be found on github. For description of use, you can look in the test folder.
Lambdas allows Java to be lazy, as they represent a function to be executed that can be passed around and will only be evaluated when required. The examples below are pretty simple but should demonstrate the order of execution, showing that the lambdas are not executed straight away.
Lazy evaluation in Java. Lazy evaluation is another functional programming technique that is not new to Java 8. This technique delays the evaluation of an expression until its value is needed. In most cases, Java eagerly evaluates an expression that is bound to a variable.
Lazy Evaluation using Python The range method in Python follows the concept of Lazy Evaluation. It saves the execution time for larger ranges and we never require all the values at a time, so it saves memory consumption as well.
algorithm - Java lambdas 20 times slower than anonymous classes - Stack Overflow. Stack Overflow for Teams – Start collaborating and sharing organizational knowledge.
Disclaimer: this answer doesn't respond the question directly, as it doesn't use neither Supplier
nor Optional
directly in the Node
class. Instead, a generic functional programming technique is presented that might help solve the problem.
If the problem is all about evaluating the function only once for each input value, then you shouldn't change your tree/array/nodes. Instead, memoize the function, which is a pure functional approach:
In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again
Here's a way to do it, inspired by this excellent article written by Pierre-Yves Saumont (please check it for an in-depth introduction to memoization):
public static <T, U> Function<T, U> memoize(Function<T, U> function) {
Map<T, U> cache = new ConcurrentHashMap<>();
return input -> cache.computeIfAbsent(input, function::apply);
}
Suppose you have a method that takes quite long to execute. Then, you could use the memoize
method this way:
// This method takes quite long to execute
Integer longCalculation(Integer x) {
try {
Thread.sleep(1_000);
} catch (InterruptedException ignored) {
}
return x * 2;
}
// Our function is a method reference to the method above
Function<Integer, Integer> function = this::longCalculation;
// Now we memoize the function declared above
Function<Integer, Integer> memoized = memoize(function);
Now, if you call:
int result1 = function.apply(1);
int result2 = function.apply(2);
int result3 = function.apply(3);
int result4 = function.apply(2);
int result5 = function.apply(1);
You'll notice that the five calls take ~5 seconds altogether (1 second for each call).
However, if you use the memoized
function with the same input values 1 2 3 2 1
:
int memoizedResult1 = memoized.apply(1);
int memoizedResult2 = memoized.apply(2);
int memoizedResult3 = memoized.apply(3);
int memoizedResult4 = memoized.apply(2); // <-- returned from cache
int memoizedResult5 = memoized.apply(1); // <-- returned from cache
You'll notice that now the five calls take ~3 seconds altogether. This is because the last two results are immediately returned from the cache.
So, back to your structure... Inside your map
method, you could just memoize the given function and use the returned memoized function instead. Internally, this will cache the function's return values in the ConcurrentHashMap
.
As the memoize
method uses a ConcurrentHashMap
internally, you don't need to worry about concurrency.
Note: This is just the beginning... I'm thinking about two possible improvements here. One would be to limit the size of the cache, so that it doesn't take the whole memory if the domain of the given function is too big. The other improvement would be to memoize the given function only if it hasn't been memoized previously. But these details are left as an exercise for the reader... ;)
I also only want a function to be evaluated only once, even if the result is requested multiple times.
How about this?
class Node<T> {
private Supplier<T> supplier;
Node(T value, Supplier<T> supplier) {
this.supplier = sync(lazy(value, supplier));
}
public T get() {
return supplier.get();
}
}
the sync
method only synchronizing the Supplier
once, when the target
is called, the lock
is disabled for the next continuous requests:
static <T> Supplier<T> sync(Supplier<T> target) {
return sync(new ReentrantLock(), target);
}
static <T> Supplier<T> sync(ReentrantLock lock, Supplier<T> target) {
// v--- synchronizing for multi-threads once
return lazy(() -> {
// the interesting thing is that when more than one threads come in here
// but target.delegate is only changed once
lock.lock();
try {
return target.get();
} finally {
lock.unlock();
}
});
}
the lazy
method calls a given Supplier
only once as below:
static <T> Supplier<T> lazy(T value, Supplier<T> defaults) {
return lazy(() -> value != null ? value : defaults.get());
}
static <T> Supplier<T> lazy(Supplier<T> target) {
return new Supplier<T>() {
private volatile Supplier<T> delegate = () -> {
T it = target.get();
//v--- return the evaluated value in turn
delegate = () -> it;
return it;
};
@Override
public T get() {
return delegate.get();
}
};
}
IF you interested in how the final code was made, I have commit the code into github, you can just copy and use it. and you can found I have renamed lazy
methods to once
, which is more expressiveness.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With