Short version:
The function passed as the fourth argument to accumarray
sometimes gets called with arguments that are not consistent with specifications encoded the first argument to accumarray
.
As a result, functions used as arguments to accumarray
must test for what are, in effect, anomalous conditions.
The question is: how can an a 1-expression anonymous function test for such anomalous conditions? And more generally: how can write anonymous functions that are robust to accumarray
's undocumented behavior?
Full version:
The code below is a drastically distilled version of a problem that ate up most of my workday today.
First some definitions:
idxs = [1:3 1:3 1:3]';
vals0 = [1 4 6 3 5 7 6 Inf 2]';
vals1 = [1 Inf 6 3 5 7 6 4 2]';
anon = @(x) max(x(~isinf(x)));
Note vals1
is obtained from vals0
by swapping elements 2 and 8. The "anonymous" function anon
computes the maximum among the non-infinite elements of its input.
Given these definitions, the two calls below
accumarray(idxs, vals0, [], anon)
accumarray(idxs, vals1, [], anon)
which differ only in their second argument (vals0
vs vals1
), should produce identical results, since the difference between vals0
and vals1
affects only the ordering of the values in the argument to one of the calls to anon
, and the result of this function is insensitive to the ordering of elements in its argument.
As it turns out the first of these two expressions evaluates normally and produces the right result1:
>> accumarray(idxs, vals0, [], anon)
ans =
6
5
7
The second one, however, fails with:
>> accumarray(idxs, vals1, [], anon)
Error using accumarray
The function '@(x)max(x(~isinf(x)))' returned a non-scalar value.
To troubleshoot this problem, all I could come up with2 was to write a separate function (in its own file, of course, "the MATLAB way")
function out = kluge(x)
global ncalls;
ncalls = ncalls + 1;
y = ~isinf(x);
if any(y)
out = max(x(y));
else
{ncalls x}
out = NaN;
end
end
...and ran the following:
>> global ncalls;
>> ncalls = int8(0); accumarray(idxs, vals0, [], @kluge)
ans =
6
5
7
>> ncalls = int8(0); accumarray(idxs, vals1, [], @kluge)
ans =
[2] [Inf]
ans =
6
5
7
As one can see from the output of the last call to accumarray
above, the argument to the second call to the kluge
callback was the array [Int]
. This tells me beyond any doubt that accumarray
is not behaving as documented3 (since idxs
specifies no arrays of length 1 to be passed to accumarray
's function argument).
In fact, from this and other tests I determined that, contrary to what I expected, the function passed to accumarray
is called more than max(idxs)
(= 3) times; in the expressions involving kluge
above it's called 5 times.
The problem here is that if one cannot rely on how accumarray
's function argument will actually be called, then the only way to make this function argument robust is to include in it a lot of extra code to perform the necessary checks. This almost certainly will require that the function have multiple statements, which rules out anonymous functions. (E.g. the function kluge
above is robust more robust than anon
, but I don't know how to fit into an anonymous function.) Not being able to use anonymous functions with accumarray
greatly reduces its utility.
So my question is:
how to specify anonymous functions that can be robust arguments to
accumarray
?
1 I have removed blank lines from MATLAB's typical over-padding in all the MATLAB output shown in this post.
2 I welcome comments with any other troubleshooting suggestions you may have; troubleshooting this problem was a lot harder than it should be.
3
In particular, see items number 1 through 5 right after the line "The function processes the input as follows:".
The fourth input argument of accumarray
, anon
in this case, must return a scalar for any input.
Consider the output when the indexes are sorted:
>> [idxsSorted,sortInds] = sort(idxs)
>> accumarray(idxsSorted, vals0(sortInds), [], anon)
ans =
6
5
7
>> accumarray(idxsSorted, vals1(sortInds), [], anon)
ans =
6
5
7
Now, all the documentation has to say about this is the following:
If the subscripts in subs are not sorted, fun should not depend on the order of the values in its input data.
How does this relate the trouble with anon
? It is a clue, as this forces anon
to be called for the complete set of values for a given idx
rather than a subset/subarray, as Luis Mendo suggested.
Consider how accumarray
would work for a non-sorted list of indexes and values:
>> [idxs vals0 vals1]
ans =
1 1 1
2 4 Inf
3 6 6
1 3 3
2 5 5
3 7 7
1 6 6
2 Inf 4
3 2 2
For both vals0
and vals1
, the Inf
belongs to the set where idxs
equals 2. Since idxs
is not sorted, it does not process all values for idxs=2
in one shot, at first. The actual algorithm (implementation) is opaque, but it seems to start by assuming that idxs
is sorted, processing each single-valued block of the first argument. This is verifiable by putting a breakpoint in fun
, the function reference by fourth input argument. When it encounters a 1 in idxs
for the second time, it seems to start over, but with subsequent calls to fun
containing all the values for a given index. Presumably accumarray
calls some implementation of unique
to fully-segment idxs
(incidentally, order is not preserved). As kjo suggests, this is the point where accumarray
actually processes the inputs as described in the documentation, following steps 1-5 here ("Find out how many unique indices there are..."). As a result, it crashes for vals1
, when anon(Inf)
is called, but not for vals0
, which instead calls anon(4)
on the first try.
However, even if it followed those steps exactly on the first go, it would not necessarily be robust if a complete subarray of values contained just Inf
s (consider that anon([Inf Inf Inf]
) returns an empty matrix too). It is a requirement, although an understated one, that fun
must return a scalar. What is not clear from the documentation is that it must return a scalar, for any inputs, not just what is expected based on the high-level description of the algorithm.
Workaround:
anon = @(x) max([x(~isinf(x));-Inf]);
The documentation does not say that anon
is called only with the whole set1 of vals
corresponding to each value of idx
as its input. As seen in your example, it does get called with subsets thereof.
So the way to make anon
robust seems to be: make sure it gives a scalar output when its input is any subset of vals
(or maybe just any subset of each set with same-idx
value). In your case, anon(inf)
does not return a scalar.
1 It's actually an array, of course, but I think it's easier to describe this in terms of sets (and subsets).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With