I have a (large) cell array, with various data types. For example,
myCell = { 1, 2, 3, 'test', 1 , 'abc';
4, 5, 6, 'foob', 'a', 'def' };
This can include more obscure types like java.awt.Color objects.
I want to ensure that the data in each column is of the same type, since I want to perform table-like operations on it. However, this process seems very slow!
My current method is to use cellfun to get the classes, and strcmp to check them:
% Get class of every cell element
types = cellfun( @class, myCell, 'uni', false );
% Check that they are consistent for each column
typesOK = all( strcmp(repmat(types(1,:), size(types,1), 1), types), 1 );
% Output the types (mixed type columns can be handled using typesOK)
types = types(1, :);
% Output for the above example:
% >> typesOK = [1 1 1 1 0 1]
% >> types = {'double', 'double', 'double', 'char', 'double', 'char'}
I had thought to use cell2table, since it does type checking for the same reason. However, it doesn't give me the desired result (which columns are which types, strictly).
Is there a quicker way to check type consistency within a cell array's columns?
Edit: I've just done some profiling...
It appears the types = cellfun( @class, ...) line takes over 90% of the processing time. If your method is only subtly different from mine, it should be that line which changes; the strcmp is pretty quick.
Edit: I was fortunate to have many suggestions for this problem, and I have compiled them all into a benchmarking answer for performance tests.
I haven't tested whether this is faster for very large arrays, but maybe something like this:
function [b] = IsTypeConsistentColumns(myCell)
%[
b = true;
try
for ci = 1:size(myCell, 2)
cell2mat(myCell(:, ci));
end
catch err
if (strcmpi(err.identifier, 'MATLAB:cell2mat:MixedDataTypes'))
b = false;
else
rethrow(err);
end
end
%]
end
It depends on how fast cell2mat is compared to your string comparison (even though the result of cell2mat is not used here).
Note that cell2mat will throw an error if the types are not consistent (identifier: 'MATLAB:cell2mat:MixedDataTypes', message: 'All contents of the input cell array must be of the same data type.').
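As a minimal sketch of that behaviour, assuming a two-element column that mixes a double and a char (the variable names are illustrative only), the identifier can be inspected like this:

```matlab
% Hypothetical mixed-type column: one double, one char
mixedColumn = {1; 'a'};
caughtId = '';
try
    cell2mat(mixedColumn);      % result is discarded; only the type check matters
catch err
    caughtId = err.identifier;  % the identifier quoted in the note above
end
disp(caughtId)
```

Other failures, such as char rows of different lengths that cannot be concatenated, raise a different error and would be rethrown by the function above.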
EDIT: limiting to the cellfun('isclass', c, cellclass) test
This version uses only the type consistency check that cell2mat performs internally:
function [consistences, types] = IsTypeConsistentColumns(myCell)
%[
ncols = size(myCell, 2);
consistences = false(1, ncols);
types = cell(1, ncols);
for ci = 1:ncols
cellclass = class(myCell{1, ci});
ciscellclass = cellfun('isclass', myCell(:, ci), cellclass);
consistences(ci) = all(ciscellclass);
types{ci} = cellclass;
end
%]
end
With your test case myCell = repmat( { 1, 2, 3, 'test', 1 , 'abc'; 4, 5, 6, 'foob', 'a', 'def' }, 10000, 5 ); it takes about 0.0123 seconds on my computer with R2015b. It could be even faster if you want to fail on the first inconsistent column (here I'm testing them all).
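A minimal sketch of that early-exit idea, assuming the cell array has at least one row and that a single yes/no answer is enough (the function name AllColumnsTypeConsistent is hypothetical, not part of the answer above):

```matlab
function tf = AllColumnsTypeConsistent(myCell)
% Returns false as soon as one column mixes classes, true otherwise.
% Assumption: myCell has at least one row, so myCell{1, ci} exists.
tf = true;
for ci = 1:size(myCell, 2)
    cellclass = class(myCell{1, ci});
    % Same 'isclass' test as above, applied to one column at a time
    if ~all(cellfun('isclass', myCell(:, ci), cellclass))
        tf = false;
        return  % stop at the first inconsistent column
    end
end
end
```

The trade-off is that the per-column flags and type names are no longer returned, so this only suits a pass/fail check.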
This is a collection of the different suggestions with a benchmarking script to compare timings...
function benchie
% Create a large, mixed type cell array
myCell = repmat( { 1, 2, 3, 'test', 1 , 'abc';
4, 5, 6, 'foob', 'a', 'def' }, 10000, 5 );
% Create anonymous functions for TIMEIT
f1 = @() usingStrcmp(myCell);
f2 = @() usingUnique(myCell);
f3 = @() usingLoops(myCell);
f4 = @() usingISA(myCell);
f5 = @() usingIsClass(myCell);
% Timing of different methods
timeit(f1)
timeit(f2)
timeit(f3)
timeit(f4)
timeit(f5)
end
function usingStrcmp(myCell)
% The original method
types = cellfun( @class, myCell, 'uni', false );
typesOK = all( strcmp(repmat(types(1,:), size(types,1), 1), types), 1 );
types = types(1, :);
end
function usingUnique(myCell)
% Using UNIQUE instead of STRCMP, as suggested by rahnema1
types = cellfun( @class, myCell, 'uni', false );
[type,~,idx]=unique(types);
u = unique(reshape(idx,size(types)),'rows');
if size(u,1) == 1
% consistent
else
% not-consistent
end
end
function usingLoops(myCell)
% Using loops instead of CELLFUN. Move onto the next column if a type
% difference is found, otherwise continue looping down the rows
types = cellfun( @class, myCell(1,:), 'uni', false );
typesOK = true(size(types));
for c = 1:size(myCell,2)
for r = 1:size(myCell,1)
if ~strcmp( class(myCell{r,c}), types{c} )
typesOK(c) = false;
break % leave the row loop and move on to the next column
end
end
end
end
function usingISA(myCell)
% Using ISA instead of converting all types to strings. Suggested by Sam
types = cellfun( @class, myCell(1,:), 'uni', false );
for ii = 1:numel(types)
typesOK(ii) = all(cellfun(@(x)isa(x,types{ii}), myCell(:,ii)));
end
end
function usingIsClass(myCell)
% using the same method as found in CELL2MAT. Suggested by CitizenInsane
ncols = size(myCell, 2);
typesOK = false(1, ncols);
types = cell(1, ncols);
for ci = 1:ncols
cellclass = class(myCell{1, ci});
ciscellclass = cellfun('isclass', myCell(:, ci), cellclass);
typesOK(ci) = all(ciscellclass);
types{ci} = cellclass;
end
end
Outputs:
Tested on R2015b
usingStrcmp: 0.8523 secs
usingUnique: 1.2976 secs
usingLoops: 1.4796 secs
usingISA: 10.2670 secs
usingIsClass: 0.0131 secs % RAPID!
Tested on R2017b
usingStrcmp: 0.8282 secs
usingUnique: 1.2128 secs
usingLoops: 0.4763 secs % ZOOOOM! (Relative to R2015b)
usingISA: 9.6516 secs
usingIsClass: 0.0093 secs % RAPID!
The looping method will depend heavily on where the type discrepancy occurs, since it could loop over every row of every column or just 2 rows of every column.
With the same inputs though (as shown), the looping has been massively optimised in the newer version of MATLAB (R2017b), saving >65% of the time, and it takes roughly 40% less time than the original strcmp method!
Conclusions:
For top speed on newer MATLAB releases, the looping method may be optimal.
Update: The method proposed by CitizenInsane is extremely quick compared to the other versions, and is likely hard to beat since it uses the same methodology found in MATLAB's own cell2mat.
Recommendation: use the above usingIsClass function.
You can use unique:
myCell = { 1, 2, 3, 'test', 1 , 'abc';
4, 5, 6, 'foob', 'a', 'def' };
types = cellfun( @class, myCell, 'uni', false );
[type,~,idx]=unique(types);
u = unique(reshape(idx,size(types)),'rows');
if size(u,1) == 1
disp('consistent')
else
disp('non-consistent')
end
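The check above gives one answer for the whole array. If per-column flags like the question's typesOK are wanted, one possible sketch is to compare every row of the index matrix with its first row. bsxfun is used here on the assumption that releases without implicit expansion (before R2016b) need to be supported; the variable idxMat is illustrative only.

```matlab
myCell = { 1, 2, 3, 'test', 1 , 'abc';
           4, 5, 6, 'foob', 'a', 'def' };
types = cellfun(@class, myCell, 'uni', false);
% Map each class name to an integer label
[~, ~, idx] = unique(types);
idxMat = reshape(idx, size(types));
% A column is consistent when every label equals the label in row 1
typesOK = all(bsxfun(@eq, idxMat, idxMat(1, :)), 1);
disp(typesOK)
```

This still calls class on every element, so it is not expected to be faster than the original; it only changes how the class names are compared.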
How about this:
>> myCell = { 1, 2, 3, 'test', 1 , 'abc';
4, 5, 6, 'foob', 'a', 'def' }
myCell =
2×6 cell array
[1] [2] [3] 'test' [1] 'abc'
[4] [5] [6] 'foob' 'a' 'def'
>> firstRowTypes = cellfun(@class, myCell(1,:), 'uni', false)
firstRowTypes =
1×6 cell array
'double' 'double' 'double' 'char' 'double' 'char'
>> for i = 1:numel(firstRowTypes)
typesOK(i) = all(cellfun(@(x)isa(x,firstRowTypes{i}), myCell(:,i)));
end
>> typesOK
typesOK =
1×6 logical array
1 1 1 1 0 1
I haven't done extensive timings, but I think that should speed things up (at least for large cell arrays), as it calls class only on the first row and then uses isa, rather than converting all the types to strings and then comparing strings.