Fastest way to get class types of elements of a cell array

I have a (large) cell array, with various data types. For example,

 myCell = { 1, 2, 3, 'test',  1 , 'abc';
            4, 5, 6, 'foob', 'a', 'def' };

This can include more obscure types like java.awt.Color objects.

I want to ensure that the data in each column is of the same type, since I want to perform table-like operations on it. However, this process seems very slow!

My current method is to use cellfun to get the classes, and strcmp to check them:

% Get class of every cell element
types = cellfun( @class, myCell, 'uni', false );
% Check that they are consistent for each column
typesOK = all( strcmp(repmat(types(1,:), size(types,1), 1), types), 1 );
% Output the types (mixed type columns can be handled using typesOK)
types = types(1, :);

% Output for the above example: 
% >> typesOK = [1 1 1 1 0 1]
% >> types = {'double', 'double', 'double', 'char', 'double', 'char'}

I had thought to use cell2table, since it does type checking for the same reason. However, it doesn't give me the desired result (which columns are which types, strictly).
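To illustrate why cell2table does not report strict column types, here is a small sketch (the variable names T and colClasses are mine, purely for illustration). As far as I know, cell2table stores a mixed-type column as a cell column, and a column of character vectors as a cell array of character vectors, so both are reported as 'cell' rather than as the class of their contents:

```matlab
myCell = { 1, 2, 3, 'test',  1 , 'abc';
           4, 5, 6, 'foob', 'a', 'def' };

% Convert to a table; each column of the cell array becomes one table variable
T = cell2table(myCell);

% Ask for the class of each table variable, returned as a 1-by-6 cell array
colClasses = varfun(@class, T, 'OutputFormat', 'cell');
disp(colClasses)
```

The numeric columns come back as 'double', but the char column and the mixed column cannot be told apart this way.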

Is there a quicker way to check type consistency within a cell array's columns?


Edit: I've just done some profiling...

It appears the types = cellfun( @class, ...) line takes over 90% of the processing time. If your method is only subtly different from mine, it should be that line which changes; the strcmp is pretty quick.
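A minimal sketch of how the two steps can be timed separately with timeit (the variable names tClass and tCmp are assumptions for illustration, and the numbers will differ by machine and release):

```matlab
% Large mixed-type cell array, built from the small example
myCell = repmat( { 1, 2, 3, 'test',  1 , 'abc';
                   4, 5, 6, 'foob', 'a', 'def' }, 10000, 5 );

% Time the class extraction on its own
tClass = timeit( @() cellfun( @class, myCell, 'uni', false ) );

% Time the string comparison on its own, using precomputed types
types = cellfun( @class, myCell, 'uni', false );
tCmp = timeit( @() all( strcmp(repmat(types(1,:), size(types,1), 1), types), 1 ) );

fprintf('cellfun(@class): %.4f s, strcmp: %.4f s\n', tClass, tCmp);
```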


Edit: I was fortunate to have many suggestions for this problem, and I have compiled them all into a benchmarking answer for performance tests.

Wolfie asked Jan 18 '18 09:01


4 Answers

I have not tested whether this is faster for very large arrays, but something like this may work:

function [b] = IsTypeConsistentColumns(myCell)
%[
    b = true;
    try
        for ci = 1:size(myCell, 2)
           cell2mat(myCell(:, ci));
        end
    catch err
        if (strcmpi(err.identifier, 'MATLAB:cell2mat:MixedDataTypes'))
            b = false;
        else
            rethrow(err);
        end
    end
%]
end

Whether this helps depends on how fast cell2mat is compared to your string comparison (even though the result of cell2mat is not used here).

Note that cell2mat will throw an error if the type is not consistent (identifier: 'MATLAB:cell2mat:MixedDataTypes', message = 'All contents of the input cell array must be of the same data type.').
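A tiny sketch of that error being caught on a two-element mixed column (the variable names mixedColumn and id are mine, for illustration only):

```matlab
mixedColumn = { 1; 'a' };   % one double and one char in the same column
id = '';
try
    cell2mat(mixedColumn);  % result is discarded; only the type check matters
catch err
    id = err.identifier;    % expected to be the MixedDataTypes identifier quoted above
end
disp(id)
```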

EDIT: Limiting the work to the cellfun('isclass', c, cellclass) test

This version uses only the type-consistency check that the cell2mat routine performs internally:

function [consistences, types] = IsTypeConsistentColumns(myCell)
%[
    ncols = size(myCell, 2);
    consistences = false(1, ncols);
    types = cell(1, ncols);
    for ci = 1:ncols
        cellclass = class(myCell{1, ci});
        ciscellclass = cellfun('isclass', myCell(:, ci), cellclass);

        consistences(ci) = all(ciscellclass);
        types{ci} = cellclass; 
    end    
%]
end

With your test case myCell = repmat( { 1, 2, 3, 'test', 1 , 'abc'; 4, 5, 6, 'foob', 'a', 'def' }, 10000, 5 ); this takes about 0.0123 seconds on my computer with R2015b. It could be even faster if you only need to fail on the first inconsistent column (here I'm testing them all).
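A possible early-exit variant, as a sketch only: the function name allColumnsConsistent is hypothetical, and it assumes the cell array has at least one row.

```matlab
function tf = allColumnsConsistent(myCell)
    % Returns true only if every column holds a single class.
    % Stops at the first inconsistent column instead of testing them all.
    % Assumes myCell has at least one row (myCell{1, ci} must exist).
    tf = true;
    for ci = 1:size(myCell, 2)
        cellclass = class(myCell{1, ci});
        if ~all(cellfun('isclass', myCell(:, ci), cellclass))
            tf = false;
            return
        end
    end
end
```

The trade-off is that you no longer learn which columns are mixed, only whether any of them is.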

CitizenInsane answered Sep 20 '22 11:09


This is a collection of the different suggestions with a benchmarking script to compare timings...

function benchie    
    % Create a large, mixed type cell array
    myCell = repmat( { 1, 2, 3, 'test',  1 , 'abc';
                       4, 5, 6, 'foob', 'a', 'def' }, 10000, 5 );

    % Create anonymous functions for TIMEIT               
    f1 = @() usingStrcmp(myCell);
    f2 = @() usingUnique(myCell);
    f3 = @() usingLoops(myCell);
    f4 = @() usingISA(myCell);
    f5 = @() usingIsClass(myCell);
    % Timing of different methods
    timeit(f1)
    timeit(f2)
    timeit(f3)    
    timeit(f4)
    timeit(f5)
end

function usingStrcmp(myCell)
    % The original method
    types = cellfun( @class, myCell, 'uni', false );
    typesOK = all( strcmp(repmat(types(1,:), size(types,1), 1), types), 1 );
    types = types(1, :);
end

function usingUnique(myCell)
    % Using UNIQUE instead of STRCMP, as suggested by rahnema1 
    types = cellfun( @class, myCell, 'uni', false );
    [type,~,idx]=unique(types);
    u = unique(reshape(idx,size(types)),'rows');
    if size(u,1) == 1
        % consistent
    else
        % not-consistent
    end
end

function usingLoops(myCell)
    % Using loops instead of CELLFUN. Move onto the next column if a type
    % difference is found, otherwise continue looping down the rows
    types = cellfun( @class, myCell(1,:), 'uni', false );
    typesOK = true(size(types));
    for c = 1:size(myCell,2)
        for r = 1:size(myCell,1)
            if ~strcmp( class(myCell{r,c}), types{c} )
                typesOK(c) = false;
                break % stop checking this column and move on to the next one
            end
        end
    end
end

function usingISA(myCell)
    % Using ISA instead of converting all types to strings. Suggested by Sam
    types = cellfun( @class, myCell(1,:), 'uni', false );
    for ii = 1:numel(types)
       typesOK(ii) = all(cellfun(@(x)isa(x,types{ii}), myCell(:,ii)));
    end
end

function usingIsClass(myCell)
    % using the same method as found in CELL2MAT. Suggested by CitizenInsane 
    ncols = size(myCell, 2);
    typesOK = false(1, ncols);
    types = cell(1, ncols);
    for ci = 1:ncols
        cellclass = class(myCell{1, ci});
        ciscellclass = cellfun('isclass', myCell(:, ci), cellclass);
        typesOK(ci) = all(ciscellclass);
        types{ci} = cellclass; 
    end  
end

Outputs:

Tested on R2015b

usingStrcmp:  0.8523 secs
usingUnique:  1.2976 secs
usingLoops:   1.4796 secs
usingISA:    10.2670 secs 
usingIsClass: 0.0131 secs % RAPID!

Tested on R2017b

usingStrcmp:  0.8282 secs
usingUnique:  1.2128 secs
usingLoops:   0.4763 secs % ZOOOOM! (Relative to R2015b)
usingISA:     9.6516 secs
usingIsClass: 0.0093 secs % RAPID!

The looping method will depend heavily on where the type discrepancy occurs, since it could loop over every row of every column or just 2 rows of every column.

With the same inputs though (as shown), the looping has been massively optimised in the newer version of MATLAB (R2017b), saving >65% of the time compared with R2015b, and running about 40% quicker than the original method (0.4763 s vs 0.8282 s)!


Conclusions:

  • For consistently quick times (regardless of input), the original method was the most dependable of the first four approaches tested.
  • On newer MATLAB releases, the looping method may be quicker than the original, depending on where the type discrepancy occurs.

  • Update: The method proposed by CitizenInsane is far quicker than all of the others (roughly 65 to 90 times faster than the original in the timings above), and is likely hard to beat since it uses the same approach as MATLAB's own cell2mat.

    Recommendation: use the above usingIsClass function.

Wolfie answered Sep 20 '22 11:09


You can use unique:

myCell = { 1, 2, 3, 'test',  1 , 'abc';
            4, 5, 6, 'foob', 'a', 'def' };

types = cellfun( @class, myCell, 'uni', false );
[type,~,idx]=unique(types);
u = unique(reshape(idx,size(types)),'rows');
if size(u,1) == 1
    disp('consistent')
else
    disp('non-consistent')
end

rahnema1 answered Sep 18 '22 11:09


How about this:

>>  myCell = { 1, 2, 3, 'test',  1 , 'abc';
               4, 5, 6, 'foob', 'a', 'def' }
myCell =
  2×6 cell array
    [1]    [2]    [3]    'test'    [1]    'abc'
    [4]    [5]    [6]    'foob'    'a'    'def'

>> firstRowTypes = cellfun(@class, myCell(1,:), 'uni', false)
firstRowTypes =
  1×6 cell array
    'double'    'double'    'double'    'char'    'double'    'char'

>> for i = 1:numel(firstRowTypes)
       typesOK(i) = all(cellfun(@(x)isa(x,firstRowTypes{i}), myCell(:,i)));
   end

>> typesOK
typesOK =
  1×6 logical array
   1   1   1   1   0   1

I haven't done extensive timings, but I think that should speed things up (at least for large cell arrays), as

  1. you only convert the first row's types into strings
  2. you're making the type comparisons directly using isa, rather than converting all the types to strings and then comparing strings.
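
One caveat if strict matching matters: isa is not identical to comparing class names, because it also returns true for subclasses and for the categories 'numeric', 'float' and 'integer'. A small illustration (the variable names are mine):

```matlab
x = int8(5);

isExact     = strcmp(class(x), 'integer');  % false: class(x) is 'int8'
isCategory  = isa(x, 'integer');            % true: int8 belongs to the 'integer' category
isSameClass = isa(x, 'int8');               % true: the exact class also matches
```

Since the types here come from class() on the first row, the categories never appear, so this only matters if a column could mix a class with one of its subclasses.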

Sam Roberts answered Sep 22 '22 11:09