Count unique rows in a cell full of vectors

I have a cell array in MATLAB in which each element contains a vector, and the vectors have different lengths.

e.g.

C = {[1 2 3], [2 4 5 6], [1 2 3], [6 4], [7 6 4 3], [4 6], [6 4]}

As you can see, some of the vectors are repeated, others are unique.

I want to count the number of times each vector occurs and return the counts so that I can populate a table in a GUI where each row is a unique combination and the data shows how many times each combination occurs.

e.g.

            Count
"[1 2 3]"     2
"[6 4]"       2
"[2 4 5 6]"   1
"[7 6 4 3]"   1
"[4 6]"       1

I should say that the order of the numbers in each vector is important i.e. [6 4] is not the same as [4 6].

Any thoughts how I can do this fairly efficiently?

Thanks to people who have commented so far. As @Divakar kindly pointed out, I forgot to mention that the values in the vector can be more than one digit long, e.g. [46, 36 28]. My original code would concatenate the vector [1 2 3 4] into 1234 and then use hist to do the counting. Of course this falls apart when you go above single digits, as you can't tell the difference between [1, 2, 3, 4] and [12, 34].

Mark asked Sep 26 '14 18:09


3 Answers

You can convert all the entries to char, then to a 2D numeric array, and finally use unique(...,'rows') to get labels for the unique rows. Those labels then give you the counts.

C = {[46, 36 28], [2 4 5 6], [46, 36 28], [6 4], [7 6 4 3], [4 6], [6 4]} %// Input

char_array1 = char(C{:})-0; %// pad into a 2D char array, then convert back to numeric with -0
[~,unqlabels,entry_labels] = unique(char_array1,'rows'); %// get unique rows
count = histc(entry_labels,1:max(entry_labels)); %// counts of each unique row

For the purpose of presenting the output in a format as asked in the question, you can use this -

out = [C(unqlabels)' num2cell(count)];

Output -

out = 
    [1x4 double]    [1]
    [1x2 double]    [1]
    [1x2 double]    [2]
    [1x4 double]    [1]
    [1x3 double]    [2]

You can display the unique rows with celldisp -

ans{1} =
     2     4     5     6
ans{2} =
     4     6
ans{3} =
     6     4
ans{4} =
     7     6     4     3
ans{5} =
    46    36    28

Edit: If you have negative numbers in there, you need to do a little more work to set up char_array1, as shown here. The rest of the code stays the same -

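lens = cellfun(@numel,C); %// length of each vector
mat1(max(lens),numel(lens))=0; %// zero matrix with one column per vector
mat1(bsxfun(@ge,lens,[1:max(lens)]')) = horzcat(C{:}); %// fill each column with its vector
char_array1 = mat1'; %// one zero-padded row per vector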
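
Both versions above pad the shorter vectors (with spaces in the char version, with zeros in the edit), so a vector that really ends in the padding value could be confused with a shorter one. As a hedged sketch of one way around that, the variant below stores each vector's length in an extra first column before calling unique(...,'rows'). The length column, the variable names (padded, first_idx, labels), the extra [6 4 0] test entry, and the use of accumarray in place of histc are assumptions added for illustration; they are not part of the code above.

```matlab
% Input from the answer, plus [6 4 0] to exercise the trailing-zero case
C = {[46 36 28], [2 4 5 6], [46 36 28], [6 4], [7 6 4 3], [4 6], [6 4], [6 4 0]};

lens = cellfun(@numel, C);                  % length of each vector
padded = zeros(numel(C), max(lens) + 1);    % one zero-padded row per vector
padded(:, 1) = lens(:);                     % column 1 holds the length
for k = 1:numel(C)
    padded(k, 2:(lens(k) + 1)) = C{k};      % values follow the length
end

% labels(k) is the unique-row id of C{k}; first_idx picks one representative
[~, first_idx, labels] = unique(padded, 'rows');
counts = accumarray(labels(:), 1);          % occurrences of each unique row

% Two-column cell array: unique vector, count
out = [reshape(C(first_idx), [], 1), num2cell(counts)];
```

Because the length is compared first, [6 4] and [6 4 0] end up in different rows even though their zero-padded values are identical.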
Divakar answered Oct 17 '22 11:10


One way I can think of is to convert each vector to a string and then use unique:

Cs = cellfun(@(x)(mat2str(x)),C,'uniformoutput',false);
[Cu,idx_u,idx] = unique(Cs);

Now you can count the number of occurrences with idx, for instance using

fv=tabulate(idx)

fv already has all the info you need, but for display purposes I'll add:

[Cu' , num2cell(fv(:,2))]

ans = 

'[1 2 3]'      [2]
'[2 4 5 6]'    [1]
'[4 6]'        [1]
'[6 4]'        [2]
'[7 6 4 3]'    [1]
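
The question asks for the most frequent combinations first, so here is a minimal sketch of the same string-key idea with the rows sorted by count. It uses accumarray rather than tabulate, on the assumption that you may not have the Statistics Toolbox available. The variable names (keys, uniq_keys, table_data) are illustrative only.

```matlab
C = {[1 2 3], [2 4 5 6], [1 2 3], [6 4], [7 6 4 3], [4 6], [6 4]};

% One string key per vector, e.g. '[1 2 3]'
keys = cellfun(@mat2str, C, 'UniformOutput', false);

% idx(k) is the position of keys{k} in the sorted list of unique keys
[uniq_keys, ~, idx] = unique(keys);
uniq_keys = uniq_keys(:);                   % force a column
counts = accumarray(idx(:), 1);             % occurrences of each unique key

% Most frequent combinations first
[counts_sorted, order] = sort(counts, 'descend');
table_data = [uniq_keys(order), num2cell(counts_sorted)];
```

table_data is a two-column cell array (key string, count), which should be usable as the Data of a uitable, for example uitable('Data', table_data, 'ColumnName', {'Combination', 'Count'}).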
bla answered Oct 17 '22 09:10


Another suggestion I can think of is to convert each array into a single number by concatenating its digits, then use a histogram to count how many times each number occurs. We first find the unique numbers with unique, and these then serve as the histogram edges.

One thing I will need to note is that we are assuming that each element in your array for each cell is a single digit. This obviously won't work if there are numbers that are two digits or more.

In other words:

%// Convert each array of numbers into a single number
numbers = cellfun(@(x) sum(x.*10.^(numel(x)-1:-1:0)), C);
%// Find unique numbers
uniNumbers = unique(numbers);

%// Get histogram
out = histc(numbers, uniNumbers);

%// Display counts
disp([uniNumbers; out]);

out would contain the counts per unique number in your cell array. We get:

      46          64         123        2456        7643
       1           2           2           1           1

The trick with the first line of code is that I'm using the decomposition of numbers in base 10, where each number can be uniquely represented as a sum of single digits multiplied by powers of 10. As such, 4587 can be represented as:

4000 + 500 + 80 + 7 ==> 4*10^3 + 5*10^2 + 8*10^1 + 7*10^0

I took each number in our array, and used those as coefficients for each decreasing power of 10, then summed them all together. As such, in your cell arrays, [1 2 3], is converted to 123, and so on. With your example, this is the output of numbers, which is doing what I talked about above:

numbers =

  Columns 1 through 6

         123        2456         123          64        7643          46

  Column 7

          64

Compare this with your actual cell array in C:

celldisp(C)

C{1} = 
     1     2     3     
C{2} =
     2     4     5     6
C{3} =
     1     2     3 
C{4} =
     6     4
C{5} =
     7     6     4     3
C{6} =
     4     6
C{7} =
     6     4
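
To see why the single-digit assumption matters, here is a small check that uses the same encoding as the first line of code above. The two test vectors and the names encode, a, b and collide are chosen purely for illustration.

```matlab
% Same encoding as above: elements used as coefficients of powers of 10
encode = @(x) sum(x .* 10.^(numel(x)-1:-1:0));

a = encode([12 34]);    % 12*10 + 34*1
b = encode([1 5 4]);    % 1*100 + 5*10 + 4*1

collide = (a == b);     % true: two different vectors share one number
```

Both vectors map to 154, so the histogram would count them as the same combination. With single-digit elements this cannot happen for positive values, because the encoded number is then simply the digits written side by side.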
rayryeng answered Oct 17 '22 10:10