Part of my data (a cell array of strings) is shown below. I want to count the occurrences of particular strings (e.g., 'P0702', 'P0882', etc.) and display the total number of occurrences of each, in the form shown below:
'1FA' '2012' 'F' '' '' '' '' '' 'P0702' 'P0882'
'1Fc' '2012' 'r' '' '' '' '' '' 'P0702' '' '' ''
'1FA' '2012' 'f' '' '' '' '' '' 'P0702' 'P0882' ''
'1FA' '2012' 'y' '' '' '' 'P0702' '' '' '' '' ''
'1FA' '2012' 'g' '' '' '' '' '' '' '' '' '' ''
'1FA' '2012' 'u' '' 'P0702' 'P0882' '' '' '' '' ''
'1FA' '2012' 'y' '' 'P0702' '' '' '' '' '' '' ''
'1FA' '2012' 'n' '' 'P0702' '' '' '' '' '' '' ''
'1FA' '2012' 'j' '' '' '' '' '' '' '' '' 'P0702'
'1FA' '2012' 'u' 'P0702' '' '' '' '' '' '' '' ''
'1FM' '2013' 'x' '' '' '' '' '' 'P1921' '' '' ''
'1FM' '2013' 'c' '' 'P1711' '' '' '' '' '' '' ''
'1FM' '2013' 'c' '' '' '' '' '' 'P0702' 'P0882' ''
'1FM' '2009' 'E' '' '' '' '' '' '' '' 'P0500'
Desired output (total number of occurrences of each string):
P0702 15
P0500 1
P1711 1
and so on.
I tried sum(strcmp(d,{'P0882'}),2), which tells me how many times 'P0882' occurs in each row, but it would be tedious to repeat this for every string.
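For reference, the dimension argument of sum decides what gets counted: summing along dimension 2 gives one count per row, while counting over all elements gives a single total. A minimal sketch on a small hypothetical cell array d (not the real data):

```matlab
% Hypothetical 3-by-4 sample in the same layout as the data above
d = {'1FA' '2012' 'P0702' 'P0882';
     '1Fc' '2012' 'P0702' '';
     '1FM' '2013' ''      'P0500'};

% Logical matrix, true wherever a cell equals 'P0882'
hits = strcmp(d, 'P0882');

perRow = sum(hits, 2);   % one count per row
total  = nnz(hits);      % single count over the whole array
```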
You could apply strcmp as you proposed, but inside a loop over the strings to count, which you determine beforehand with unique.
I modified the data you provided slightly so that every row has the same number of columns (a cell array must be rectangular). The code is commented and easy to follow:
C = {'1FA' '2012' 'F' '' '' '' '' '' 'P0702' 'P0882' ;
'1Fc' '2012' 'r' '' '' '' '' '' 'P0702' '';
'1FA' '2012' 'f' '' '' '' '' '' 'P0702' 'P0882';
'1FA' '2012' 'y' '' '' '' 'P0702' '' '' '';
'1FA' '2012' 'g' '' '' '' '' '' '' '';
'1FA' '2012' 'u' '' 'P0702' 'P0882' '' '' '' '' ;
'1FA' '2012' 'y' '' 'P0702' '' '' '' '' '' ;
'1FA' '2012' 'n' '' 'P0702' '' '' '' '' '' ;
'1FA' '2012' 'j' '' '' '' '' '' '' 'P0702' ;
'1FA' '2012' 'u' 'P0702' '' '' '' '' '' '' ;
'1FM' '2013' 'x' '' '' '' '' '' 'P1921' '';
'1FM' '2013' 'c' '' 'P1711' '' '' '' '' '';
'1FM' '2013' 'c' '' '' '' '' '' 'P0702' 'P0882';
'1FM' '2009' 'E' '' '' '' '' '' '' 'P0500'}
%// Find the unique strings whose occurrences we want to count.
[strings,~,~] = unique(C(:,4:end));
%// Remove empty cells.
strings = strings(~cellfun(@isempty,strings));
%// Initialize output cell array
Output = cell(numel(strings),2);
%// Count occurrences. The two assignments could also be combined into one,
%// e.g. Output(k,:) = {strings{k}, sum(sum(strcmp(C(:,4:end),strings{k})))};
for k = 1:numel(strings)
    Output{k,1} = strings{k};
    Output{k,2} = sum(sum(strcmp(C(:,4:end),strings{k})));
end
Let's make a nice table out of this:
T = table(Output(:,2),'RowNames',Output(:,1),'VariableNames',{'TotalOccurrences'})
Output:
T =
TotalOccurrences
________________
P0500 [ 1]
P0702 [10]
P0882 [ 4]
P1711 [ 1]
P1921 [ 1]
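Because Output(:,2) is itself a cell array, each table entry is a 1-by-1 cell, which is why the counts are displayed in brackets. If you prefer a plain numeric column (for example to sort by frequency), one option is to convert it with cell2mat first. This is a sketch; the sample values are copied from the result above and the variable name TotalOccurrences is just a label:

```matlab
% Sample two-column result in the same layout as Output above
Output = {'P0500' 1; 'P0702' 10; 'P0882' 4};

% cell2mat turns the count column into a numeric vector, so the table
% shows plain numbers instead of 1-by-1 cells
T = table(cell2mat(Output(:,2)), ...
          'RowNames', Output(:,1), ...
          'VariableNames', {'TotalOccurrences'})

% A numeric column can be sorted directly, most frequent first
T = sortrows(T, 'TotalOccurrences', 'descend')
```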
If you don't have access to the table function (introduced in R2013b), you can instead create a cell array with a header row and adjust the loop slightly:
%// Initialize output cell array, with an extra first row for the headers
Output = cell(numel(strings)+1,2);
%// Count occurrences, filling rows 2 onward
for k = 1:numel(strings)
    Output{k+1,1} = strings{k};
    Output{k+1,2} = sum(sum(strcmp(C(:,4:end),strings{k})));
end
%// Add the headers in the first row
Output(1,:) = {'Data' 'Occurrence'}
Output:
Output =
'Data' 'Occurrence'
'P0500' [ 1]
'P0702' [ 10]
'P0882' [ 4]
'P1711' [ 1]
'P1921' [ 1]
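If you would rather avoid the explicit loop, the same counts can be computed with cellfun. This is a sketch assuming, as above, that the codes start in column 4; the small sample C and the names D, codes and counts are hypothetical:

```matlab
% Small hypothetical sample; the codes start in column 4 as in C above
C = {'1FA' '2012' 'F' 'P0702' 'P0882';
     '1Fc' '2012' 'r' ''      'P0702';
     '1FM' '2009' 'E' 'P0500' ''};

D = C(:,4:end);                             % only the code columns
codes = unique(D(~cellfun(@isempty, D)));   % sorted unique non-empty codes

% For each code, count the matching cells in D; with the default
% UniformOutput = true, counts is a numeric column like codes
counts = cellfun(@(s) nnz(strcmp(D, s)), codes);

Output = [codes num2cell(counts)]           % same two-column layout as the loop
```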
If you have the Statistics Toolbox (now the Statistics and Machine Learning Toolbox), you can simply use tabulate, where data is your full cell array:
%// get only relevant part
X = data(:,4:end);
%// tabulate
tabulate(X(:))
It prints nicely formatted output, and the empty cells are not counted:
Value Count Percent
P0702 10 58.82%
P1711 1 5.88%
P0882 4 23.53%
P1921 1 5.88%
P0500 1 5.88%
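If you need the numbers themselves rather than the printed table, tabulate can also return them: for a cell array of character vectors it returns a cell array whose columns are value, count and percentage. A sketch on hypothetical sample data, which drops the empty cells explicitly to be safe (requires the Statistics and Machine Learning Toolbox):

```matlab
% Hypothetical code columns (empty cells mean "no code")
X = {'P0702' ''; 'P0882' 'P0702'; '' 'P0500'};

% Keep only the non-empty cells, as a column
codes = X(~cellfun(@isempty, X));

% With an output argument, tabulate returns the table instead of printing it;
% for cellstr input the columns are {value, count, percent}
tbl = tabulate(codes);

values = tbl(:,1);            % the distinct codes
counts = cell2mat(tbl(:,2));  % numeric counts, one per code
```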
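Alternatively, using only core MATLAB functions:
X = data(:,4:end);
%// a: sorted unique non-empty strings; x: index into a for each non-empty cell
[a,~,x] = unique(X(~strcmp(X,'')));
%// Count how often each index occurs (one bin per unique string)
occ = hist(x(:),1:numel(a));
out = [a num2cell(occ).']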
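hist is no longer recommended in recent MATLAB releases. Since x already holds one group index per non-empty cell, accumarray can do the counting directly. A sketch with the same variable names, on a small hypothetical X:

```matlab
% Hypothetical code columns (empty strings mean "no code")
X = {'P0702' ''; 'P0882' 'P0702'; '' 'P0500'};

% a: sorted unique non-empty codes; x: index into a for each kept cell
[a,~,x] = unique(X(~strcmp(X,'')));

% accumarray adds 1 to bin x(i) for every element, i.e. a count per code
occ = accumarray(x, 1);

out = [a num2cell(occ)]
```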