I want to extract characters in a sequence. For example, given this image:
Here's the code I wrote:
[L Ne]=bwlabel(BinaryImage);
stats=regionprops(L,'BoundingBox');
cc=vertcat(stats(:).BoundingBox);
aa=cc(:,3);
bb=cc(:,4);
hold on
figure
for n=1:size(stats,1)
if (aa(n)/bb(n) >= 0.2 && aa(n)/bb(n)<= 1.25)
[r,c] = find(L==n);
n1=BinaryImage(min(r):max(r),min(c):max(c));
imshow(~n1);
pause(0.5)
end
hold off
end
What changes should I make for a proper sequence?
regionprops returns blobs in column-major order, not in the row-major order you are looking for. The column-major ordering comes from MATLAB itself, where operating in column-major order is the native behaviour. In addition, your logic using find / bwlabel also operates in column-major order, so you have to keep both of these things in mind when trying to display your characters in row-major order.
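To see the column-major behaviour in isolation, here is a minimal sketch using a made-up 3 x 3 mask (the matrix A is purely illustrative and not taken from your image). find walks down each column before moving to the next one, so the bottom-left pixel is reported before the top-right one:

```matlab
%// Hypothetical 3 x 3 mask with one "on" pixel in each corner
A = logical([1 0 1;
             0 0 0;
             1 0 1]);

%// find scans column by column, top to bottom within each column
[r, c] = find(A);

%// Each row of the output is one (row, column) pair, in the order found
disp([r c])
```

The pairs come out as (1,1), (3,1), (1,3), (3,3), which is the same top-to-bottom, then left-to-right pattern that puts the characters of a two-row plate in an interleaved order.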
As such, a simple way is to modify your for loop so that you access the structure row-wise instead of column-wise. For your example image, the characters are labelled in this order:
1 3 5
2 4 6
You would need to access the structure in the following order: [1 3 5 2 4 6]. Therefore, you would change your for loop to iterate over this new array, which you can create like so:
ind = [1:2:numel(stats) 2:2:numel(stats)];
Once you do that, just modify your for loop to access the values in ind instead. To make your code fully reproducible, I'm going to read your image directly from its URL and invert it, as the text is black. The text needs to be white for the blob analysis to be successful:
%// Added
clear all; close all;
BinaryImage = ~im2bw(imread('http://s4.postimg.org/lmz6uukct/plate.jpg'));
[L Ne]=bwlabel(BinaryImage);
stats=regionprops(L,'BoundingBox');
cc=vertcat(stats(:).BoundingBox);
aa=cc(:,3);
bb=cc(:,4);
figure;
ind = [1:2:numel(stats) 2:2:numel(stats)]; %// Change
for n = ind %// Change
if (aa(n)/bb(n) >= 0.2 && aa(n)/bb(n)<= 1.25)
[r,c] = find(L==n);
n1=BinaryImage(min(r):max(r),min(c):max(c));
imshow(~n1);
pause(0.5)
end
end
The above code assumes that there are only two rows of characters and that the labels alternate between the top and bottom row, as in the layout above. If you have more rows, these indices will not work.
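If the characters happen to sit on a perfectly regular grid, the same indexing trick can be stretched to more rows. This is only a sketch under that assumption: numRows and numBlobs are hypothetical names, numBlobs stands in for numel(stats), and every column must contain exactly numRows blobs for the reshape to be valid:

```matlab
numRows  = 3;    %// Assumed number of text rows (hypothetical)
numBlobs = 12;   %// Stands in for numel(stats); must be a multiple of numRows

%// Lay the column-major labels out as they appear in the image:
%// each column of this matrix holds the labels of one column of characters
labels = reshape(1:numBlobs, numRows, []);

%// Reading the transpose in column-major order walks the original row by row
ind = reshape(labels.', 1, []);

disp(ind)
```

With numRows = 2 and six blobs, this reduces to the [1 3 5 2 4 6] ordering used above. As soon as one row has more characters than another, or noise blobs are present, the assumption breaks.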
If you want it to work for multiple lines, the logic below assumes that the text is horizontal and not at an angle. Simply put, you loop until you run out of structures. At the beginning of each iteration, you search the unprocessed blobs for the one whose bounding box's top-left corner has the smallest y coordinate. Once you find it, you search for all blobs whose y coordinates are within some threshold of this source y coordinate and grab their indices. You repeat this until you run out of structures.
Something like this:
thresh = 5; %// Declare tolerance
cc=vertcat(stats(:).BoundingBox);
topleft = cc(:,1:2);
ind = []; %// Initialize list of indices
processed = false(numel(stats),1); %// Figure out those blobs that have been processed
while any(~processed) %// While there is at least one blob to look at...
%// Determine the blob that has the smallest y/row coordinate that's
%// unprocessed
cc_proc = topleft(~processed,:);
ys = min(cc_proc(:,2));
%// Find all blobs along the same row that are +/-thresh rows from
%// the source row
loc = find(abs(topleft(:,2)-ys) <= thresh & ~processed);
%// Add to list and mark them off
ind = [ind; loc];
processed(loc) = true;
end
ind = ind.'; %// Ensure it's a row
You'd then use the ind variable with the for loop just like before.
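To try the row-grouping logic without an image, here is a small self-contained sketch that runs it on made-up bounding boxes. The values in cc are hypothetical and simply mimic what vertcat(stats(:).BoundingBox) would return for a two-row plate. One deliberate difference from the code above: within each row the blobs are explicitly sorted by their x coordinate instead of relying on the label order, which is an extra safeguard and not part of the original logic:

```matlab
%// Hypothetical bounding boxes [x y width height], listed in the
%// column-major label order that bwlabel / regionprops would produce
cc = [10 12 20 30;    %// Row 1, column 1
      11 60 20 30;    %// Row 2, column 1
      40 10 20 30;    %// Row 1, column 2
      41 62 20 30;    %// Row 2, column 2
      70 13 20 30;    %// Row 1, column 3
      71 58 20 30];   %// Row 2, column 3

thresh = 5;                          %// Row tolerance in pixels (assumed)
topleft = cc(:,1:2);
processed = false(size(cc,1),1);     %// Tracks which blobs are already placed
ind = [];

while any(~processed)
    %// The smallest unprocessed y coordinate marks the next text row
    ys = min(topleft(~processed,2));

    %// Collect every unprocessed blob within +/-thresh rows of it
    loc = find(abs(topleft(:,2) - ys) <= thresh & ~processed);

    %// Order this row's blobs from left to right by x
    [~, order] = sort(topleft(loc,1));

    ind = [ind; loc(order)];
    processed(loc) = true;
end
ind = ind.';   %// Ensure it's a row

disp(ind)
```

For these boxes the result is [1 3 5 2 4 6], matching the hand-built index array for the two-row case. The tolerance of 5 pixels is only a starting point; it should be smaller than the gap between text rows and larger than the vertical jitter within a row.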