
Extract characters in sequence in MATLAB

I want to extract characters in a sequence. For example, given this image:

Here's the code I wrote:

[L Ne]=bwlabel(BinaryImage);
stats=regionprops(L,'BoundingBox');
cc=vertcat(stats(:).BoundingBox);
aa=cc(:,3);
bb=cc(:,4);
hold on
figure
for n=1:size(stats,1)
    if (aa(n)/bb(n) >= 0.2 && aa(n)/bb(n)<= 1.25)
        [r,c] = find(L==n);
        n1=BinaryImage(min(r):max(r),min(c):max(c));
        imshow(~n1);
        pause(0.5)
    end
    hold off
end

What changes should I make for a proper sequence?

Asked by Nomi, Oct 30 '22 20:10


1 Answer

The characters come out in the wrong order because bwlabel assigns its labels in column-major order: it scans down each column, moving from left to right, so a blob's label depends on where its first pixel falls in that scan. regionprops returns its structure array in label order, so it inherits the same ordering rather than the row-major ordering you are looking for. Column-major order is MATLAB's native behaviour, which is where this comes from. Your logic using find / bwlabel also operates in column-major order, so you will have to keep both of these things in mind when trying to display your characters in row-major order.
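To see the column-major ordering concretely, here is a minimal sketch on a made-up 3-by-5 test image. The matrix BW is an assumption for illustration, not your plate image, and the snippet needs the Image Processing Toolbox for bwlabel:

```matlab
% Hypothetical test image: six isolated single-pixel "blobs"
% arranged in two rows and three columns.
BW = logical([1 0 1 0 1;
              0 0 0 0 0;
              1 0 1 0 1]);

% Label the blobs (bwlabel uses 8-connectivity by default).
[L, num] = bwlabel(BW);

% Labels run down each column first, then left to right:
%   1 0 3 0 5
%   0 0 0 0 0
%   2 0 4 0 6
disp(L);
```

Reading the labels row by row gives 1 3 5 and then 2 4 6, which is the reordering the loop needs.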


As such, a simple way is to modify your for loop so that you access the structure row-wise instead of column-wise. For your example image, the characters are labelled like so:

 1   3   5
 2   4   6

You would need to access the structure in the following order: [1 3 5 2 4 6]. Therefore, you would change your for loop to access this new array and you can create this new array like so:

ind = [1:2:numel(stats) 2:2:numel(stats)];
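As a quick sanity check, the following sketch compares that index vector against a row-by-row reading of the label layout. It assumes exactly six blobs in two rows; numBlobs, labels and rowMajor are hypothetical names used only for this illustration:

```matlab
numBlobs = 6;                          % assumed blob count: 2 rows x 3 columns

% Column-major label layout, i.e. [1 3 5; 2 4 6].
labels = reshape(1:numBlobs, 2, []);

% Odd labels (top row) first, then even labels (bottom row).
ind = [1:2:numBlobs 2:2:numBlobs];

% Read the layout row by row for comparison.
rowMajor = reshape(labels.', 1, []);

disp(ind);                             % 1 3 5 2 4 6
disp(isequal(ind, rowMajor));          % 1 (true)
```

The two vectors agree only because the labels strictly alternate between the two rows, which is the assumption this shortcut relies on.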

Once you do that, just modify your for loop to access the values in ind instead. To make your code fully reproducible, I'm going to read your image directly from its URL and invert it, as the text is black. The text needs to be white for the blob analysis to be successful:

%// Added
clear all; close all;
BinaryImage = ~im2bw(imread('http://s4.postimg.org/lmz6uukct/plate.jpg'));

[L Ne]=bwlabel(BinaryImage);
stats=regionprops(L,'BoundingBox');
cc=vertcat(stats(:).BoundingBox);
aa=cc(:,3);
bb=cc(:,4);
figure;

ind = [1:2:numel(stats) 2:2:numel(stats)]; %// Change
for n = ind %// Change
    if (aa(n)/bb(n) >= 0.2 && aa(n)/bb(n)<= 1.25)
        [r,c] = find(L==n);
        n1=BinaryImage(min(r):max(r),min(c):max(c));
        imshow(~n1);
        pause(0.5)
    end
end

Warning

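The above code assumes that there are exactly two rows of characters and that the labels strictly alternate between them, as in the layout shown earlier. If you have more rows, or the labels do not alternate in that way, the hard-coded indices will not work.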

If you want it to work for multiple lines, the logic below assumes that the text is horizontal and not at an angle. You loop until you run out of blobs. At the start of each iteration, you find the unprocessed blob whose bounding box has the smallest y coordinate for its top-left corner. You then collect every unprocessed blob whose top-left y coordinate is within some threshold of that value; these blobs form one row of text. Because the labels already increase from left to right, the blobs within a row come out in reading order. You mark them as processed and repeat until no blobs remain.

Something like this:

thresh = 5; %// Declare tolerance

cc=vertcat(stats(:).BoundingBox);
topleft = cc(:,1:2);

ind = []; %// Initialize list of indices
processed = false(numel(stats),1); %// Figure out those blobs that have been processed
while any(~processed) %// While there is at least one blob to look at...
    %// Determine the blob that has the smallest y/row coordinate that's  
    %// unprocessed
    cc_proc = topleft(~processed,:);
    ys = min(cc_proc(:,2));

    %// Find all blobs along the same row that are +/-thresh rows from
    %// the source row
    loc = find(abs(topleft(:,2)-ys) <= thresh & ~processed);

    %// Add to list and mark them off
    ind = [ind; loc];
    processed(loc) = true;
end

ind = ind.'; %// Ensure it's a row

You'd then use the ind variable with the for loop just like before.
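To try the row-grouping loop without an image, here is a minimal sketch that runs the same logic on made-up bounding boxes. The values in cc and the tolerance of 5 pixels are assumptions for illustration; in your code cc would come from regionprops:

```matlab
% Hypothetical bounding boxes [x y width height], listed in label order
% (left to right), for three slightly uneven rows of text.
cc = [10  12 20 30;    % label 1: row 1
      10  60 20 30;    % label 2: row 2
      10 108 20 30;    % label 3: row 3
      40  10 20 30;    % label 4: row 1
      40  62 20 30;    % label 5: row 2
      70  13 20 30;    % label 6: row 1
      70  59 20 30;    % label 7: row 2
      70 110 20 30];   % label 8: row 3

thresh = 5;                          % assumed tolerance in pixels
topleft = cc(:,1:2);                 % top-left corner of each box
ind = [];                            % output ordering
processed = false(size(cc,1),1);     % which blobs have been assigned a row

while any(~processed)
    % Smallest top-left y among the unprocessed blobs
    ys = min(topleft(~processed,2));

    % All unprocessed blobs within thresh rows of it form one text row
    loc = find(abs(topleft(:,2)-ys) <= thresh & ~processed);

    ind = [ind; loc];
    processed(loc) = true;
end

ind = ind.';
disp(ind);                           % 1 4 6 2 5 7 3 8
```

The tolerance has to be larger than the vertical jitter within a row and smaller than the gap between rows, so it will likely need tuning for a given image.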

Answered by rayryeng, Nov 08 '22 04:11