
Find edges (border of rectangle) inside an image

I have an image of a sticky note on a background (say a wall, or a laptop) and I want to detect the edges of the sticky note (a rough detection is fine) so that I can crop it.

I plan on using ImageMagick for the actual cropping, but am stuck on detecting the edges.

Ideally, my output should give me 4 coordinates for the 4 border points so I can run my crop on it.

How should I proceed with this?

stickynote

Asked by navinpai, Feb 11 '23 01:02


1 Answer

You can do that with ImageMagick.

ImageMagick offers several ways to approach this. Here is the first algorithm that came to mind. It assumes the sticky note is not tilted or rotated within the larger image:

  1. First stage: use Canny edge detection to reveal the edges of the sticky note.
  2. Second stage: determine the coordinates of the edges.

Canny Edge Detection

This command will create a black+white image depicting all edges in the original image:

convert                              \
  http://i.stack.imgur.com/SxrwG.png \
 -canny 0x1+10%+30%                  \
  canny-edges.png

canny-edges.png

Determine Coordinates of Edges

Assume the image is XxY pixels in size. You can resize it into a single 1xY column or a single Xx1 row of pixels, where each resulting pixel's value is roughly the average of all the original pixels that were in the same row or the same column.

To make the effect visible, I'll first resize canny-edges.png to Xx4 and 4xY images:

identify -format "%wx%h\n" canny-edges.png
400x300

convert canny-edges.png -resize 400x4\!   canny-4cols.png
convert canny-edges.png -resize   4x300\! canny-4rows.png

canny-4cols.png

canny-4rows.png

The previous images show what squeezing an image into a few rows or columns of pixels achieves. Now let's do it with a single row and a single column. At the same time we'll switch the output format from PNG to text, in order to get the coordinates of the white pixels:

convert canny-edges.png -resize 400x1\!   canny-1col.txt
convert canny-edges.png -resize   1x300\! canny-1row.txt

Here is part of the output from canny-1col.txt:

# ImageMagick pixel enumeration: 400,1,255,gray
0,0: (0,0,0)  #000000  gray(0)
1,0: (0,0,0)  #000000  gray(0)
2,0: (0,0,0)  #000000  gray(0)
[....]
73,0: (0,0,0)  #000000  gray(0)
74,0: (0,0,0)  #000000  gray(0)
75,0: (10,10,10)  #0A0A0A  gray(10)
76,0: (159,159,159)  #9F9F9F  gray(159)
77,0: (21,21,21)  #151515  gray(21)
78,0: (156,156,156)  #9C9C9C  gray(156)
79,0: (14,14,14)  #0E0E0E  gray(14)
80,0: (3,3,3)  #030303  gray(3)
81,0: (3,3,3)  #030303  gray(3)
[....]
162,0: (3,3,3)  #030303  gray(3)
163,0: (4,4,4)  #040404  gray(4)
164,0: (10,10,10)  #0A0A0A  gray(10)
165,0: (7,7,7)  #070707  gray(7)
166,0: (8,8,8)  #080808  gray(8)
167,0: (8,8,8)  #080808  gray(8)
168,0: (8,8,8)  #080808  gray(8)
169,0: (9,9,9)  #090909  gray(9)
170,0: (7,7,7)  #070707  gray(7)
171,0: (10,10,10)  #0A0A0A  gray(10)
172,0: (5,5,5)  #050505  gray(5)
173,0: (13,13,13)  #0D0D0D  gray(13)
174,0: (6,6,6)  #060606  gray(6)
175,0: (10,10,10)  #0A0A0A  gray(10)
176,0: (10,10,10)  #0A0A0A  gray(10)
177,0: (7,7,7)  #070707  gray(7)
178,0: (8,8,8)  #080808  gray(8)
[....]
319,0: (3,3,3)  #030303  gray(3)
320,0: (3,3,3)  #030303  gray(3)
321,0: (14,14,14)  #0E0E0E  gray(14)
322,0: (156,156,156)  #9C9C9C  gray(156)
323,0: (21,21,21)  #151515  gray(21)
324,0: (159,159,159)  #9F9F9F  gray(159)
325,0: (10,10,10)  #0A0A0A  gray(10)
326,0: (0,0,0)  #000000  gray(0)
327,0: (0,0,0)  #000000  gray(0)
[....]
397,0: (0,0,0)  #000000  gray(0)
398,0: (0,0,0)  #000000  gray(0)
399,0: (0,0,0)  #000000  gray(0)

As you can see, the edges detected in the text on the note also influence the grayscale values of the pixels. Adding a -threshold 50% operation to the commands gives pure black+white output:

convert canny-edges.png -resize 400x1\!   -threshold 50% canny-1col.txt
convert canny-edges.png -resize   1x300\! -threshold 50% canny-1row.txt
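
As a rough sanity check on why 50% is a workable cut-off here: if squeezing the image into one row simply averaged each column (an approximation; the actual resize filter may weight pixels a little differently), a vertical edge covering about 190 of the 300 rows would come out at roughly 63% white, while a handful of edge pixels from the text would stay far below that. The helper name expected_gray is made up for this sketch, and the 190-pixel edge length is the note height derived further down.

```shell
# expected_gray EDGE_PIXELS TOTAL_PIXELS
# Hypothetical helper: approximate 8-bit gray value of an averaged column in
# which EDGE_PIXELS of TOTAL_PIXELS are white (255) and the rest black (0).
expected_gray() {
  echo $(( 255 * $1 / $2 ))   # integer division, truncates the fraction
}

expected_gray 190 300   # long vertical edge of the note: prints 161
expected_gray 10 300    # a few stray edge pixels from the text: prints 8
```

The first value is close to the 156 and 159 seen in canny-1col.txt and sits above the 50% mark (about 128), so it survives the threshold; the second does not.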

I won't quote the contents of the new text files here; try it and look for yourself if you are interested. Instead, I'll take a shortcut: write the textual representation of the pixel color values to stdout and grep it directly for all non-black pixels:

convert canny-edges.png -resize 400x1\!   -threshold 50% txt:- \
| grep -v black

  # ImageMagick pixel enumeration: 400,1,255,srgb
  76,0: (255,255,255)  #FFFFFF  white
  78,0: (255,255,255)  #FFFFFF  white
  322,0: (255,255,255)  #FFFFFF  white
  324,0: (255,255,255)  #FFFFFF  white

convert canny-edges.png -resize   1x300\! -threshold 50% txt:- \
| grep -v black

  # ImageMagick pixel enumeration: 1,300,255,srgb
  0,39: (255,255,255)  #FFFFFF  white
  0,41: (255,255,255)  #FFFFFF  white
  0,229: (255,255,255)  #FFFFFF  white
  0,231: (255,255,255)  #FFFFFF  white
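
If you want the numbers without reading the listing by eye, the same text output can be reduced with awk. This is only a sketch: the function names white_coords and min_max are invented here, and it reports the outermost white pixels (76 and 324 for the columns), which is a slightly looser box than picking the value between each pair of edge pixels.

```shell
# white_coords AXIS
# Read ImageMagick "txt:" pixel enumeration on stdin and print the x (AXIS=x)
# or y (AXIS=y) coordinate of every pure-white pixel, one per line.
white_coords() {
  awk -v axis="$1" '
    /^[[:space:]]*#/ { next }          # skip the enumeration header
    /#FFFFFF/ {
      split($1, xy, /[,:]/)            # "76,0:" gives xy[1]=76, xy[2]=0
      print (axis == "x" ? xy[1] : xy[2])
    }'
}

# min_max
# Read one number per line on stdin and print "MIN MAX".
min_max() {
  awk 'NR == 1 { min = max = $1 + 0 }
       { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
       END { if (NR) print min, max }'
}

# Intended use (requires ImageMagick and canny-edges.png):
#   convert canny-edges.png -resize 400x1\! -threshold 50% txt:- | white_coords x | min_max
#   convert canny-edges.png -resize 1x300\! -threshold 50% txt:- | white_coords y | min_max
```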

From the above results you can conclude that two opposite corners of the sticky note inside the larger image are at these pixel coordinates:

  1. upper left corner: (77|40)
  2. lower right corner: (323|230)

The width of the area is 246 pixels (323 - 77) and the height is 190 pixels (230 - 40).

(ImageMagick places the origin of its coordinate system at the upper left corner of an image, so y values grow downward.)
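
The arithmetic that turns the two corners into a crop geometry can be written down as a small shell helper. The name crop_geometry is hypothetical; the sketch takes (77|40) as the upper left corner and (323|230) as the lower right one, consistent with the origin being at the upper left.

```shell
# crop_geometry X1 Y1 X2 Y2
# Hypothetical helper: turn an upper-left (X1,Y1) and a lower-right (X2,Y2)
# corner into an ImageMagick geometry string WIDTHxHEIGHT+XOFFSET+YOFFSET.
crop_geometry() {
  echo "$(( $3 - $1 ))x$(( $4 - $2 ))+$1+$2"
}

crop_geometry 77 40 323 230   # prints 246x190+77+40
```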

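To cut the sticky note out of the original image, you can run the command below (the quotes keep the shell from treating the square brackets as a glob pattern):

convert 'http://i.stack.imgur.com/SxrwG.png[246x190+77+40]' sticky-note.png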

sticky-note.png

More options to explore

autotrace

You can streamline the above procedure further (and even turn it into a fully automatic script) by converting the intermediate "canny-edges.png" into an SVG vector graphic, for example by running it through autotrace...

This could be useful if your sticky note is tilted or rotated.

Hough Line Detection

Once you have the "canny" lines, you could also apply the Hough Line Detection algorithm on them:

convert              \
  canny-edges.png    \
 -background black   \
 -stroke red         \
 -hough-lines 5x5+20 \
  lines.png

lines.png

Note that the -hough-lines operator extends each detected line so that it is drawn from one edge of the original image to another, using floating point coordinates for the end points.

While the previous command rendered the lines to a PNG, the -hough-lines operator actually generates an MVG (Magick Vector Graphics) file internally. That means you can read the source code of the MVG file and determine the mathematical parameters of each line depicted in the "red lines" image:

convert              \
  canny-edges.png    \
 -hough-lines 5x5+20 \
  lines.mvg

This is more sophisticated and also works for edges which are not strictly horizontal and/or vertical.
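
For a tilted note, the corners would be the points where pairs of detected lines cross. Here is a minimal sketch of that one step, assuming each line is given by the two end point tokens of an MVG line entry (x1,y1 and x2,y2). The function name intersect is invented for this example, and the end points used in the call are one horizontal and one vertical line from the listings further down.

```shell
# intersect P1 P2 P3 P4
# Hypothetical helper: P1,P2 are the end points of the first line and P3,P4
# those of the second, each written as "x,y". Prints the crossing point as
# "x,y", or "parallel" (exit status 1) if the lines never cross.
intersect() {
  awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" 'BEGIN {
    split(p1, a, ","); split(p2, b, ","); split(p3, c, ","); split(p4, d, ",")
    den = (a[1]-b[1])*(c[2]-d[2]) - (a[2]-b[2])*(c[1]-d[1])
    if (den == 0) { print "parallel"; exit 1 }
    u = a[1]*b[2] - a[2]*b[1]          # cross term of the first line
    v = c[1]*d[2] - c[2]*d[1]          # cross term of the second line
    printf "%.1f,%.1f\n", (u*(c[1]-d[1]) - (a[1]-b[1])*v) / den, \
                          (u*(c[2]-d[2]) - (a[2]-b[2])*v) / den
  }'
}

intersect 0,39.5 400,39.5 76.5,0 76.5,300   # prints 76.5,39.5
```

The result is the upper left corner of the note, matching the (77|40) found with the resize method to within a pixel.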

But your example image does use horizontal and vertical edges, so you can even use simple shell commands to discover these.

There are 80 line descriptions in total in the generated MVG file. You can identify all horizontal lines in that file:

# Keep only "line" entries whose start and end points share the same y value.
while read -r a b c d e; do
  if [ "$a" = line ] && [ "${b#0,}" = "${c#400,}" ]; then
    echo "$a    $b    $c   $d    $e"
  fi
done < lines.mvg

    line     0,39.5    400,39.5    # 249
    line     0,62.5    400,62.5    # 48
    line     0,71.5    400,71.5    # 52
    line     0,231.5   400,231.5   # 249

Now identify all vertical lines:

# Keep only "line" entries whose start and end points share the same x value.
while read -r a b c d e; do
  if [ "$a" = line ] && [ "${b%,0}" = "${c%,300}" ]; then
    echo "$a    $b    $c   $d    $e"
  fi
done < lines.mvg

   line     76.5,0   76.5,300     # 193
   line    324.5,0  324.5,300     # 193
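
The number after the # differs sharply between the note's borders (249 and 193) and the lines picked up from the text on the note (48 and 52). Assuming that number is the Hough accumulator count for the line, a simple cut-off separates the two groups. The function name strong_lines and the cut-off of 100 are choices made for this sketch, not anything ImageMagick prescribes.

```shell
# strong_lines MIN
# Read MVG text on stdin and print only the "line" entries whose trailing
# "# N" count is at least MIN.
strong_lines() {
  awk -v min="$1" '$1 == "line" && $4 == "#" && $5 + 0 >= min + 0'
}

# Intended use (requires the lines.mvg generated above):
#   strong_lines 100 < lines.mvg
```

With the listings above, this would keep the lines at y = 39.5 and y = 231.5 and at x = 76.5 and x = 324.5, which together outline the note.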

Answered by Kurt Pfeifle, Feb 15 '23 09:02