I've implemented an image/video transformation technique called discrete cosine transform. This technique is used in MPEG video encoding. I based my algorithm on the ideas presented at the following URL:
http://vsr.informatik.tu-chemnitz.de/~jan/MPEG/HTML/mpeg_tech.html
Now I can transform an 8x8 section of a black and white image, such as:
0140 0124 0124 0132 0130 0139 0102 0088
0140 0123 0126 0132 0134 0134 0088 0117
0143 0126 0126 0133 0134 0138 0081 0082
0148 0126 0128 0136 0137 0134 0079 0130
0147 0128 0126 0137 0138 0145 0132 0144
0147 0131 0123 0138 0137 0140 0145 0137
0142 0135 0122 0137 0140 0138 0143 0112
0140 0138 0125 0137 0140 0140 0148 0143
Into an image with all the important information at the top left. The transformed block looks like this:
1041 0039 -023 0044 0027 0000 0021 -019
-050 0044 -029 0000 0009 -014 0032 -010
0000 0000 0000 0000 -018 0010 -017 0000
0014 -019 0010 0000 0000 0016 -012 0000
0010 -010 0000 0000 0000 0000 0000 0000
-016 0021 -014 0010 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 -010 0013 -014 0010 0000 0000
Now I need to know: how can I take advantage of this transformation? I'd like to detect other 8x8 blocks in the same image (or another image) that represent a good match.
Also, what does this transformation give me? Why is the information stored in the top left of the converted image important?
The result of a DCT is a transformation of the original source into the frequency domain. The top-left entry stores the "amplitude" of the "base" frequency (the DC term, which is proportional to the block's average value), and frequency increases along both the horizontal and vertical axes. The outcome of the DCT is usually a cluster of significant amplitudes at the lower frequencies (the top-left quadrant) and fewer entries at the higher frequencies. As lassevk mentioned, it is usual to just zero out these higher frequencies, as they typically constitute very minor parts of the source. However, this does result in loss of information. To complete the compression, it is usual to apply a lossless compression over the DCT'd source. This is where the compression comes in, as all those runs of zeros get packed down to almost nothing.
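To make that concrete, here is a minimal pure-Python sketch of an orthonormal 2D DCT-II applied to the 8x8 block from the question, followed by zeroing of small coefficients. The function names (dct_2d, zero_small) and the threshold of 10 are my own illustrative assumptions, not taken from the linked MPEG page, and the scaling used in the question's implementation may differ. With this scaling the top-left (DC) coefficient is the sum of the block divided by 8, which comes out at about 1041, in line with the first value of the transformed block above.

```python
import math

N = 8

# The 8x8 source block from the question.
BLOCK = [
    [140, 124, 124, 132, 130, 139, 102, 88],
    [140, 123, 126, 132, 134, 134, 88, 117],
    [143, 126, 126, 133, 134, 138, 81, 82],
    [148, 126, 128, 136, 137, 134, 79, 130],
    [147, 128, 126, 137, 138, 145, 132, 144],
    [147, 131, 123, 138, 137, 140, 145, 137],
    [142, 135, 122, 137, 140, 138, 143, 112],
    [140, 138, 125, 137, 140, 140, 148, 143],
]


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block given as a list of rows."""
    def alpha(k):
        # Normalisation factor; assumed orthonormal scaling.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def zero_small(coeffs, threshold=10.0):
    """Set every coefficient with magnitude below threshold to zero (lossy)."""
    return [[c if abs(c) >= threshold else 0.0 for c in row] for row in coeffs]


coeffs = dct_2d(BLOCK)
sparse = zero_small(coeffs)
zeros = sum(1 for row in sparse for c in row if c == 0.0)
print("DC coefficient:", round(coeffs[0][0]))
print("coefficients zeroed:", zeros, "of", N * N)
```

The zeroed entries are what a later lossless stage (run-length or entropy coding) can pack down cheaply; the quadruple loop is fine for a single block but far too slow for whole images.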
One possible advantage of using the DCT to find similar regions is that you can do a first-pass match on the low-frequency values (top-left corner). This reduces the number of values you need to match against. If the low-frequency values match, you can move on to comparing the higher frequencies.
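A rough sketch of that coarse-to-fine comparison might look like the following. It works on two 8x8 coefficient blocks that have already been transformed; the names corner_distance and coarse_to_fine_match, the stage sizes (2, 4, 8) and the tolerance are all hypothetical choices for illustration and would need tuning against real images.

```python
def corner_distance(a, b, k):
    """Sum of squared differences over the top-left k x k coefficients."""
    return sum((a[u][v] - b[u][v]) ** 2 for u in range(k) for v in range(k))


def coarse_to_fine_match(a, b, sizes=(2, 4, 8), tol_per_coeff=100.0):
    """Compare low frequencies first and widen the corner only on success.

    Returns False as soon as one stage exceeds its tolerance, so most
    non-matching blocks are rejected after looking at only a few values.
    """
    for k in sizes:
        if corner_distance(a, b, k) > tol_per_coeff * k * k:
            return False  # rejected early; higher frequencies never examined
    return True


# Tiny illustration with made-up coefficient blocks.
base = [[0.0] * 8 for _ in range(8)]
base[0][0] = 1041.0

similar = [row[:] for row in base]
similar[7][7] = 5.0            # small high-frequency difference

different = [row[:] for row in base]
different[0][0] = 900.0        # very different average brightness

print(coarse_to_fine_match(base, similar))
print(coarse_to_fine_match(base, different))
```

Squared difference is only one possible measure; a sum of absolute differences, or weights that favour the lowest frequencies, would fit the same structure.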
Hope this helps