
How to implement 4-point perspective transform using HTML5 canvas & three.js?

First, a visual example of what I am trying to achieve:

[image: example]

(Photo credit: https://unsplash.com/photos/pGcqw1ARGyg)

The short (tl;dr) question

Using HTML5 video & canvas, how can I perform a 4-point perspective transform so that I can render just the "TV screen" part of the frame in the canvas? Why doesn't my implementation show the correct area?

Background about what I'm trying to achieve

I am trying to build a web page which works as follows:

  1. The user points their webcam towards a TV, so that it is somewhere in the frame (but potentially at any angle)
  2. Using HTML5 video & canvas, the webcam is captured and previewed on the web page
  3. The user is able to define (by clicking on the preview) where the 4 corners of the TV screen are (4 pairs of x/y coordinates)
  4. The video is warped (using some kind of perspective transform) so that the canvas only shows the part of the image for their actual TV screen (not the whole webcam view)
  5. Some processing is then performed on the image (for example, identifying the most prominent colours). This part is outside the scope of this question, other than pointing out that I will want to be able to access the content/pixels of an HTML5 canvas at the end.
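
Step 3 can be sketched roughly as follows. This is only an illustration: the helper name is hypothetical, and it assumes the corners are stored as fractions of the preview element's size, which is one possible convention among several.

```javascript
// Hypothetical helper (not from the linked repository): convert a mouse
// click into coordinates normalised to the preview element, i.e. 0..1 on
// each axis with the origin at the top-left corner.
// `rect` has the shape returned by element.getBoundingClientRect().
function clickToNormalised(clientX, clientY, rect) {
  return {
    x: (clientX - rect.left) / rect.width,
    y: (clientY - rect.top) / rect.height,
  };
}

// Example with a made-up 400x320 preview positioned at (50, 20) on the page.
const corner = clickToNormalised(150, 100, { left: 50, top: 20, width: 400, height: 320 });
console.log(corner.x, corner.y);
```

In a browser this would be called from a click handler with event.clientX, event.clientY and the preview element's bounding rectangle. Whatever convention is chosen (pixels, 0..1, or centred on the origin), the same convention has to be used when the transform is computed later.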

The part I am struggling with is step 4. In order to make sure that I am only processing the relevant part of the image for each frame of the video, it is important that I "warp" the image so that it only shows the "TV screen" area and not the whole webcam picture.

Having done a bit of reading up, my understanding is that:

  • This requires some kind of perspective transform and, because the webcam can be at any angle and we are not dealing with parallel lines, a 3-dimensional transform is required and 2D will not suffice. This is because a 2D transform (translate/rotate/scale/skew) would not be able to deal with the converging sides.
  • HTML5 canvas is a two-dimensional context and can therefore only support 2D transforms, not a 3D transform. Because I need a solution that works with canvas, I can't simply use a 3D CSS transform (e.g. https://developer.mozilla.org/en-US/docs/Web/CSS/transform-function/matrix3d). This suggests that WebGL may be closer to what I need to deal with the 3D aspect.
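
For reference, the distinction in the two points above can be written down as follows. The notation is standard homogeneous-coordinate notation and the symbols are illustrative, not taken from any linked code. A 2D affine transform (the kind the canvas 2D context supports) is linear in x and y, whereas a perspective warp between two planes also divides by a third, position-dependent term:

```latex
\begin{aligned}
\text{affine:}\quad
  & x' = a_1 x + a_2 y + a_3,
  \qquad y' = b_1 x + b_2 y + b_3 \\[4pt]
\text{projective:}\quad
  & x' = \frac{a_1 x + a_2 y + a_3}{c_1 x + c_2 y + c_3},
  \qquad y' = \frac{b_1 x + b_2 y + b_3}{c_1 x + c_2 y + c_3}
\end{aligned}
```

The affine form always keeps parallel lines parallel. The denominator in the projective form is what allows the sides of the TV to converge, and it is the part that a translate/rotate/scale/skew cannot express.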

What I've attempted so far

With that in mind, I attempted the following approach:

a) Capture the webcam using a video tag

b) Using three.js, create a 3D scene which is rendered into a canvas element (so that I can perform my image processing on the resultant canvas contents)

c) The three.js scene consists of:
   - a flat mesh which shows the video on one side using a VideoTexture
   - a perspective camera, initially positioned so that it shows the whole webcam image

d) Allow the user to click the four corner points to define where their TV is, work out what the x/y coordinates are and save them

e) Calculate a perspective transform which would "stretch" the image out so that the correct area "fills the frame". In other words, stretch the four clicked "TV corner" points to the four corners of the viewport. I have been using this library: https://github.com/jlouthan/perspective-transform to calculate this.

f) My thinking is that, if the appropriate transform is applied to the mesh containing the video, and the camera stays in a fixed position, then the output canvas would contain the required image when looking at it in 2D.
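
To make step e) concrete, here is a minimal sketch of how a 4-point perspective transform can be computed without any library. It is not the implementation used by perspective-transform; the function names are hypothetical, and it assumes the bottom-right coefficient can be fixed to 1, which holds whenever the source origin does not map to infinity.

```javascript
// Solve A·h = b by Gauss-Jordan elimination with partial pivoting.
function solveLinearSystem(A, b) {
  const n = b.length;
  // Augmented matrix, so row operations apply to both sides at once.
  const M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    // Choose the row with the largest pivot for numerical stability.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) {
      throw new Error("Degenerate corner configuration");
    }
    [M[col], M[pivot]] = [M[pivot], M[col]];
    // Eliminate this column from every other row.
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  // M is now diagonal, so each unknown is a single division.
  return M.map((row, i) => row[n] / row[i]);
}

// Returns the 9 coefficients [a1, a2, a3, b1, b2, b3, c1, c2, 1] that map
// each src[i] = [x, y] onto dst[i] = [u, v].
function homographyFromPoints(src, dst) {
  const A = [];
  const b = [];
  for (let i = 0; i < 4; i++) {
    const [x, y] = src[i];
    const [u, v] = dst[i];
    // From u = (a1 x + a2 y + a3) / (c1 x + c2 y + 1), and likewise for v.
    A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.push(u);
    A.push([0, 0, 0, x, y, 1, -v * x, -v * y]); b.push(v);
  }
  return [...solveLinearSystem(A, b), 1];
}

// Apply the coefficients to one point, including the perspective divide.
function applyHomography(H, x, y) {
  const w = H[6] * x + H[7] * y + H[8];
  return [(H[0] * x + H[1] * y + H[2]) / w, (H[3] * x + H[4] * y + H[5]) / w];
}

// Made-up example: four clicked "TV corners" stretched onto a unit square.
const tvCorners = [[0, 0], [4, 0.5], [3.5, 3], [0.5, 2.5]];
const viewport = [[0, 0], [1, 0], [1, 1], [0, 1]];
const H = homographyFromPoints(tvCorners, viewport);
console.log(applyHomography(H, 3.5, 3));
```

The nine coefficients form a 3x3 matrix in homogeneous 2D coordinates. How they are placed into a 4x4 matrix for three.js is a separate question.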

Link to my current (broken) implementation

Here is a link to my current attempt at the above. It shows the video and allows you to click the four corners. It seems like it works if you click points around the origin (in the centre) but the problem is that it shows the wrong area if you choose areas elsewhere in the image.

https://bitbucket.org/mattwilson1024/perspective-transform/src/master/

Summing up

I'd be really grateful for any help working out why this isn't working as I expected, or for any pointers on whether there is a better/easier approach to achieve what I need.

Matt Wilson asked Mar 05 '23 19:03


1 Answer

The problem with the original implementation is in the way that transformMatrix was being created.

I was able to make it work by changing this:

transformMatrix.set(a1, a2, a3, 0, 
                    b1, b2, b3, 0, 
                    c1, c2, c3, 0, 
                    0,  0,  0,  1);

to this:

transformMatrix.set(a1, a2, 0, a3, 
                    b1, b2, 0, b3, 
                    0,  0,  0, 1, 
                    c1, c2, 0, c3);

This answer on the Math StackExchange was helpful for working this out.
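
To illustrate why the layout matters, here is a minimal sketch using plain row-major arrays instead of three.js. The helper names are hypothetical, and it assumes that a1..c3 are the nine homography coefficients in row order and that the mesh vertices lie at z = 0 with w = 1. Matrix4.set takes its arguments in row-major order, so the arrays mirror the two calls above.

```javascript
// Original (broken) layout: the 3x3 matrix occupies the x/y/z block.
function embedOriginal([a1, a2, a3, b1, b2, b3, c1, c2, c3]) {
  return [
    a1, a2, a3, 0,
    b1, b2, b3, 0,
    c1, c2, c3, 0,
    0,  0,  0,  1,
  ];
}

// Corrected layout: the third column/row of the 3x3 matrix is moved to
// the w column/row, so it acts on homogeneous 2D points (x, y, _, w).
function embedCorrected([a1, a2, a3, b1, b2, b3, c1, c2, c3]) {
  return [
    a1, a2, 0, a3,
    b1, b2, 0, b3,
    0,  0,  0, 1,
    c1, c2, 0, c3,
  ];
}

// Multiply a row-major 4x4 matrix by the point (x, y, 0, 1), then apply
// the homogeneous divide by w, as the GPU does after projection.
function transformPoint(m, x, y) {
  const v = [x, y, 0, 1];
  const out = [0, 1, 2, 3].map(
    (r) => m[4 * r] * v[0] + m[4 * r + 1] * v[1] + m[4 * r + 2] * v[2] + m[4 * r + 3] * v[3]
  );
  return [out[0] / out[3], out[1] / out[3]];
}

// Made-up coefficients: a shift of 3 in x plus a perspective term in x.
const coeffs = [1, 0, 3, 0, 1, 0, 0.5, 0, 1];
console.log(transformPoint(embedOriginal(coeffs), 2, 1));  // translation and divide are lost
console.log(transformPoint(embedCorrected(coeffs), 2, 1)); // (2 + 3) / 2 and 1 / 2
```

With the original layout, a3 and b3 multiply z (which is 0 on a flat mesh) and the bottom row leaves w at 1, so both the translation and the perspective divide are lost. The corrected layout moves them into the w column and the w row.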

For the benefit of anyone finding this question in the future, I've updated the original question so that it points to an archive branch containing the broken code. The working version can be found here.

Matt Wilson answered Apr 07 '23 18:04