I needed to implement 'choosing an object' in a 3D environment. Instead of going with a robust, accurate approach such as raycasting, I decided to take the easy way out. First, I transform the object's world position into screen coordinates:
glm::mat4 modelView, projection, accum;
glGetFloatv(GL_PROJECTION_MATRIX, (GLfloat*)&projection);
glGetFloatv(GL_MODELVIEW_MATRIX, (GLfloat*)&modelView);
accum = projection * modelView;
glm::vec4 transformed = accum * glm::vec4(objectLocation, 1);
This is followed by some trivial code to transform from the OpenGL coordinate system to normal window coordinates, and a simple distance-from-the-mouse check. But that doesn't quite work. To get from world space to screen space, I need one more calculation added to the end of the code shown above:
transformed.x /= transformed.z;
transformed.y /= transformed.z;
I don't understand why I have to do this. I was under the impression that once you multiplied a vertex by the accumulated modelViewProjection matrix, you had your screen coordinates. But I have to divide by Z to get it to work properly. In my OpenGL 3.3 shaders, I never have to divide by Z. Why is this?
EDIT: The code to transform from the OpenGL coordinate system to screen coordinates is this:
int screenX = (int)((trans.x + 1.f)*640.f); //640 = 1280/2
int screenY = (int)((-trans.y + 1.f)*360.f); //360 = 720/2
And then I test if the mouse is near that point by doing:
float length = glm::distance(glm::vec2(screenX, screenY), glm::vec2(mouseX, mouseY));
if(length < 50) {//you can guess the rest
EDIT #2
This method is called upon a mouse click event:
glm::mat4 modelView;
glm::mat4 projection;
glm::mat4 accum;
glGetFloatv(GL_PROJECTION_MATRIX, (GLfloat*)&projection);
glGetFloatv(GL_MODELVIEW_MATRIX, (GLfloat*)&modelView);
accum = projection * modelView;
float nearestDistance = 1000.f;
gameObject* nearest = NULL;
for(uint i = 0; i < objects.size(); i++) {
    gameObject* o = objects[i];
    o->selected = false;
    glm::vec4 trans = accum * glm::vec4(o->location,1);
    trans.x /= trans.z;
    trans.y /= trans.z;
    int clipX = (int)((trans.x+1.f)*640.f);
    int clipY = (int)((-trans.y+1.f)*360.f);
    float length = glm::distance(glm::vec2(clipX,clipY), glm::vec2(mouseX, mouseY));
    if(length<50) {
        nearestDistance = trans.z;
        nearest = o;
    }
}
if(nearest) {
    nearest->selected = true;
}
mouseRightPressed = true;
The code as a whole is incomplete, but the parts relevant to my question work fine. The 'objects' vector contains only one element for my tests, so the loop doesn't get in the way at all.
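Since the loop above records a depth but never compares against it, here is a minimal sketch of how the "keep the closest hit" part could look once more than one object is in play. `Candidate` and `pickNearest` are hypothetical names, not part of the code above, and the sketch assumes each object has already been projected to window coordinates with some depth value attached.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an already-projected object:
// its window position in pixels plus a depth value (smaller = closer).
struct Candidate {
    float screenX;
    float screenY;
    float depth;
};

// Returns the index of the closest candidate that lies within `radius`
// pixels of the mouse position, or -1 if no candidate qualifies.
int pickNearest(const std::vector<Candidate>& candidates,
                float mouseX, float mouseY, float radius) {
    int best = -1;
    float bestDepth = 0.0f;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        float dx = candidates[i].screenX - mouseX;
        float dy = candidates[i].screenY - mouseY;
        // Skip objects that are too far from the cursor on screen.
        if (std::sqrt(dx * dx + dy * dy) >= radius) {
            continue;
        }
        // Keep this one only if it is the first hit or closer than the best so far.
        if (best < 0 || candidates[i].depth < bestDepth) {
            best = static_cast<int>(i);
            bestDepth = candidates[i].depth;
        }
    }
    return best;
}
```

With a single object in the list, as in my tests, this behaves the same as the loop above; the depth comparison only matters when several objects fall within the radius.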
I've figured it out. As Mr David Lively pointed out,
"Typically in this case you'd divide by .w instead of .z to get something useful, though."
My .w values were very close to my .z values, so in my code I changed the statement:
transformed.x /= transformed.z;
transformed.y /= transformed.z;
to:
transformed.x /= transformed.w;
transformed.y /= transformed.w;
And it still worked just as before.
https://stackoverflow.com/a/10354368/2159051 explains that division by w will be done later in the pipeline. Obviously, because my code simply multiplies the matrices together, there is no 'later pipeline'. I was just getting lucky in a sense: because my .z value was so close to my .w value, there was the illusion that it was working.
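To make the "getting lucky" part concrete, here is a minimal sketch of the whole CPU-side path, assuming a standard OpenGL-style perspective matrix (the layout documented for gluPerspective). `Vec4`, `Mat4`, `mul`, `perspective`, and `clipToWindow` are hypothetical stand-ins written in plain C++ so the sketch has no dependencies; in the real code these would be the glm types and the matrices read back from OpenGL.

```cpp
#include <array>
#include <cmath>

// Hypothetical minimal types standing in for glm::vec4 and glm::mat4.
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major: m[row][col]

// Matrix times column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    return {
        m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z + m[0][3] * v.w,
        m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z + m[1][3] * v.w,
        m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z + m[2][3] * v.w,
        m[3][0] * v.x + m[3][1] * v.y + m[3][2] * v.z + m[3][3] * v.w,
    };
}

// Standard OpenGL-style perspective matrix. Note the bottom row (0, 0, -1, 0):
// it copies the negated eye-space depth into clip.w.
Mat4 perspective(float fovyRadians, float aspect, float zNear, float zFar) {
    float f = 1.0f / std::tan(fovyRadians / 2.0f);
    Mat4 m{};  // all zeros
    m[0][0] = f / aspect;
    m[1][1] = f;
    m[2][2] = (zFar + zNear) / (zNear - zFar);
    m[2][3] = (2.0f * zFar * zNear) / (zNear - zFar);
    m[3][2] = -1.0f;
    return m;
}

struct Pixel { int x, y; };

// Clip space -> normalized device coordinates (divide by w) -> window pixels.
// The y axis is flipped because window coordinates grow downward.
Pixel clipToWindow(const Vec4& clip, int width, int height) {
    float ndcX = clip.x / clip.w;
    float ndcY = clip.y / clip.w;
    return {
        static_cast<int>((ndcX + 1.0f) * 0.5f * width),
        static_cast<int>((1.0f - ndcY) * 0.5f * height),
    };
}
```

For a point 10 units in front of the camera, with a near plane of 0.1 and a far plane of 1000, this matrix gives a clip.w of 10 and a clip.z of about 9.8. That is why dividing by .z looked almost right in my case; the two values drift apart as the object approaches the near plane.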
The divide-by-Z step effectively applies the perspective transformation. Without it, you'd have an iso view. Imagine two view-space vertices: A(-1,0,1) and B(-1,0,100).
Without the divide-by-Z step, the screen coordinates are equal: (-1,0).
With the divide-by-Z, they are different: A(-1,0) and B(-0.01,0). So, things farther away from the view-space origin (camera) are smaller in screen space than things that are closer. I.e., perspective.
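The A and B example can be sketched in a few lines. `Vec3`, `Vec2`, `dropDepth`, and `divideByDepth` are hypothetical names used only for illustration, and the sketch ignores everything else a real projection does (field of view, aspect ratio, clipping).

```cpp
// Hypothetical minimal types for the illustration.
struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// No perspective: depth is simply discarded, so A and B land on the same spot.
Vec2 dropDepth(const Vec3& v) {
    return { v.x, v.y };
}

// Perspective: x and y shrink in proportion to depth,
// so the distant point moves toward the center of the screen.
Vec2 divideByDepth(const Vec3& v) {
    return { v.x / v.z, v.y / v.z };
}
```

Feeding in A(-1,0,1) and B(-1,0,100), `dropDepth` returns the same point for both, while `divideByDepth` leaves A at x = -1 and pulls B in to x = -0.01.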
That said: if your projection matrix (and matrix multiplication code) is correct, this should already be happening, as the projection matrix sets up the .w component that drives this 1/Z scaling. So, some questions:
Are you really using the output of a projection transform, or just the view transform?
Are you doing this in a pixel/fragment shader? Screen coordinates there are normalized (-1,-1) to (+1,+1), not pixel coordinates, with the origin at the middle of the viewport. Typically in this case you'd divide by .w instead of .z to get something useful, though.
If you're doing this on the CPU, how are you getting this information back to the host?