
Screen Coordinates to World Coordinates

I want to convert from screen coordinates to world coordinates in OpenGL. I am using glm for that purpose (I am also using GLFW).

This is my code:

static void mouse_callback(GLFWwindow* window, int button, int action, int mods)
{
    if (button == GLFW_MOUSE_BUTTON_LEFT) {
        if (GLFW_PRESS == action) {
            int height = 768, width = 1024;
            double xpos, ypos;
            float zpos;
            glfwGetCursorPos(window, &xpos, &ypos);

            glReadPixels(xpos, ypos, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &zpos);

            glm::mat4 m_projection = glm::perspective(glm::radians(45.0f), (float)width / (float)height, 0.1f, 1000.0f);

            glm::vec3 win(xpos, height - ypos, zpos);
            glm::vec4 viewport(0.0f, 0.0f, (float)width, (float)height);
            glm::vec3 world = glm::unProject(win, mesh.getView() * mesh.getTransform(), m_projection, viewport);

            std::cout << "screen " << xpos << " " << ypos << " " << zpos << std::endl;
            std::cout << "world " << world.x << " " << world.y << " " << world.z << std::endl;
        }
    }
}

Now, I have two problems. The first is that the world vector I get from glm::unProject has very small x, y and z values. If I use these values to translate the mesh, it only moves a tiny amount and doesn't follow the mouse pointer.

The second problem is that, as said in the glm docs (https://glm.g-truc.net/0.9.8/api/a00169.html#ga82a558de3ce42cbeed0f6ec292a4e1b3), the result is returned in object coordinates. So in order to convert screen to world coordinates I should use a transform matrix from one mesh, but what happens if I have many meshes and I want to convert from screen to world coordinates? What model matrix should I multiply by the camera view matrix to form the ModelView matrix?

asked Aug 21 '17 by RdlP




1 Answer

There are a couple of issues with this sequence:

       glfwGetCursorPos(window, &xpos, &ypos);
       glReadPixels(xpos, ypos, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &zpos);
       [...]
       glm::vec3 win(xpos,height - ypos, zpos);
  1. Window space origin. glReadPixels is a GL function, and as such adheres to GL's conventions, with the origin being the lower left pixel. While you flip to that convention for your win variable, you still use the wrong origin when reading the depth buffer.

    Furthermore, your flipping is wrong. Since ypos should be in [0, height-1], the correct formula is height-1 - ypos, so you are also off by one here. (We will see later that this isn't exactly true either.)

  2. "Screen Coordinates" vs. Pixel Coordinates. Your code assumes that the coordinates you get back from GLFW are in pixels. This is not the case. GLFW uses the concept of "virtual screen coordinates" which don't necessarily map to pixels:

    Pixels and screen coordinates may map 1:1 on your machine, but they won't on every other machine, for example on a Mac with a Retina display. The ratio between screen coordinates and pixels may also change at run-time depending on which monitor the window is currently considered to be on.

    GLFW generally provides two sizes for a window: glfwGetWindowSize will return the result in said virtual screen coordinates, while glfwGetFramebufferSize will return the actual size in pixels, which is what is relevant for OpenGL. So basically, you must query both sizes, and then you can appropriately scale the mouse coords from screen coordinates to the actual pixels you need.

  3. Sub-pixel position. While glReadPixels addresses a specific pixel with integer coordinates, the whole transformation math works with floating point and can represent arbitrary sub-pixel positions. GL's window space is defined so that integer coordinates represent the corners of the pixels, and the pixel centers lie at half-integer coordinates. Your win variable will represent the lower left corner of the pixel, but the more useful convention is the pixel center, so you'd better add an offset of (0.5f, 0.5f, 0.0f) to win, assuming you point to the pixel center. (We can do a bit better if the virtual screen coordinates have a higher resolution than our pixels, which means we already get a sub-pixel position for the mouse cursor, but the math won't change, because we still have to switch to GL's convention where integer means pixel border instead of pixel center.) Note that since we now consider a space going from [0,w) in x and [0,h) in y, this also affects point 1: if you click at pixel (0,0), it will have the center (0.5, 0.5), and the y flipping should be h - y, so h - 0.5 (which should be rounded down towards h-1 when accessing the framebuffer pixel).

To put it all together, you could do (conceptually):

glfwGetWindowSize(win, &screen_w, &screen_h); // better use the callback and cache the values 
glfwGetFramebufferSize(win, &pixel_w, &pixel_h); // better use the callback and cache the values 
glfwGetCursorPos(window, &xpos, &ypos);
glm::vec2 screen_pos=glm::vec2(xpos, ypos);
glm::vec2 pixel_pos=screen_pos * glm::vec2(pixel_w, pixel_h) / glm::vec2(screen_w, screen_h); // note: not necessarily integer
pixel_pos = pixel_pos + glm::vec2(0.5f, 0.5f); // shift to GL's center convention
glm::vec3 win = glm::vec3(pixel_pos.x, pixel_h - pixel_pos.y, 0.0f);
glReadPixels( (GLint)win.x, (GLint)win.y, ..., &win.z);
// ... unproject win
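The screen-to-pixel bookkeeping above can be factored into a small, GL-free helper, which makes it easy to sanity-check. This is just a sketch; `cursorToGLWindowSpace` and `Vec2d` are made-up names, not part of GLFW or GL:

```cpp
#include <cassert>
#include <cmath>

struct Vec2d { double x, y; };

// Hypothetical helper: maps a GLFW cursor position, given in virtual screen
// coordinates, to GL window-space coordinates (framebuffer pixels, origin at
// the lower-left corner, pixel centers at half-integer coordinates).
Vec2d cursorToGLWindowSpace(double xpos, double ypos,
                            int screen_w, int screen_h,
                            int pixel_w, int pixel_h)
{
    // scale from virtual screen coordinates to framebuffer pixels
    double px = xpos * pixel_w / screen_w;
    double py = ypos * pixel_h / screen_h;
    // shift to GL's pixel-center convention, then flip the y axis
    px += 0.5;
    py += 0.5;
    return { px, (double)pixel_h - py };
}
```

For example, on a hypothetical 2x "Retina" setup (window size 1024x768, framebuffer 2048x1536), a click at screen position (0, 0) maps to window-space (0.5, 1535.5), the center of the top-left pixel.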

what model matrix should I multuply by camera view matrix to form ModelView matrix?

None. The basic coordinate transformation pipeline is

object space -> {MODEL} -> World Space -> {VIEW} -> Eye Space -> {PROJ} -> Clip Space -> {perspective divide} -> NDC -> {Viewport/DepthRange} -> Window Space

There is no model matrix influencing the way from world to window space, hence inverting it will also not depend on any model matrix either.
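To make that concrete, the fixed-function tail of this pipeline (everything after clip space) can be written out in a few lines of plain C++, and no model matrix appears anywhere in it. The helper name and number types are made up for this sketch:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Vec4 { double x, y, z, w; };

// Sketch: clip space -> perspective divide -> NDC -> viewport/depth-range
// -> window space, for a single vertex. Viewport is (vx, vy, vw, vh);
// depth range defaults to the GL default [0, 1].
Vec3 clipToWindow(Vec4 clip, double vx, double vy, double vw, double vh,
                  double near_d = 0.0, double far_d = 1.0)
{
    // perspective divide: clip coordinates -> NDC in [-1, 1]
    Vec3 ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };
    // viewport transform: NDC -> window-space pixels; z -> [near_d, far_d]
    return {
        vx + (ndc.x + 1.0) * 0.5 * vw,
        vy + (ndc.y + 1.0) * 0.5 * vh,
        near_d + (ndc.z + 1.0) * 0.5 * (far_d - near_d)
    };
}
```

For instance, the clip-space point (0, 0, 0, 1) lands at window position (512, 384) with depth 0.5 in a 1024x768 viewport. Unprojecting simply inverts these steps and then applies the inverse projection and view matrices.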

that as said in the glm docs (https://glm.g-truc.net/0.9.8/api/a00169.html#ga82a558de3ce42cbeed0f6ec292a4e1b3) the result is returned in object coordinates.

The math doesn't care about which spaces you transform between. The documentation mentions object space, and the function uses an argument named modelView, but what matrix you put there is totally irrelevant. Passing just the view matrix there works fine.

So in order to convert screen to world coordinates I should use a transform matrix from one mesh.

Well, you could even do that. You could use any model matrix of any object, as long as the matrix isn't singular, and as long as you use the same matrix for the unproject as you later use for going from object space to world space. You can even make up a random matrix, as long as it is regular (though there might be numerical issues if the matrix is ill-conditioned). The key thing here is that when you specify (V*M) and P as the matrices for glm::unProject, it will internally calculate (V*M)^-1 * P^-1 * ndc_pos, which is M^-1 * V^-1 * P^-1 * ndc_pos. If you then transform the result back from object space to world space, you multiply it by M again, resulting in M * M^-1 * V^-1 * P^-1 * ndc_pos, which is of course just V^-1 * P^-1 * ndc_pos: exactly what you would have gotten directly if you hadn't put M into the unproject in the first place. You just added more computational work and introduced more potential for numerical issues.
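A toy numeric check of that cancellation, using a diagonal (and therefore trivially invertible) stand-in for the model matrix so no general 4x4 inverse is needed; all names here are made up for illustration:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Apply a diagonal matrix diag(d.x, d.y, d.z) to a point.
Vec3 applyDiag(Vec3 d, Vec3 p) { return { d.x * p.x, d.y * p.y, d.z * p.z }; }

// Inverse of a diagonal matrix, represented by its diagonal.
Vec3 invDiag(Vec3 d) { return { 1.0 / d.x, 1.0 / d.y, 1.0 / d.z }; }
```

Unprojecting with (V*M) tacks an extra applyDiag(invDiag(M), ...) onto the result you would get with V alone; going from object space back to world space applies applyDiag(M, ...), and the two cancel exactly, leaving the same world-space point.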

answered Oct 18 '22 by derhass