 

3D point reconstruction from 2D images

I am trying to understand the basics of 3D point reconstruction from 2D stereo images. What I have understood so far can be summarized as follows:

For 3D point (depth map) reconstruction, we need two images of the same object taken from two different views. Given such an image pair, we also need the camera matrices (say P1, P2).

  • We find corresponding points in the two images using methods like SIFT or SURF.

  • After getting the corresponding keypoints, we find the essential matrix (say E) using a minimum of 8 keypoints (as in the 8-point algorithm).

  • Assuming we are at camera 1, we calculate the parameters for camera 2; decomposing the essential matrix returns 4 possible sets of camera parameters.

  • Finally, we use the corresponding points and both cameras' parameters to estimate the 3D points via triangulation (a minimal OpenCV sketch of this pipeline follows this list).
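For reference, the same two-view pipeline can also be written directly with OpenCV's built-in helpers. This is only a minimal sketch: the file names are placeholders, the intrinsics reuse the dino focal length from my code below, cv2.SIFT_create needs OpenCV >= 4.4, and cv2.findEssentialMat internally uses the 5-point algorithm rather than the 8-point one.

import cv2
import numpy as np

# Placeholder file names for any two views of the same object
img1 = cv2.imread('view1.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('view2.png', cv2.IMREAD_GRAYSCALE)

# Assumed pinhole intrinsics (same focal length as the dino example below)
h, w = img1.shape[:2]
K = np.array([[2360, 0, w / 2],
              [0, 2360, h / 2],
              [0, 0, 1]], dtype=np.float64)

# 1. Detect and match SIFT features with a ratio test
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float64([kp1[m.queryIdx].pt for m in good])
pts2 = np.float64([kp2[m.trainIdx].pt for m in good])

# 2. Essential matrix from the matches (RANSAC, 5-point algorithm)
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)

# 3. recoverPose picks the one valid (R, t) out of the 4 decompositions
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 4. Triangulate the inlier correspondences; depth is only defined up to scale
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
inliers = mask.ravel().astype(bool)
pts4d = cv2.triangulatePoints(P1, P2, pts1[inliers].T, pts2[inliers].T)
pts3d = (pts4d[:3] / pts4d[3]).T    # N x 3 points, up to an unknown scale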

After going through the theory, as my first experiment I tried to run the code available here, which worked as expected. With a few modifications to the example.py code, I tried to run this example on all consecutive image pairs and merge the 3D point clouds for a 3D reconstruction of the object (dino), as below:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2
import glob

from camera import Camera
import structure
import processor
import features

def dino(fn1, fn2):
    # Dino: load an image pair and find corresponding points between the two views
    img1 = cv2.imread(fn1)
    img2 = cv2.imread(fn2)
    pts1, pts2 = features.find_correspondence_points(img1, img2)
    points1 = processor.cart2hom(pts1)
    points2 = processor.cart2hom(pts2)

    fig, ax = plt.subplots(1, 2)
    ax[0].autoscale_view('tight')
    ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
    ax[0].plot(points1[0], points1[1], 'r.')
    ax[1].autoscale_view('tight')
    ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
    ax[1].plot(points2[0], points2[1], 'r.')
    fig.show()

    height, width, ch = img1.shape
    intrinsic = np.array([  # for dino
        [2360, 0, width / 2],
        [0, 2360, height / 2],
        [0, 0, 1]])

    return points1, points2, intrinsic


points3d = np.empty((0, 0))
files = sorted(glob.glob("imgs/dinos/*.ppm"))
num_files = len(files)

for item in range(num_files - 1):
    print(files[item], files[(item + 1) % num_files])
    # dino() takes two image paths as input and returns the keypoint matches
    # (corresponding points in the two views) along with the camera intrinsic matrix.
    points1, points2, intrinsic = dino(files[item], files[(item + 1) % num_files])
    # print('Number of matches:', points1.shape[1])
    # Calculate essential matrix with 2d points.
    # Result will be up to a scale
    # First, normalize points
    points1n = np.dot(np.linalg.inv(intrinsic), points1)
    points2n = np.dot(np.linalg.inv(intrinsic), points2)
    E = structure.compute_essential_normalized(points1n, points2n)
    print('Computed essential matrix:', (-E / E[0][1]))

    # Given we are at camera 1, calculate the parameters for camera 2.
    # Decomposing the essential matrix returns 4 possible camera parameters.
    P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
    P2s = structure.compute_P_from_essential(E)

    ind = -1
    for i, P2 in enumerate(P2s):
        # Find the correct camera parameters
        d1 = structure.reconstruct_one_point(
            points1n[:, 0], points2n[:, 0], P1, P2)

        # Convert P2 from camera view to world view
        P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
        d2 = np.dot(P2_homogenous[:3, :4], d1)

        if d1[2] > 0 and d2[2] > 0:
            ind = i

    P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
    #tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
    tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

    if not points3d.size:
        points3d = tripoints3d
    else:
        points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) is removed in newer matplotlib
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()

But I am getting very unexpected results. Please suggest whether the above method is correct, or how I can merge multiple 3D point clouds to construct a single 3D structure.

asked Oct 26 '18 by flamelite


1 Answer

Another possible path of understanding for you would be to look at an open source implementation of structure from motion or SLAM. Note that these systems can become quite complicated. However, OpenSfM is written in Python and I think it is easy to navigate and understand. I often use it as a reference for my own work.

Just to give you a little more information to get started (if you choose to go down this path): structure from motion is an algorithm that takes a collection of 2D images and creates a 3D model (point cloud) from them, while also solving for the position of each camera relative to that point cloud (i.e. all the returned camera poses are in the world frame, and so is the point cloud).

The steps of OpenSfM at a high level:

  1. Read image exif for any prior information you can use (e.g. focal length)

  2. Extract feature points (e.g. SIFT)

  3. Match feature points

  4. Turn these feature point matches into tracks (e.g. if you saw a feature point in images 1, 2, and 3, you can connect it into a single track instead of separate match(1,2), match(2,3), etc.)

  5. Incremental Reconstruction (note that there is also a global approach). This process will use the tracks to incrementally add images to the reconstruction, triangulate new points, and refine the poses/point positions using a process called Bundle Adjustment (a toy sketch of the pose-chaining idea follows this list).
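For instance, each two-view reconstruction lives in the coordinate frame of its first camera and has its own arbitrary scale, which is why simply concatenating per-pair point clouds (as in your loop) does not line up: the relative poses have to be chained into one world frame first. Below is a toy numpy sketch of that chaining; get_relative_pose and triangulate_pair are hypothetical stand-ins for the essential-matrix and triangulation steps, and the per-pair scale ambiguity and bundle adjustment are ignored.

import glob
import numpy as np

def to_homogeneous(R, t):
    # Stack a 3x3 rotation and 3x1 translation into a 4x4 rigid transform
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t.ravel()
    return T

images = sorted(glob.glob('imgs/dinos/*.ppm'))

# Camera 0 defines the world frame
world_from_cam = [np.eye(4)]
all_points = []

for i in range(len(images) - 1):
    # Hypothetical helpers standing in for the essential-matrix / triangulation
    # steps above; (R, t) maps points from camera i's frame to camera i+1's frame
    R, t = get_relative_pose(images[i], images[i + 1])
    cam_next_from_cam_i = to_homogeneous(R, t)

    # Chain the relative pose onto camera i's pose to place camera i+1 in the world frame
    world_from_cam.append(world_from_cam[i] @ np.linalg.inv(cam_next_from_cam_i))

    # Points triangulated in camera i's frame, mapped into the shared world frame
    pts_cam_i = triangulate_pair(images[i], images[i + 1])   # 4 x N homogeneous
    all_points.append(world_from_cam[i] @ pts_cam_i)

points3d = np.concatenate(all_points, axis=1)

In a real pipeline, the shared tracks are what let you recover each pair's unknown scale consistently, and bundle adjustment then refines all the poses and points jointly; that is essentially what step 5 does for you.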

Hopefully that helps.

answered Sep 18 '22 by Jomnipotent17