A crude (but simple) approach to camera handling
Rotating the camera
At this stage, our renderer has no explicit concept of a camera:
depth (the \(z\)-coordinate) does not influence the size or shape of objects, and is largely ignored.
We do use the \(z\)-coordinate in the \(z\)-buffer to hide occluded surfaces, but this is a minor detail.
Ignoring depth leads to orthographic projection, where objects retain their dimensions regardless of distance from the viewer.
As before, our renderer outputs two images: zbuffer.tga and framebuffer.tga.
To simulate a camera, we transform the scene rather than moving the camera itself. For example, instead of rotating the camera to the left, we rotate the scene to the right. This keeps the camera fixed conceptually, and shifts the viewpoint via model-view transformations. This method simplifies camera handling and offers intuitive control over the visible scene.
See this commit:
Rotating the Object
In this example, I apply the function vec3 rot(vec3 v) (lines 1–5) to each model vertex (lines 24–26) before projection.
This rotates each vertex by \(30^\circ\) around the \(y\)-axis to the right, giving the impression of rotating the camera to the left:
Here, a constant rotation matrix is used. Thanks to the previous homework, we now have basic vector and matrix operations available. If you're not familiar with rotation matrices, don't worry: we'll soon explore alternative model-view transformations. However, be sure to review basic vector math, as it will be essential.
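To make the idea concrete, here is a minimal sketch of such a function, with a bare-bones vec3 type assumed for self-containment (the actual code, using proper matrix classes, is in the commit linked above):

```cpp
#include <cmath>

struct vec3 { double x, y, z; };    // assumed minimal vertex type

// Rotate a vertex by 30 degrees around the y-axis. Rotating every vertex
// of the scene one way is equivalent to rotating the camera the other way.
vec3 rot(vec3 v) {
    const double a = std::acos(-1.) / 6;    // 30 degrees in radians
    return {  std::cos(a) * v.x + std::sin(a) * v.z,
              v.y,
             -std::sin(a) * v.x + std::cos(a) * v.z };
}
```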
Central projection
Orthographic projection is useful, but central (or perspective) projection offers more realism: closer objects appear larger than distant ones.
Consider projecting a point \(P = (x, y, z)\) onto the plane \(z=0\) using a camera located at \(C = (0, 0, c)\) on the \(z\)-axis:
The projection point \(P'\) lies at the intersection of line \(CP\) and the screen plane \(z=0\). Given \(P = (x, y, z)\) and camera parameter \(c\), we want to find \(P' = (x', y', 0)\).
Let's first compute \(x'\), working in the plane \(y = 0\). Using the intercept theorem (the triangles with apex \(C\) are similar):

\[\frac{x'}{x} = \frac{c}{c - z} \quad\Longrightarrow\quad x' = \frac{x}{1 - \frac{z}{c}}\]

Similarly, in the plane \(x = 0\):

\[y' = \frac{y}{1 - \frac{z}{c}}\]
As expected, both expressions are structurally similar. So, just like rotation, we implement perspective projection by transforming vertices: we replace each vertex \((x, y, z)\) with \((x, y, z) \cdot \frac{1}{1 - \frac{z}{c}}\). We then apply orthographic projection as usual. See lines 7–10 and 29–31:
Central Projection
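As a sketch, the whole perspective deformation amounts to one division per vertex; the vec3 type and the value of c below are illustrative assumptions:

```cpp
struct vec3 { double x, y, z; };    // assumed minimal vertex type

// Deform the scene for central projection with the camera at (0, 0, c):
// divide each coordinate by (1 - z/c), then project orthographically as before.
vec3 persp(vec3 v) {
    const double c = 3.;                    // camera distance, arbitrary here
    const double k = 1. / (1. - v.z / c);   // perspective factor
    return { v.x * k, v.y * k, v.z * k };
}
```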
Here is the resulting image (see commit):
Personally, I find it much more convincing than the previous image, rendered with an orthographic projection.
If this is unclear, consider the following 2D example. Suppose we have a polygon with vertices: \((2,0)\), \((0,2)\), \((-2, 2)\), \((-2,-2)\), and \((2,-2)\):
The camera is at \(C = (10, 0)\). We want to project all vertices onto the green line. Each vertex \((x, y)\) is transformed to:

\[\left(\frac{x}{1 - \frac{x}{10}},\ \frac{y}{1 - \frac{x}{10}}\right)\]

New coordinates:

\((2,0) \to (2.5,\, 0)\), \((0,2) \to (0,\, 2)\), \((-2,2) \to (-\tfrac{5}{3},\, \tfrac{5}{3})\), \((-2,-2) \to (-\tfrac{5}{3},\, -\tfrac{5}{3})\), \((2,-2) \to (2.5,\, -2.5)\)
Here is the deformed object:
We can now apply an orthographic projection to these transformed points to get the desired central projection.
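To double-check the arithmetic, here is a small standalone program (purely illustrative) that applies the 2D deformation to the five vertices and reproduces the coordinates listed above:

```cpp
#include <cstdio>

int main() {
    const double c = 10.;   // the 2D camera sits at (10, 0)
    const double pts[5][2] = { {2,0}, {0,2}, {-2,2}, {-2,-2}, {2,-2} };
    for (const auto& p : pts) {
        const double k = 1. / (1. - p[0] / c);   // perspective factor
        std::printf("(%g, %g) -> (%g, %g)\n", p[0], p[1], p[0] * k, p[1] * k);
    }
    return 0;
}
```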
Homework assignment: find the bug
Everything works well, except with the Diablo model: the hand closest to the viewer is somewhat clipped:
Your task is to understand why, and how to fix it.
Spoiler Alert!
The \(z\)-buffer is stored as an 8-bit grayscale image.
Spoiler Alert #2!
The model was assumed to fit within the cube \([-1, 1]^3\), but this doesn't hold after rotation.
Spoiler Alert #3!
The hand's depth exceeds 1, so after the viewport transformation the corresponding values exceed 255, overflowing the 8-bit integers. While storing the \(z\)-buffer as an image is handy for debugging, it is better to use a 2D array of floating-point values. To visualize it, write a few lines of code to convert the depth array into an image.
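Here is one possible sketch of the fix, keeping depth in a floating-point array and converting it to an image only for visualization. The TGAImage class is the one used throughout this course; the \([-1, 1]\) depth range and the brace-initialization of the color are assumptions:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>
#include "tgaimage.h"

int main() {
    constexpr int width = 800, height = 800;

    // Full-precision depth buffer instead of an 8-bit grayscale image.
    std::vector<double> zbuffer(width * height, -std::numeric_limits<double>::max());

    // ... rasterize the scene here, updating zbuffer[x + y * width] ...

    // Convert the depth array to a grayscale image for visualization only.
    TGAImage zimage(width, height, TGAImage::GRAYSCALE);
    for (int i = 0; i < width * height; i++) {
        const double z = std::clamp(zbuffer[i], -1., 1.);  // assumed depth range
        const std::uint8_t v = static_cast<std::uint8_t>(255 * (z + 1) / 2);
        zimage.set(i % width, i / width, { v });           // assumed TGAColor init
    }
    zimage.write_tga_file("zbuffer.tga");
    return 0;
}
```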