High-Quality Antialiased Rasterization Source Code

Introduction
Top-level Control Loop
Downsample and Accumulate
Extensions
Gpu Library
Utility Library

Introduction

This source code implements the concepts discussed in Chapter 21 of GPU Gems II. The code is a slightly modified version of the actual source code used in NVIDIA Gelato, a film-quality renderer. More information on Gelato can be found at the NVIDIA Film Group website.

The concepts are implemented in a command-line (console) program that can render Gelato grid dump files, a number of which are included as examples. The code works under both Windows XP and Linux. The program can render directly to an onscreen window or to a PPM file. Each image is rendered as an array of tiles at an arbitrary super-sample resolution; each tile is then downsampled to the final resolution using an arbitrarily large filter kernel. The code is structured so that it should be relatively easy to replace the gridfile rendering with your own rendering code.

To compile the code under Windows XP using Visual Studio .NET:

Open the solution file hqaa/hqaa.vcproj

Build and run the example with Debug -> Start

The code will be compiled and the killeroo example scene (courtesy Headus) will be rendered to an onscreen window. You can also compile the Debug configuration if you want to enable a host of debugging code in the Gpu library. This slows down rendering significantly. Note that the debug mode automatically defines the preprocessor symbol DEBUG, which is used within the code to enable extra checks and assertions (see dassert.h).

To compile the code under Linux:

% cd top-level-directory

% make

To remove all build objects and executables, use "make clean". To compile the code with lots of slow debugging checks enabled, use "make DEBUG=1".

The scenes directory contains a number of example scenes that can be rendered with the test program, including:

grids.pyg simplified killeroo (courtesy Headus) model
aatest.pyg useful for testing standard aliasing issues

Additional utilities are provided to help create new scenes using Gelato, which is a free download. To create new scenes, you can use Maya and Gelato's Mango plugin to create PYG or RIB files, which can then be converted to gridfiles for use in hqaa.

rendergrids.pyg is a Gelato scene file that can be used to render a grid dump file in Gelato. Use it by changing the Input() file and rendering with:
% gelato onetile.pyg -iv

griddump.pyg is a gelato scene file that can be included to generate griddump pyg files for use in hqaa. Make sure to put griddump.pyg as the first scene file on the gelato command line.

Top-level Control Loop

The code in hqaa.cpp contains command-line argument processing and the top-level control loop. It relies on two application classes, in downsample.cpp and accumulate.cpp, which perform the 2D downsample and accumulation directly on the GPU. All of these files rely on the utility library and on the Gpu library, a high-level C++ wrapper around common OpenGL functions that adds state management and debugging features.

The top level control loop handles the argument processing using the utility library's ArgParse class. You can set any of the input parameters and choose the name of the output file. Here is the usage message:

       scenefile          Input scene file
       -camera x y z      Camera location
       -lookat x y z      Camera direction
       -fov angle         Camera field of view
       -nearfar near far  Camera clipping planes
       -resolution x y    Image resolution in pixels
       -bucketsize x y    Bucket size in pixels
       -supersample x y   Super-sample resolution
       -filter name x y   Filter name and radii in pixels
       -bitdepth b        32, 16, or 8 bits per channel
       -o filename        Output image filename

If you don't specify an output filename, the image is rendered directly to an onscreen window.

Pseudo-code for the main loop:

construct camera matrices and initialize tile drawing surface
for each tile row:
  for each tile column:
    compute the offset matrix for this tile's view
    render the entire scene into super-sampled tile buffer
    downsample the super-sampled tile to final resolution
    accumulate the final-resolution tile into the output buffer

Note that when setting up the camera-to-tile matrix, the flipy option flips images displayed in onscreen windows relative to the order in which scanlines are computed for output to a file.
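
The two nested loops in the pseudo-code above can be sketched on the CPU as a simple enumeration of tile rectangles. This is only an illustration, not the code in hqaa.cpp: the TileRect struct and the enumerate_tiles function are hypothetical names, and clipping the last row and column of tiles to the image bounds is an assumption about how partial tiles might be handled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one tile, in final-image pixels.
struct TileRect {
    int x0, y0;         // corner of the tile nearest the origin
    int width, height;  // tile extent, clipped to the image
};

// Enumerate tiles row by row, starting at the origin.
// resx, resy: image resolution; tx, ty: tile size in final-image pixels.
std::vector<TileRect> enumerate_tiles(int resx, int resy, int tx, int ty)
{
    assert(resx > 0 && resy > 0 && tx > 0 && ty > 0);
    std::vector<TileRect> tiles;
    for (int y0 = 0; y0 < resy; y0 += ty) {          // for each tile row
        for (int x0 = 0; x0 < resx; x0 += tx) {      // for each tile column
            TileRect t;
            t.x0 = x0;
            t.y0 = y0;
            t.width = std::min(tx, resx - x0);       // assumed edge clipping
            t.height = std::min(ty, resy - y0);
            tiles.push_back(t);
        }
    }
    return tiles;
}
```

Each returned rectangle corresponds to one iteration of the inner loop, where the renderer would compute the offset matrix, render, downsample, and accumulate.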

The following variables are used below:

tx, ty Tile Size in final image pixels

ssx, ssy Supersamples per final pixel

fx, fy Filter radius in final image pixels

During computation, the application allocates the following rendering surfaces:

tx*ssx, ty*ssy Tile rendering buffer

tx+2*fx, ssy*(ty+2*fx) Downsample intermediate buffer

resx, resy Accumulate (when rendering to window)

resx, ty+2*fx Accumulate (when rendering to a file)

These limits must fall below the GPU-imposed limits, or buffer allocation will fail. When outputting to a file, the image is computed one tile strip at a time to reduce memory requirements.
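
As a rough aid for checking these sizes before running, the buffer dimensions listed above can be transcribed into a few helper functions. This is a sketch under stated assumptions: the Extent and TileConfig structs, the function names, and the max_dim parameter are all hypothetical, and the actual GPU limit has to be obtained from the driver. The expressions are copied exactly as listed above, including the use of fx in the height terms.

```cpp
#include <cassert>

// Hypothetical width/height pair for a rendering surface.
struct Extent { int width, height; };

// Hypothetical bundle of the variables defined above.
struct TileConfig {
    int tx, ty;    // tile size in final image pixels
    int ssx, ssy;  // supersamples per final pixel
    int fx, fy;    // filter radius in final image pixels
};

// Tile rendering buffer: tx*ssx by ty*ssy.
Extent tile_buffer(const TileConfig& c)
{
    return Extent{c.tx * c.ssx, c.ty * c.ssy};
}

// Downsample intermediate buffer, transcribed as listed above.
Extent downsample_buffer(const TileConfig& c)
{
    return Extent{c.tx + 2 * c.fx, c.ssy * (c.ty + 2 * c.fx)};
}

// Accumulate buffer when rendering to a window: the whole image.
Extent accumulate_window(int resx, int resy)
{
    return Extent{resx, resy};
}

// Accumulate buffer when rendering to a file: one strip of tiles.
Extent accumulate_strip(const TileConfig& c, int resx)
{
    return Extent{resx, c.ty + 2 * c.fx};
}

// True if both dimensions are within an assumed per-dimension limit.
bool fits(const Extent& e, int max_dim)
{
    assert(max_dim > 0);
    return e.width <= max_dim && e.height <= max_dim;
}
```

With a 32x32 tile, 4x4 supersampling, and a filter radius of 2, for example, the tile buffer is 128x128 and the downsample intermediate buffer is 36x144.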

Downsample and Accumulate

The GpuDownsample class downsamples and filters the tile rendered at super-sample resolution, returning a texture that contains a high-quality version of the final tile padded by the filter radius. The work is performed using an intermediate pbuffer to render the two separable filtering passes. The final result is left on the GPU in the form of a usable texture.

GpuDownsample Constructor, no required arguments

~GpuDownsample Destructor, frees any allocated resources

GpuDownsample::tile Downsamples a tile stored in the passed-in texture with the specified dimensions and filter and returns the result in a GpuTexture reference.
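
To make the idea of two separable passes producing a padded tile concrete, here is a CPU-only sketch. It is not the GPU implementation in downsample.cpp: the Image struct and both function names are hypothetical, a triangle filter with an integer radius is assumed, and the intermediate image here pads only in x, so its height differs from the intermediate buffer size listed earlier. Padded pixels hold the partial contribution of this tile's samples only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image, row-major.
struct Image { int width, height; std::vector<float> pixels; };

// Triangle (tent) filter of the given radius, evaluated at distance d.
static float triangle(float d, float radius)
{
    float t = 1.0f - std::fabs(d) / radius;
    return t > 0.0f ? t : 0.0f;
}

// One 1D pass: 'in' holds n*ss supersamples; the result holds n + 2*f pixels.
// Output pixel j is centered at (j - f + 0.5) in final-pixel units.
static std::vector<float> downsample_1d(const std::vector<float>& in, int n, int ss, int f)
{
    assert(in.size() == static_cast<std::size_t>(n) * ss && f >= 1);
    // Sum of the weights of all samples under one full kernel.
    float norm = 0.0f;
    for (int m = -f * ss; m < (f + 1) * ss; ++m)
        norm += triangle((m + 0.5f) / ss - 0.5f, float(f));
    std::vector<float> out(n + 2 * f, 0.0f);
    for (int j = 0; j < n + 2 * f; ++j) {
        float center = j - f + 0.5f;
        for (int i = 0; i < n * ss; ++i)
            out[j] += in[i] * triangle((i + 0.5f) / ss - center, float(f)) / norm;
    }
    return out;
}

// Two separable passes: rows first, then columns.
Image downsample_tile(const Image& tile, int tx, int ty, int ssx, int ssy, int fx, int fy)
{
    assert(tile.width == tx * ssx && tile.height == ty * ssy);
    Image mid{tx + 2 * fx, tile.height, {}};
    mid.pixels.resize(static_cast<std::size_t>(mid.width) * mid.height);
    for (int y = 0; y < tile.height; ++y) {
        std::vector<float> row(tile.pixels.begin() + y * tile.width,
                               tile.pixels.begin() + (y + 1) * tile.width);
        std::vector<float> r = downsample_1d(row, tx, ssx, fx);
        std::copy(r.begin(), r.end(), mid.pixels.begin() + y * mid.width);
    }
    Image out{mid.width, ty + 2 * fy, {}};
    out.pixels.resize(static_cast<std::size_t>(out.width) * out.height);
    for (int x = 0; x < mid.width; ++x) {
        std::vector<float> col(mid.height);
        for (int y = 0; y < mid.height; ++y) col[y] = mid.pixels[y * mid.width + x];
        std::vector<float> r = downsample_1d(col, ty, ssy, fy);
        for (int y = 0; y < out.height; ++y) out.pixels[y * out.width + x] = r[y];
    }
    return out;
}
```

Because the padded border carries only partial sums, a tile is complete only after the padded results of its neighbors have been added to it, which is the job of the accumulation step.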

The GpuAccumulate class handles the accumulation of the final downsampled and filtered tiles into the final image buffer. It can work in two different modes: whole image or tile strips. The accumulate class is started either once for the entire image using begin_image() or once for each strip of tiles using begin_strip(). Strips are assumed to be computed from the origin to the top of the image.

GpuAccumulate Constructor, no required arguments

~GpuAccumulate Destructor, frees any allocated resources

GpuAccumulate::begin_strip Call when computing the image in strips before beginning a new strip of tiles. Note that it is assumed that strips are computed starting at the origin and moving up to the height of the image.

GpuAccumulate::begin_image Call when computing the image with a single accumulation buffer or rendering directly to a window. Since this allocates a larger buffer than when rendering in strips, this rendering mode may not work with large resolutions.

GpuAccumulate::tile Pass in a downsampled, padded tile for accumulation

GpuAccumulate::end Call after rendering a tile strip or the entire image to retrieve the accumulated texture buffer. When rendering to a window, this will also enable a simple event loop to repaint the window and wait for the Escape key.
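
The accumulation step can be pictured as adding each padded tile into the output at an offset shifted by the filter radius. The following CPU sketch illustrates that idea only; it is not the code in accumulate.cpp. The Image struct and the accumulate_tile function are hypothetical, and discarding contributions that fall outside the image is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image, row-major.
struct Image { int width, height; std::vector<float> pixels; };

// Add a padded, downsampled tile into the output image.
// (x0, y0) is the origin of the unpadded tile in final-image pixels;
// fx, fy is the padding (filter radius) on each side of the tile.
void accumulate_tile(Image& image, const Image& tile, int x0, int y0, int fx, int fy)
{
    assert(image.pixels.size() == static_cast<std::size_t>(image.width) * image.height);
    assert(tile.pixels.size() == static_cast<std::size_t>(tile.width) * tile.height);
    for (int v = 0; v < tile.height; ++v) {
        int iy = y0 - fy + v;                          // destination row
        if (iy < 0 || iy >= image.height) continue;    // assumed: clip at edges
        for (int u = 0; u < tile.width; ++u) {
            int ix = x0 - fx + u;                      // destination column
            if (ix < 0 || ix >= image.width) continue;
            image.pixels[iy * image.width + ix] += tile.pixels[v * tile.width + u];
        }
    }
}
```

Pixels near a tile boundary receive contributions from more than one tile, which is why the tiles are summed rather than copied.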

Extensions

This code can be easily extended or optimized. Here are a few ideas:

Arbitrarily wide images. Extend the GpuAccumulate class to handle non-zero x origins and use a list of GpuAccumulate objects to tile strips wider than the maximum pbuffer width.

Transparency. Implement transparent surface support by adding either a sorting pass followed by back-to-front rendering, or by using depth peeling.

Bucketed geometry. Instead of rendering the entire list of GpuPrimitive objects for each tile, restrict rendering to only those primitives whose bounding box intersects the active tile.

Render to texture. Under Windows, the render-to-texture functions can be used to skip copying framebuffers into textures in both the Downsample and Accumulate classes. See the code blocks in the Gpu library that are bracketed by #ifdef RTT for some helpful routines.

Multithreading. Render tiles in separate threads. Start by making the Gpu library threadsafe, and then extend the tiled rendering loop to render each tile in a separate thread.

glDrawElements and Vertex Buffer Objects. Extend the Gpu library to render quadmeshes with a single glDrawElements call using the VBO extension for additional performance when re-rendering meshes for each tile.

Skip empty buckets. There is no need to downsample empty buckets, or to downsample at all when using a 1x1 box filter.

Extend to non-NVIDIA GPUs. The current code only works on NV30-class hardware due to the use of NVIDIA-specific vertex and fragment program code and certain extensions including occlusion query and texture rectangles. Switch this platform-dependent code over to OpenGL ARB extensions.
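
As a starting point for the bucketed geometry idea above, the selection of primitives for a tile reduces to a rectangle overlap test. The sketch below assumes each primitive already has a 2D bound in final-image pixels; the Bounds2 struct and both functions are hypothetical and are not part of the Gpu library. In practice the tile rectangle would probably need to be grown by the filter radius so that samples contributing to the padded border are not lost.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned 2D bound in final-image pixels.
struct Bounds2 { float xmin, xmax, ymin, ymax; };

// True if the two rectangles overlap or touch.
bool overlaps(const Bounds2& a, const Bounds2& b)
{
    return a.xmin <= b.xmax && a.xmax >= b.xmin &&
           a.ymin <= b.ymax && a.ymax >= b.ymin;
}

// Return the indices of the primitives whose bounds touch the tile.
std::vector<int> primitives_in_tile(const std::vector<Bounds2>& prim_bounds,
                                    const Bounds2& tile)
{
    std::vector<int> visible;
    for (std::size_t i = 0; i < prim_bounds.size(); ++i) {
        if (overlaps(prim_bounds[i], tile))
            visible.push_back(static_cast<int>(i));
    }
    return visible;
}
```

Only the primitives in the returned list would then be rendered for that tile, instead of the entire scene.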

Gpu Library

The Gpu library is a high-level C++ wrapper around OpenGL graphics functions. It is designed to make it easy to create on- or off-screen buffers for doing computational work with the GPU. It provides a stateful environment that is designed to facilitate debugging complex GPGPU applications. It works under both Windows and Linux to provide a platform-independent GPU API.

Texture objects make it easy to define and use textures from CPU-side data and directly from framebuffers. Since render-to-texture is not currently supported under Linux, the Gpu library only supports copy-fb-to-texture. All textures are treated as single-level, rectangular textures with no support for mip-mapping.

For more details, please see the gpu.h header file.

Simplest way to draw a 2D rectangle:

    GpuPBuffer pbuffer (256, 256);
    GpuCanvas canvas (pbuffer);
    GpuDrawmode drawmode;   // no "()": that would declare a function
    GpuPrimitive rect (xmin, xmax, ymin, ymax);
    rect.render (drawmode, canvas);

Create a 2x2 texture and bind it to texture unit 0:

    GpuTexture texture ("my texture");
    Vector3 color[4] = {{1,0,0}, {0,1,0}, {0,0,1}, {1,1,1}};
    texture.load (&color, 2, 2);
    drawmode.texture (0, &texture);

Make the drawmode 3D drawing mode and create and draw a quadmesh:

    Matrix4 c2s = Matrix4::PerspectiveMatrix (45, 1, 0.01, 10000);
    drawmode.view (&c2s, 256, 256);
    Vector3 P[4] = {{0,0,0}, {1,0,0}, {1,1,0}, {0,1,0}};
    GpuQuadmesh quadmesh (2, 2, P);
    Vector3 texcoord[4] = {{1,0,0}, {0,1,0}, {0,0,1}, {1,1,1}};
    quadmesh.texcoord (0, texcoord);
    quadmesh.render (drawmode, canvas);

Creating and using a fragment program with a constant parameter:

    const char *fp10 = "!!FP1.0\nMOVR o[COLR], p[0];\nEND\n";
    GpuFragmentProgram fp ("red", fp10);
    fp.parameter (0, Vector4(1,0,0,0));
    drawmode.fragment_program (&fp);

An example of how to do an occlusion query with multiple draw statements:

    GpuOcclusionQuery oq ("depth peel");
    canvas.begin_occlusion_query (oq);
    quadmesh.render (drawmode, canvas);
    ...
    quadmesh.render (drawmode, canvas);
    canvas.end_occlusion_query (oq);
    ... < do something to hide latency > ...
    printf ("occlusion query had %d visible fragments\n", oq.count());

An example of how copy-from-fb-to-texture works:

    GpuTexture fromfb ("rendered texture");
    fromfb.load (canvas, 0, 0, 256, 256);
    drawmode.texture (0, &fromfb);

Utility Library

The utility library contains a number of useful classes:

vecmat.h A basic set of 3- and 4-tuple vector, 4x4 matrix, and 3D bounding box CPU functions. The structures in the library are guaranteed to have the same memory layout as an array of floats. Basic vector and matrix operations are overloaded and a handful of useful utility functions are provided.

color.h Common color space operations similar to the Vector3 class.

filter.h A set of common 2D filters including box, triangle, gaussian, catmull-rom, blackman-harris, sinc, mitchell, disk

peakcounter.h A simple class to help track resource usage

argparse.h Parses standard command line arguments using strings similar to printf. Based on Paul Heckbert's command line parsing utilities.

ppm.h A basic implementation of PPM image file output.

dassert.h Simple assert-like wrapper that can be used in either release or debug builds

gelendian.h Utilities for handling endian-ness
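
For readers replacing the image output, the PPM format mentioned above is simple enough to produce by hand. The sketch below encodes 8-bit RGB data as a binary (P6) PPM in memory. It does not reproduce the interface of ppm.h: the encode_ppm function is a hypothetical name, and the choice of the binary P6 variant with a maximum value of 255 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Encode tightly packed 8-bit RGB pixels as a binary (P6) PPM image.
// 'rgb' must hold width*height*3 bytes, one scanline after another.
std::string encode_ppm(int width, int height, const std::vector<unsigned char>& rgb)
{
    assert(width > 0 && height > 0);
    assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);
    // Header: magic number, dimensions, maximum channel value.
    std::string out = "P6\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";
    // Raw pixel bytes follow the header directly.
    out.append(rgb.begin(), rgb.end());
    return out;
}
```

The returned string can be written to a file opened in binary mode. PPM stores scanlines from the top of the image down, so output computed from the origin upward may need to be flipped first.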

Last modified: Tue Dec 28 09:14:24 Pacific Standard Time 2004
