When most people hear Fermi, they immediately think of Tessellation, and with good reason. Tessellation takes simple geometry and makes it more complex, removing angular silhouettes on game objects in the process. Smoother transitions between levels of detail are an added bonus: as you approach an object, the developer can add extra detail in a realistic manner, without jarring intrusions ruining the immersive experience. Developers can also create geometrically enhanced surfaces, reducing the need for bump maps and parallax occlusion maps to fool the naked eye into thinking a surface is textured and rough rather than flat and smooth.
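To make the idea concrete, here is a minimal sketch in Python of what "making simple geometry more complex" means. It is an illustration only, not how Fermi or DirectX 11 implement it: the function names and the distance rule (lose one subdivision level each time the viewing distance doubles) are assumptions made for this example. Real hardware tessellation also displaces the new vertices, which is what actually rounds off a silhouette; this sketch only shows how the triangle count scales.

```python
import math


def midpoint(a, b):
    """Return the point halfway between two 3D points."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def subdivide(triangle):
    """Split one triangle into four smaller ones using its edge midpoints."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(triangles, levels):
    """Apply midpoint subdivision 'levels' times to every triangle."""
    for _ in range(levels):
        triangles = [small for tri in triangles for small in subdivide(tri)]
    return triangles


def levels_for_distance(distance, max_levels=4, near=1.0):
    """Hypothetical level-of-detail rule (an assumption for this sketch):
    drop one subdivision level each time the distance doubles."""
    if distance <= near:
        return max_levels
    drop = int(math.log2(distance / near))
    return max(0, max_levels - drop)


# One flat triangle standing in for a low-detail game object.
base = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

for distance in (1.0, 2.0, 4.0, 8.0, 16.0):
    levels = levels_for_distance(distance)
    count = len(tessellate([base], levels))
    print(f"distance {distance:5.1f}: {levels} levels, {count} triangles")
```

Each subdivision level multiplies the triangle count by four, so under these assumptions a single triangle becomes 256 at the closest range and stays a single triangle at the furthest. That growth is why geometry throughput matters so much once Tessellation is in play.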
Obviously, Tessellation places higher demands on a graphics core, and while both nVidia and ATI have increased the power of their pixel shading hardware, there has been far less focus on vertex and geometry performance.
This brings us to the biggest difference between ATI and nVidia hardware: their implementation of Tessellation. ATI uses a single tessellator which feeds two rasteriser units, while nVidia’s Fermi design has 16 tessellator units and four rasterisers.
The nVidia solution also has what the company calls the world’s first Scalable Geometry Architecture. nVidia states that the GTX 480 offers up to eight times the geometry performance of the GTX 285, the previous generation's leading single GPU card. Shared memory triples from 16KB to 48KB, and there is an added 16KB of Level 1 cache.
Above you can see the nVidia Fermi architecture. The GigaThread engine feeds data across four Graphics Processing Clusters, each of which has a raster engine as well as four stream processor clusters and four Polymorph Engines. The Raster and Polymorph engines are new, and their implementation basically forms the foundation of Fermi. Tessellation needs to feed data back into the front end, which means that if it used the same single unit as before (handling vertex fetching, rasterisation and triangle assembly), it would lead to data stalls.
Fermi can process two kernels at the same time, as it is a Multiple Instruction, Multiple Data design. When there is data to be processed, it can feed different streams simultaneously without delaying either of them. In theory, the real world benefits include GPU AI and other Fermi processing running alongside traditional rasterisation without any hitching.