"Christmas - the time to fix the computers of your loved ones" « Lord Wyrm

What is a deferred renderer?

JC 21.01.2003 - 17:07

JC

Administrator
Disruptor
Registered: Feb 2001
Location: Katratzi
Posts: 9067
Since many apparently don't really know what this is, I fired up Google and found something. The following text explains very well, imho, what a deferred renderer is and what its advantages and disadvantages are.
As the text also makes clear, the "SGI-like" renderers it mentions are immediate renderers.

[quote]Stellar Graphics, Raycer, GigaPixel, PowerVR, Trident (and at least two other companies which I cannot mention because of NDA agreements) have all embraced the idea of a chunk based, or deferred rendering scheme. The idea is that all the triangles for a scene are first stored in an array (before any pixels are drawn) and then the screen is decomposed into a grid of rectangular chunks that are then sequentially rendered from top to bottom.

What is the advantage of this? Primarily it allows reuse of the Z-buffer allocated to each chunk. In fact, to get the most out of this, the Z-buffer should simply be part of a chip-local memory array (like a cache or SRAM -- Stellar's PixelSquirt architecture decomposes all the way down to scan lines and pixels, so they only need a one-entry Z-buffer for each pixel pipeline, which is typically 1 or 2 wide.) So the Z-buffer memory bandwidth and data occupation magically turn to zero. As an added bonus, the output memory bandwidth becomes proportional to the size of the screen, which you would expect to be less than with the SGI-like architecture because there is no overdraw. Keeping this in mind, full-screen super-sampling (otherwise known as full-screen anti-aliasing) becomes a relatively minor performance hit.

But of course, it gets better. Since the memory bandwidth has been so radically reduced, the overall performance becomes more dependent on the architecture of the graphics controller than on the memory bandwidth. This gives better scalability via ordinary frequency increases in the controller itself.

Isn't that cool? So what are the disadvantages? Well, it requires that Z-buffered triangles are the only drawing primitive used. With the SGI-like architecture you can play tricky games with non-Z-buffered 2D decals rendered mid-scene, although there is no evidence that software writers currently do or will exploit such an idea. There are also issues with rendering order and alpha blending.

The deferral time is necessarily long, so you may end up with periods of under-utilization of the graphics hardware followed by periods of high utilization, in ways that make it difficult to take maximal advantage of parallelization without introducing visual latency (by as much as two frame flips).

The potentially random and unbounded number of triangles means that specific graphics hardware will not be able to handle all situations by itself. This pretty much requires CPU assistance, in ways not required by the SGI-like architecture.

A sorting algorithm is required for quick chunking into trivial sub-rectangles (this is especially important for correct alpha blending as well). Again, because the number of triangles may grow arbitrarily, some sort of CPU assistance will be required. (But it has given software engineers a new opportunity to try to flex their brain muscles; a flurry of patents has been filed for this alone.)

I made mention of a possible "wait for retrace" optimization available to deferred renderers above. While I don't know of a specific architecture that has actually addressed this (though I very strongly believe that Stellar's PixelSquirt has done this), my opinion is that because of the deterministic rendering order, there is an opportunity for the rendering to "chase the scan beam". That is, if a back buffer is already complete, rendering into the front buffer can start before the beam enters the "vertical retrace" state, so long as it is arbitrated against the current scan beam location before the flip to the back buffer is performed. Since the pixel rendering can be ordered with respect to the scan line, the front buffer can be incrementally rendered to before it has even finished displaying. It's not quite as perfect as triple buffering, i.e., in the worst case there is still stalling, but only if the graphics accelerator's rendering rate is well ahead of the screen refresh rate.

Employees from 3Dfx have been observed saying things like "We hope everyone else in the industry moves to chunk based rendering". Such a comment may be backed by the fact that the host assistance required for deferred rendering, while it could theoretically be small, cannot easily be removed. So in terms of a maximal ceiling, a sufficiently high-performance SGI-like solution (i.e., if the memory bandwidth problem were "solved" in some other way) could, in theory, beat all deferred rendering implementations. (There is at least one company, which again I cannot reveal because of NDA, that is working on a highly aggressive memory architecture that may indeed make the deferred rendering architecture impotent.)

ATI appears to have declared its position on the situation. The Rage128 architecture is an SGI-like architecture that has attacked the memory bandwidth problem by attaching a frame buffer cache (for the accelerator, not for host communications over the PCI bus). Of course the biggest advantage it offers is to address the bandwidth and latency of redundant accesses, which are certainly not trivial (think about rendering a 3D object made of many triangles). Taken to the extreme, this sort of approach can match the bulk of the advantages of the deferred rendering technique (deferred rendering still has better trivial-reject possibilities, and can attack the wait-for-refresh problem in ways still not available to the SGI-like architecture).

The cost of this architecture is very high because there are new stages, and new problems that need to be solved. This rendering scheme does not have nearly the momentum behind it that the conventional SGI-like architecture does.

The potential performance of the deferred rendering scheme is likely to totally shatter any kind of SGI-like architecture, especially for increasing levels of content (such as Quake 3 or Max Payne). Such an architecture could, if the host can keep up, allow us to leave polygonal objects behind us and start looking to NURBS as the object rendering primitive of choice. It would also allow us to play our games at 1600x1200x32bpp.

Strange things like "texture renaming" may be necessary just to maintain compatibility with content written for the SGI-like architecture. (For example, suppose you load texture A, render from texture A, overwrite texture A with some other content and render from texture A once again -- the deferred renderer will need both copies of texture A, since the entire texel and texture list is required.)[/quote]
Source
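To make the chunk-based scheme from the quoted text a bit more concrete, here is a minimal C++ software sketch. It is not taken from any real driver or chip; Tri, CHUNK, W and H are made-up illustrative names. The whole scene is captured first, and each chunk is then rasterized against a tiny depth/color buffer that would fit in on-chip memory and is simply reused for the next chunk, so only the finished chunk ever leaves the "chip".

[code]
// Minimal C++ sketch of the chunk-based scheme described in the quoted text:
// every triangle is captured first, then the screen is processed as a grid of
// small chunks, each rasterized against a tiny depth/color buffer that is
// reused for the next chunk. Tri, CHUNK, W and H are illustrative only.
#include <algorithm>
#include <array>
#include <cstdint>
#include <limits>
#include <vector>

struct Tri {
    float x[3], y[3], z[3];   // screen-space vertices (z = depth)
    uint32_t color;           // a flat color stands in for texturing
};

constexpr int W = 640, H = 480;   // framebuffer size
constexpr int CHUNK = 32;         // chunk edge length in pixels

// Signed edge function used for point-in-triangle tests and barycentrics.
static float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
}

static void renderDeferred(const std::vector<Tri>& scene, std::vector<uint32_t>& fb) {
    // Chunk-local depth and color buffers: small enough to live in on-chip
    // SRAM on real hardware, and simply reused for every chunk here.
    std::array<float, CHUNK * CHUNK> zbuf;
    std::array<uint32_t, CHUNK * CHUNK> cbuf;

    for (int cy = 0; cy < H; cy += CHUNK) {
        for (int cx = 0; cx < W; cx += CHUNK) {
            zbuf.fill(std::numeric_limits<float>::infinity());
            cbuf.fill(0);

            // A real chip would only walk the triangles binned to this chunk.
            for (const Tri& t : scene) {
                for (int py = cy; py < std::min(cy + CHUNK, H); ++py) {
                    for (int px = cx; px < std::min(cx + CHUNK, W); ++px) {
                        float w0 = edge(t.x[1], t.y[1], t.x[2], t.y[2], px + 0.5f, py + 0.5f);
                        float w1 = edge(t.x[2], t.y[2], t.x[0], t.y[0], px + 0.5f, py + 0.5f);
                        float w2 = edge(t.x[0], t.y[0], t.x[1], t.y[1], px + 0.5f, py + 0.5f);
                        // Only one winding order is accepted (back faces skipped).
                        if (w0 < 0 || w1 < 0 || w2 < 0) continue;
                        float area = w0 + w1 + w2;
                        if (area <= 0) continue;                  // degenerate triangle
                        float z = (w0 * t.z[0] + w1 * t.z[1] + w2 * t.z[2]) / area;
                        int local = (py - cy) * CHUNK + (px - cx);
                        if (z < zbuf[local]) {                    // depth test entirely "on chip"
                            zbuf[local] = z;
                            cbuf[local] = t.color;
                        }
                    }
                }
            }

            // Flush the finished chunk: each framebuffer pixel is written to
            // external memory exactly once per frame -- no overdraw leaves the chip.
            for (int py = cy; py < std::min(cy + CHUNK, H); ++py)
                for (int px = cx; px < std::min(cx + CHUNK, W); ++px)
                    fb[py * W + px] = cbuf[(py - cy) * CHUNK + (px - cx)];
        }
    }
}

int main() {
    std::vector<uint32_t> fb(W * H, 0);
    // One test triangle whose winding matches the accepted orientation.
    std::vector<Tri> scene = { Tri{{100, 100, 300}, {100, 300, 100}, {0.5f, 0.5f, 0.5f}, 0xFF0000u} };
    renderDeferred(scene, fb);
    return 0;
}
[/code]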
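The "quick chunking"/sorting step the quote mentions can be sketched as a simple bounding-box binning pass. This reuses the illustrative Tri/W/H/CHUNK definitions from the sketch above; the point is that indices stay in submission order inside each bin, which is what correct alpha blending within a chunk relies on.

[code]
// Bounding-box binning, reusing the illustrative Tri/W/H/CHUNK definitions
// from the previous sketch. Each triangle index is appended to every chunk
// its screen-space bounding box overlaps; indices remain in submission order.
#include <algorithm>
#include <vector>

static std::vector<std::vector<int>> binTriangles(const std::vector<Tri>& scene) {
    const int binsX = (W + CHUNK - 1) / CHUNK;
    const int binsY = (H + CHUNK - 1) / CHUNK;
    std::vector<std::vector<int>> bins(binsX * binsY);   // triangle indices per chunk

    for (int i = 0; i < static_cast<int>(scene.size()); ++i) {
        const Tri& t = scene[i];
        // Conservative screen-space bounding box, clamped to the chunk grid.
        int minX = std::max(0,         static_cast<int>(std::min({t.x[0], t.x[1], t.x[2]})) / CHUNK);
        int maxX = std::min(binsX - 1, static_cast<int>(std::max({t.x[0], t.x[1], t.x[2]})) / CHUNK);
        int minY = std::max(0,         static_cast<int>(std::min({t.y[0], t.y[1], t.y[2]})) / CHUNK);
        int maxY = std::min(binsY - 1, static_cast<int>(std::max({t.y[0], t.y[1], t.y[2]})) / CHUNK);

        // Storing indices rather than copies keeps the bins small; triangles
        // entirely outside a chunk are trivially rejected when it is rendered.
        for (int by = minY; by <= maxY; ++by)
            for (int bx = minX; bx <= maxX; ++bx)
                bins[by * binsX + bx].push_back(i);
    }
    return bins;
}
[/code]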
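The "chase the scan beam" idea is speculative even in the quoted text, so the following is only a rough illustration of the arbitration it describes: a finished chunk row of the next frame is written into the front buffer only after the beam has already scanned past that row. currentScanline() is a stand-in for whatever raster-line status register real hardware would expose; here it is faked from a 60 Hz clock so the sketch compiles on its own.

[code]
// Rough illustration of "chasing the scan beam": chunk rows are produced
// top-to-bottom, i.e. in the order the beam travels, and are written into the
// front buffer before the flip to the already-completed back buffer -- but only
// once the beam has passed them, so the frame currently on screen never tears.
#include <chrono>
#include <cstdint>
#include <cstring>
#include <thread>

// Stand-in for a hardware raster-line status register: a fake beam position
// derived from a 60 Hz wall clock, just to keep the sketch self-contained.
static int currentScanline() {
    using namespace std::chrono;
    auto us = duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
    return static_cast<int>((us % 16667) * 525 / 16667);   // 0..524: 480 visible lines + blanking
}

static void chaseBeamWrite(uint32_t* frontBuffer, const uint32_t* chunkRow,
                           int rowTopY, int rowHeight, int width) {
    // Wait until the beam has scanned past the bottom of this chunk row.
    while (currentScanline() < rowTopY + rowHeight) {
        std::this_thread::sleep_for(std::chrono::microseconds(50));   // arbitrate with the beam
    }
    std::memcpy(frontBuffer + rowTopY * width, chunkRow,
                sizeof(uint32_t) * static_cast<std::size_t>(rowHeight) * width);
}
[/code]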
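For comparison, the Rage128-style answer mentioned in the quote (keep the SGI-like immediate pipeline, but put a small cache in front of the frame buffer) can be modelled roughly like this. The direct-mapped layout and the 4096-pixel capacity are arbitrary assumptions; the point is only that redundant accesses from overdraw and from neighbouring triangles of the same mesh stop costing external memory bandwidth.

[code]
// Toy model of a frame buffer cache for an SGI-like immediate renderer:
// hits are served on-chip, only misses and evictions touch external memory.
// Sizes and the direct-mapped layout are illustrative assumptions.
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

class FrameBufferCache {
public:
    explicit FrameBufferCache(std::vector<uint32_t>& mem) : mem_(mem) {}

    uint32_t read(std::size_t addr) { return fetch(addr).value; }

    void write(std::size_t addr, uint32_t v) {
        Line& l = fetch(addr);
        l.value = v;
        l.dirty = true;                          // write-back: no external write yet
    }

    // Push every dirty line back to external memory (e.g. at end of frame).
    void flush() {
        for (Line& l : lines_)
            if (l.valid && l.dirty) { mem_[l.tag] = l.value; l.dirty = false; }
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    struct Line { bool valid = false; bool dirty = false; std::size_t tag = 0; uint32_t value = 0; };

    Line& fetch(std::size_t addr) {
        Line& l = lines_[addr % kLines];                   // direct-mapped by pixel address
        if (l.valid && l.tag == addr) { ++hits_; return l; }   // redundant access: "free"
        ++misses_;                                             // only misses cost bandwidth
        if (l.valid && l.dirty) mem_[l.tag] = l.value;         // evict: write old pixel back
        l.valid = true; l.dirty = false; l.tag = addr; l.value = mem_[addr];
        return l;
    }

    static constexpr std::size_t kLines = 4096;   // on-chip capacity in pixels (made up)
    std::vector<uint32_t>& mem_;                  // the external frame buffer
    std::array<Line, kLines> lines_{};
    std::size_t hits_ = 0, misses_ = 0;
};
[/code]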
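Finally, the "texture renaming" problem at the end of the quote: because the deferred renderer holds the whole frame's triangle list before drawing anything, a texture the application overwrites mid-frame has to survive in both versions. A possible driver-side scheme (purely illustrative, not how any actual driver does it) is to decouple the application-visible handle from the internal copy and pin the old copy until the frame has been rendered.

[code]
// Illustrative "texture renaming" bookkeeping for a deferred renderer: the
// handle the application sees is decoupled from the internal copy the captured
// triangle list samples from. None of this is a real driver's API.
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

using TexelData = std::vector<uint32_t>;

struct DeferredTextureTable {
    // Immutable snapshots the captured triangle list points at for this frame.
    std::vector<std::shared_ptr<const TexelData>> frameReferences;

    struct Slot {
        std::shared_ptr<const TexelData> data;   // current contents of the handle
        bool referencedThisFrame = false;
    };
    std::unordered_map<int, Slot> slots;         // application handle -> slot

    // Application uploads (or overwrites) texture `handle`.
    void upload(int handle, TexelData texels) {
        Slot& s = slots[handle];
        // Replacing the shared_ptr is the whole "rename": if the previous
        // contents were already referenced by captured triangles, that older
        // version stays alive through frameReferences, while the handle now
        // points at a brand-new internal copy.
        s.data = std::make_shared<const TexelData>(std::move(texels));
        s.referencedThisFrame = false;
    }

    // Driver records a draw call that samples from `handle` (assumes upload()
    // was called for this handle at least once before).
    void referenceForDraw(int handle) {
        Slot& s = slots[handle];
        if (!s.referencedThisFrame) {
            frameReferences.push_back(s.data);   // pin this version until the frame is drawn
            s.referencedThisFrame = true;
        }
    }

    // Once the deferred frame has actually been rendered, the pinned snapshots
    // can finally be released.
    void endFrame() {
        frameReferences.clear();
        for (auto& entry : slots) entry.second.referencedThisFrame = false;
    }
};
[/code]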
Edited by JC on 21.01.2003, 17:15 (link edited)

Oculus

void
Registered: Jun 2001
Location: schlafzimmer
Posts: 856
hmm... pretty wild
imho it's heading in the wrong direction again, but it still sounds impressive
I'd love a hardware-accelerated raytracer
i.e. just "hard-wire" a current raytracing engine like the one in Maya or Lightwave
would surely be too expensive :D


I'm owning vertex and pixel shaders right now
which just goes to show that graphics programming isn't worth anything anymore.
takes absolutely no skill these days
back in the day there were the heroes at id who made quake1 etc.
but with dx9 any fool can cobble together a 3d engine
besides, the standardization of the interfaces eats up so much performance....