
John Carmack on id Tech 6, Ray Tracing, Consoles, Physics and more
Overview
Date: Mar 12, 2008
Original URL: http://www.pcper.com/article.php?aid=532
Synopsis: Transcription of Ryan Shrout's phone interview with John Carmack
In recent months a lot of discussion has been circulating about the roles of ray
tracing, accelerated physics and multiple-GPU configurations in the future of PC
and console gaming. Here at PC Perspective we have heard from Intel on the
subject many times and just recently sat down with NVIDIA's Chief Scientist
David Kirk to discuss the ray tracing and rasterization debate.
Many of our readers, as well as comments across the web, asked for feedback
from the developers. It makes sense - these are the people that are going to be
spending their money and time developing games to sell on next-generation
architecture so surely their opinions would be more grounded in reality than a
hardware company trying to push their technological advantages. With that in
mind, we spent some time talking with John Carmack, the legendary programmer at
id Software famous for Wolfenstein, Doom, Quake and the various engines that
power them. What started out as a simple Q&A about Intel's ray tracing
plans turned into a discussion on the future of gaming hardware, both PC and
console, possible software approaches to future rendering technology,
multiple-GPU and multi-core CPU systems and even a possible insight into id Tech
6, the engine that will replace the id Tech 5 / Rage title.
The information that John discussed with us is very in-depth and you'll
probably want to block off some time to fully digest the data. You might also
want to refresh your knowledge of octrees and voxels. Also note that in some
areas the language of this text might seem less refined than you might expect
simply because we are using a transcription of a recorded conversation.
Sections
- Ray tracing for more than rendering
- Hybrid rendering, graphics APIs and mobile ray tracing
- Multi-GPU graphics and Conclusions
Questions
Ray tracing for more than rendering
Ryan Shrout:
Let's just jump right into the issue at hand. What is your take on current ray
tracing arguments floating around such as those featured in a couple of
different articles here at PC Perspective? Have you been doing any work on ray
tracing yourself?
John Carmack:
I have my own personal hobby horse in this race and have some fairly firm
opinions on the way things are going right now. I think that ray tracing in the
classical sense, of analytically intersecting rays with conventionally defined
geometry, whether they be triangle meshes or higher order primitives, I'm not
really bullish on that taking over for primary rendering tasks which is
essentially what Intel is pushing. (Ed: information about Intel's research is
here.) There are large advantages to rasterization from a performance
standpoint and many of the things that they argue as far as using efficient
culling technologies to be able to avoid referencing a lot of geometry, those
are really bogus arguments because you could do similar things with occlusion
queries and conditional renders with rasterization. Head to head rasterization
is just a vastly more efficient use of whatever transistors you have
available.
But, I do think that there is a very strong possibility as we move towards next
generation technologies for a ray tracing architecture that uses a specific data
structure; rather than just taking triangles like everybody uses and tracing
rays against them and being really, really expensive. There is a specific
format I have done some research on that I am starting to ramp back up on for
some proof of concept work for next generation technologies. It involves ray
tracing into a sparse voxel octree which is essentially a geometric evolution of
the mega-texture technologies that we're doing today for uniquely texturing
entire worlds. It's clear that what we want to do in the following generation
is have unique geometry down to the equivalent of the texel across everything.
There are different approaches that you could wind up and try to get that done
that would involve tessellation and different levels of triangle meshes and you
could conceivably make something like that work but rasterization
architecture does really start falling apart when your typical triangle size is
less than one pixel. At that point you really have lost much of the benefits of
rasterization. Not necessarily all of them, because linearly walking through a
list of primitives can still be much faster than randomly accessing them for
tracing, but the wins are diminishing there.
In our current game title we are looking at shipping on two DVDs, and we are
generating hundreds of gigs of data in our development before we work on
compressing it down. It's interesting that if you look at representing this
data in this particular sparse voxel octree format it winds up even being a more
efficient way to store the 2D data as well as the 3D geometry data, because you
don't have packing and bordering issues. So we have incredibly high numbers;
billions of triangles of data that you store in a very efficient manner. Now
what is different about this versus a conventional ray tracing architecture is
that it is a specialized data structure that you can ray trace into quite
efficiently and that data structure brings you some significant benefits that
you wouldn't get from a triangular structure. It would be 50 or 100 times more
data if you stored it out in a triangular mesh, which you couldn't actually do
in practice.
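(Ed: To make the data structure concrete, here is a minimal sketch of a sparse voxel octree with a deliberately naive ray cast. It is an illustration only, not id Software's implementation. The names Node, SparseVoxelOctree and cast_ray are hypothetical, and the fixed-step ray march stands in for the hierarchical traversal a real renderer would use.)

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    # Children keyed by octant index 0..7. Missing keys are empty space,
    # which is what makes the tree "sparse".
    children: Dict[int, "Node"] = field(default_factory=dict)
    color: Optional[Tuple[int, int, int]] = None  # payload, set on leaves only


class SparseVoxelOctree:
    def __init__(self, depth: int):
        self.depth = depth        # the grid is (2**depth) voxels on each axis
        self.size = 1 << depth
        self.root = Node()

    def _octant(self, x, y, z, level):
        # Take one bit from each coordinate, most significant bit first.
        shift = self.depth - 1 - level
        return ((((x >> shift) & 1) << 2)
                | (((y >> shift) & 1) << 1)
                | ((z >> shift) & 1))

    def insert(self, x, y, z, color):
        node = self.root
        for level in range(self.depth):
            octant = self._octant(x, y, z, level)
            node = node.children.setdefault(octant, Node())
        node.color = color

    def lookup(self, x, y, z):
        node = self.root
        for level in range(self.depth):
            node = node.children.get(self._octant(x, y, z, level))
            if node is None:
                return None       # reached empty space, stop descending
        return node.color

    def node_count(self):
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count


def cast_ray(tree, origin, direction, step=0.25):
    """Naive fixed-step march (an assumption made for brevity).

    Returns ((x, y, z), color) for the first occupied voxel, or None.
    """
    t, max_t = 0.0, 2.0 * tree.size
    while t <= max_t:
        p = [o + d * t for o, d in zip(origin, direction)]
        if all(0 <= c < tree.size for c in p):
            cell = (int(p[0]), int(p[1]), int(p[2]))
            color = tree.lookup(*cell)
            if color is not None:
                return cell, color
        t += step
    return None


tree = SparseVoxelOctree(depth=4)              # a 16 x 16 x 16 grid
tree.insert(5, 6, 7, (255, 0, 0))
hit = cast_ray(tree, (5.5, 6.5, 0.5), (0.0, 0.0, 1.0))
```

A single voxel in this 16 x 16 x 16 grid costs five nodes (the root plus one per level) instead of 4096 dense cells, which is the storage property the answer above is pointing at.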
I've been pitching this idea to both NVIDIA and Intel and just everybody about
directions as we look toward next generation technologies. But this is one of
those aspects where changing the paradigm of rendering from rasterization based
approach to a ray casting approach or any other approach is not out of the
question but I do think that the direction that Intel is going about it as a
conventional ray tracer is unlikely to win out. While you could start doing
some real time things that look interesting it's always going to be a matter of
a quarter the efficiency or a 10th of the efficiency or something like that.
Intel of course hopes that they can win by having 4x the raw processing power on
their Larrabee versus a conventional GPU, and as we look towards future
generations that's one aspect of how the battle may shape up. Intel has always
had process advantage over the GPU vendors and if they are able to have an
architecture that has 3-4x the clock rate of the traditional GPU architectures
they may be able to soak the significant software architecture deficit by
clubbing it with processing power.
From the developer's standpoint there are pros and cons to that. We could
certainly do interesting things with either direction. But literally just last
week I was doing a little bit of research work on these things. The direction
that everybody is looking at for next generation, both console and eventual
graphics card stuff, is a "sea of processors" model, typified by Larrabee or
enhanced CUDA and things like that, and everybody is sort of waving their hands
and talking about "oh we'll do wonderful things with all this" but there is very
little in the way of real proof-of-concept work going on. There's no one
showing the demo of like, here this is what games are going to look like on the
next generation when we have 10x more processing power - nothing compelling has
actually been demonstrated and everyone is busy making these multi-billion
dollar decisions about what things are going to be like 5 years from now in the
gaming world. I have a direction in mind with this but until everybody can
actually make movies of what this is going to be like at subscale speeds, it's
distressing to me that there is so much effort going on without anybody showing
exactly what the prize is that all of this is going to give us.
Ryan Shrout:
So, because Intel's current demonstrations are using technology from two
previous generations rather than showing off one or two generations AHEAD of
today, there is little exciting to be drawn from it?
John Carmack:
I wouldn't say there's anything that Intel has shown, even if they network a
whole room full of PCs and say "we'll be able to stick all of this on a graphics
card for you in the coming generation," I don't think they've shown the win. I
don't think they've shown something people will say "my god that's 10x cooler"
or "that makes me want to buy a new console".
It is tough in a research environment to do that because so much of the content
battle now is media rather than algorithms. They've certainly been hacking on
the Quake code bases to at least give them something that is not an ivory tower
toy, but they're working with something that is previous generation technology
and trying to make it look like something that is going to a next-gen
technology. You really can't stretch media over two generational gaps like
that, so they're stuck. Which is why I'm hoping to be able to do my part and
provide some proof of concept demo technology this year. We're working on our
RAGE project and the id Tech 5 code base but I've been talking to all the
relevant people about what we think might be going on and what our goals are for
an id Tech 6 generation. Which may very well involve, I'm certainly hoping it
involves, ray tracing in the "sparse voxel octree" because at least I think I
can show a real win. I think I can show something that you don't see in current
games today, or even in the current in-development worlds of unique surface
detail. By following that out into the extra dimension of having complete
geometric detail at that same density I think I can provide something that
justifies the technological sea change.
Ryan Shrout:
How dramatic would a hardware change have to be to take advantage of the
structures you are discussing here?
John Carmack:
It's interesting in that the algorithms would be something that, it's almost
unfortunate in the aspect that these algorithms would take great advantage of
simpler bit-level operations in many cases and they would wind up being
implemented on this 32-bit floating point operation-based hardware. Hardware
designed specifically for sparse voxel ray casting would be much smaller and
simpler and faster than a general purpose solution but nobody in their right
mind would want to make a bet like that and want to build specific hardware for
technology that no one has developed content for. The idea would be that you
have to have a general purpose solution that can approach all sorts of things
and is at least capable of doing the algorithms necessary for this type of ray
tracing operation at a decent speed. I think it's pretty clear that that's
going to be there in the next generation. In fact, years and years ago I did an
implementation of this with complete software based stuff and it was
interesting; it was not competitive with what you could do with hardware, but
it's likely that I'll be able to put something together this year probably using
CUDA. If I can make something that renders a small window at a modest frame
rate and we can run around some geometrically intricate sparse voxel octree
world and make a 320x240 window at 10 fps and realize that on next-generation
hardware that's optimized more for doing this we can go ahead and get 1080p 60
Hz on there.
That would be the justification that would make everybody sleep a whole lot
better, that there is going to be some win coming out of this.
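(Ed: For a sense of the gap between the prototype and the target, the arithmetic can be sketched as below. It assumes that "1080p" means 1920x1080 and that cost scales with pixels per second, which is a simplification. The function name is ours.)

```python
def pixels_per_second(width, height, fps):
    """Raw pixel throughput for a given window size and frame rate."""
    return width * height * fps


prototype = pixels_per_second(320, 240, 10)      # 768,000 pixels per second
target = pixels_per_second(1920, 1080, 60)       # 124,416,000 pixels per second
speedup_needed = target / prototype              # 162.0

print(f"needed speedup: {speedup_needed:.0f}x")
```

Under those assumptions, the jump from a 320x240 window at 10 fps to 1080p at 60 Hz is a factor of about 162 in pixel throughput.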
Ryan Shrout:
Is AMD's tessellation engine that they put in the R600 chips anywhere close to
what you are looking for?
John Carmack:
No, tessellation has been one of those things up there with procedural content
generation where it's been five generations that we've been having people tell
us it's going to be the next big thing and it never does turn out to be the
case. I can go into long expositions about why that type of data amplification
is not nearly as good as general data compression that gives you the data that
you really want. But I don't think that's the world beater; I mean certainly
you can do interesting things with displacement maps on top of conventional
geometry with the tessellation engine, but you have lots of seaming problems and
the editing architecture for it isn't nearly as obvious. What we want is
something that you can carve up the world as continuously as you want without
any respect to underlying geometry.
Hybrid rendering, graphics APIs and mobile ray tracing
Ryan Shrout:
Based on your new data structure method using ray tracing, could you couple this
with current rasterization methods for hybrid rendering?
John Carmack:
I saw the quote from Intel about making no sense for a hybrid approach, and I
disagree with that. I think that if you had basically a routine that ray traces
this area of the screen in the sparse voxel octree it's going to spit out
fragments, it's going to wind up having a depth value on there that you could
intermix with anything else. Even if you had a ray trace against a conventional
architecture you would still want to have a fragment program there that would
look almost exactly like current fragment programs that we've got right now. I
couldn't imagine wanting to do something that didn't have a back end like that.
I mean you might even have vertex processors - the stuff that Intel is doing
right now, ray tracing into the geometry, it's very likely that you would in the
end want to be able to run the triangles in there that you are ray tracing
against through vertex and fragment processors and you're just getting the
barycentric coordinate of your ray trace stab. You have to know what you hit
but then you have to know what you want to do there. You would want in addition
some ability to send dependent rays out from there as extra elements.
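(Ed: The point about intermixing fragments by depth can be sketched as a per-pixel depth test between two buffers. This is a toy illustration with made-up buffer contents. The composite function and the (depth, color) tuple layout are assumptions for the example, not any real graphics API.)

```python
import math


def composite(raster_frags, traced_frags):
    """Keep the nearer fragment at each pixel.

    Each buffer is a list of (depth, color) tuples, one per pixel.
    A depth of math.inf means nothing was drawn at that pixel.
    """
    out = []
    for (z_r, c_r), (z_t, c_t) in zip(raster_frags, traced_frags):
        out.append((z_r, c_r) if z_r <= z_t else (z_t, c_t))
    return out


# Three pixels: a rasterized character in front, no character at all, and a
# character standing behind the ray traced world geometry.
raster = [(1.0, "character"), (math.inf, None), (5.0, "character")]
traced = [(2.0, "world"), (3.0, "world"), (4.0, "world")]
merged = composite(raster, traced)
```

Because both passes produce a depth per fragment, neither has to know how the other generated its pixels.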
It's reasonably likely that if my little data structure direction pans out
you'll probably still want to do characters as skinned and boned with
traditional animation methods. While you could go ahead and work out a voxel
method of characters using refraction skeletons around characters and you could
do animation, you probably wouldn't want to because we can make characters that
look pretty damn good with the existing stuff and if everything continues to get
10x faster without us doing anything you'll probably want to do characters
conventionally. But if you can do the world and most of the static objects at
this incredible level of detail that you would get with the sparse voxel octree
approach that seems like a completely reasonable way to mix and match.
Now there are aspects where mixing and matching would work poorly. It would be
nice to be able to solve the shadowing problem really directly by ray tracing.
You could do that in a completely ray traced world, you just send the shadow
rays out and jitter them and do all the nice things that let you solve the
aliasing problem nicely. But if you rasterized characters traditionally with
hardware skinning, the voxel ray tracer wouldn't find any intersections with the
characters, and they wouldn't cast any shadows. So there are down sides to that
but what I want to get out of ray tracing here is not a lot of what would be
considered the traditional benefits of ray tracing: perfect shadows - shadowing
would be damn nice to solve but we can live without that - things like
refraction and multiple mirror bounces. Those just aren't that important and we
have every evidence in the world about that because in the real world where
people make production renderings, even if they have almost infinite resources
for movie budgets, very little of it is ray traced. There are spectacular off
line ray tracers but even when you have production companies that have rooms and
rooms of servers they choose not to use ray tracing very often because in the
vast majority of cases it doesn't matter. It doesn't matter for what they are
trying to do and it's not worth the extra cost. And that's going to stay fairly
similar throughout the next-generation gaming hardware models.
What I really want to get out of the ray tracing is this infinite geometry
which is more driven by the data structure that you have to use ray tracing to
access, rather than the fact that you're bouncing these multiple rays around. I
could do something next generation with this and I hope that it pans out that
way - we may not have dependent rays at all and it may just use ray tracing to
solve the geometry problem. Then you can also solve the aliasing problem by
stochastically jittering all the sample centers which is something that I've been
pushing to have integrated into current rasterization approaches. It's obvious
how you do it in a ray tracing approach; you jitter all the samples and you have
some dependent, refinement approach going on there.
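(Ed: Stochastic jittering of sample centers can be sketched as follows. The scene is a hypothetical straight edge, and coverage and shade_pixel are names invented for this example. Each pixel is split into an n by n grid and one sample is placed at a random position inside each cell, rather than at the cell center.)

```python
import random


def coverage(x, y):
    # Hypothetical scene: everything below the line y = 0.37x + 0.2 is covered.
    return 1.0 if y < 0.37 * x + 0.2 else 0.0


def shade_pixel(px, py, n, rng, jitter=True):
    """Average n*n samples over pixel (px, py), one per grid cell.

    With jitter=True each sample lands at a random spot inside its cell.
    With jitter=False every sample sits at its cell center (regular grid).
    """
    total = 0.0
    for i in range(n):
        for j in range(n):
            ox = rng.random() if jitter else 0.5
            oy = rng.random() if jitter else 0.5
            total += coverage(px + (i + ox) / n, py + (j + oy) / n)
    return total / (n * n)


rng = random.Random(0)
edge_pixel = shade_pixel(0, 0, 16, rng)    # the edge crosses this pixel
inside_pixel = shade_pixel(10, 0, 4, rng)  # fully covered
outside_pixel = shade_pixel(0, 5, 4, rng)  # fully uncovered
```

The true covered area of the edge pixel is 0.385, and the jittered estimate lands close to that. The benefit over a regular grid is that the remaining error shows up as noise rather than as structured stair-stepping.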
I think that we can have huge benefits completely ignoring the traditional ray
tracing demos of "look at these shiny reflective curved surfaces that make three
bounces around and you can look up at yourself". That's neat but that's an
artifact shader, something that you look at one 10th of 1% of the time in a
game. And you can do a pretty damn good job of hacking that up just with a
bunch of environment map effects. It won't be right, but it will look cool, and
that's all that really matters when you're looking at something like that. We
are not doing light transport simulation here, we are doing something that is
supposed to look good.
Ryan Shrout:
So current generation consoles and PC graphics cards aren't going to be capable
of running this new type of sparse voxel octree based technology? And do you
think vendors adding in support for it for next-generation hardware would be
sacrificing any speed or benefits to rasterization?
John Carmack:
Right, not at all. You could certainly do it (sparse voxel octree) but it's not
going to be competitive. The number of pixels that you could generate with that
would be less than a 10th of what you could do with a rasterization approach.
But the hope would be that in the coming generation we might have the technology
for it.
No matter who does what, the next generation is going to be really good at
rasterization, that is a foregone conclusion. Intel is spending lots of effort
to make sure Larrabee is a competitive rasterizer. And it's going to be ball
park competitive, we'll see how things work out, but a factor of 2 plus or minus
is most likely. But everything is going to be a good rasterizer. We should
have enough general purpose computational ability to also be able to do some of
these other novel architectures and while everybody thinks it's going to be
great I have to reiterate that nobody has actually shown exactly how it's going
to be great. I have my ideas and I'm sure other people have their ideas but
it's completely possible that the next generation of high end graphics is just
going to be rasterizing like we do today with a little more flexibility and 10x
the speed.
Ryan Shrout:
Do you think DirectX or OpenGL will have to be modified for this?
John Carmack:
They are almost irrelevant in a general purpose computation environment. They
are clearly rasterization based APIs with their heritage but there is a lot of
headroom left in the programming we can do with this. Almost any problem that
you ask for can be decomposed into these data parallel sorts of approaches and
it's not like we're capping out what we can do with rasterization based
graphics. But when you get these general purpose computing things going, they
will look like different environments for how you would program things. It
could look like CUDA or with Larrabee you could just program them as a bunch of
different computers with SIMD units.
Ryan Shrout:
Intel has discussed the benefit of ray tracing's ability to scale with the
hardware when they showed off the Q4: Ray Traced engine on a UMPC recently.
What are your thoughts on that possible advantage?
John Carmack:
Speaking as someone that is a mobile developer and a high end console developer,
that's a ridiculous argument.
Ryan Shrout:
Rasterization can scale just as easily?
John Carmack:
Yeah. The idea of moving ray tracing onto the mobile platforms makes no sense
at all.
Ryan Shrout:
What are your thoughts on Intel's purchase of Havok and Project Offset? One
theory is that Intel is going to be making a game engine either for demos or to
sell. Do you think this is their hope in addressing the ability to "show a win"
as you mentioned before?
John Carmack:
That's what they have to do, that's always been my argument to Intel and to a
lesser degree the other companies. The best way to evangelize your technology
is to show somebody something. To show an existence proof for it, to kind of
eat your own dog food, in terms of working with everything. Instead of just
telling everyone you should be able to do great things with this, the right
thing to do is for them to produce something that is spectacular and then say
"ok everybody that wants this here's the code". That's the best way to lead
anybody; it's by example. They'll learn the pros and cons of everything
directly there and I very much endorse that direction for them.
Multi-GPU graphics and Conclusions
Ryan Shrout:
What are your thoughts on the current climate of multi-GPU systems? Do you see
that as a real benefit and do you think developers are able to take advantage of
those kind of hardware configurations easily enough?
John Carmack:
From a developer standpoint the uncomfortable truth is that the console
capabilities really dominate the development decisions today. If you look at
current titles and how they've done on the console, you know, high end action
GPU based things, the consoles are so much the dominant factor that it's
difficult to set things up so that you can do much to leverage the really
extreme high end desktop settings. Traditionally you get more resolution, where
a console game might be designed for 720p and the high end PC you go ahead and
run at 1080p or even higher resolution, that's an obvious thing. You crank up
the resolution. You turn off compression when you have 1GB of video memory
available. And also normally you can go from a 30 Hz console game to a 60 Hz PC
game. So there are a number of things you can crank up there on the PC, but
it's difficult to try and justify any radically different algorithm, something
you would really do with 4x the power you'd have with a high end PC system.
Ryan Shrout:
Do you think NVIDIA and AMD are relying too heavily on the multi-GPU technology
instead of pushing forward with true next-generation GPUs? Will multi-GPU
systems continue to be an option at all?
John Carmack:
I've always been a big proponent of these high end boutique systems - way back
from the early days of 3dfx I always thought it was a real feather in their cap
early on that they could pay more money and have a bigger system and have it
double up and just go faster. I think it's a really good option and certainly
companies like NVIDIA and AMD are throwing all the resources they possibly can
at making the newer, next-generation cards. But to be able to have this ability
to just pay more money and get more performance out of a current generation is
a really useful thing to have. Whether it makes sense for gaming to have these
thousand dollar graphics cards is quite debatable but it's really good for
developers; to be able to target something high end that's going to come out
three years from now by being able to pay more money today for 2x more power.
Certainly the whole high end simulation business has benefited a lot from
commoditization of scalable graphics.
Although on the down side it was clear that years back when everything was
going in a fairly simple algorithmic approach as far as graphics engines where
you just rendered to your frame buffer, it was easy for them to go ahead and
chunk that frame buffer up into an arbitrary number of pieces. But now there is
much more tight coupling between the graphics render and the GPUs where there
are all sorts of feedbacks, rendering to sub buffers, going back and forth,
getting dependent conditional query results, and it makes it a lot harder to
just chunk the problem up like that. But that's the whole tale of
multi-processing since the very beginning; we're fighting that with multiple
CPUs. It's the primary issue with advancing performance in computing.
That is my big take away message for a lot of people about the upcoming
generation of general purpose computation on GPUs; a lot of people don't seem to
really appreciate how the vertex fragment rasterization approach to computer
graphics has been unquestionably the most successful multi-processing solution
ever. If you look back over 40 years of research and what people have done on
trying to use multiple processors to solve problems, the fact that we can do so
much so easily with the vertex fragment model, it's a real testament to its
value. A lot of people just think "oh of course I want more flexibility I'd
love to have multiple CPUs doing all these different things" and there's a lot
of people that don't really appreciate what the suffering is going to be like as
we move through that; and that's certainly going on right now as software tries
to move things over, and it's not "oh just thread your application". Anyone
that says that is basically an idiot, not appreciating the problems. There are
depths of subtlety to all of this where it's been an ivory tower research project
since the very beginning and it's by no means solved.
Ryan Shrout:
NVIDIA and AMD driver teams have to hack up games to get them to work optimally
on multi-GPU systems and that's more difficult for them today than in the past.
Do you think developers' dependence on the console market, which is solely
single-GPU today, is a cause of those headaches?
John Carmack:
It's probably making it even harder for the PC card guys because as developers
get more sophisticated with the low level access we get on the consoles, the
rendering engines are becoming harder to kind of, behind our backs,
automatically split across multiple GPUs. We are doing more sophisticated
things on the single GPU - there is a lot more data transfer going back and
forth, updated states that have to be replicated across multiple GPUs, dependent
sections of the screen doing different things. It's still possible but it's
kind of a hairy job and I definitely don't envy those driver writers or their
task at all.
Ryan Shrout:
Any thoughts on the 3-4 GPU systems from AMD and NVIDIA? Overkill?
John Carmack:
For many applications, for the class of apps that just treat something like a
dumb frame buffer, they really will go ahead and be 4x faster especially if
you're just trying to be 4x the resolution on there, that's easy. There is no
doubt that if you take a game that's playing at the frame rate you want at a
certain resolution, a 4 GPU solution will usually be able to go ahead and render
4x the pixels, or very close to linear scaling.
But what it's unlikely to do is take a game that's running 20 FPS at
a given nominal resolution and then make that game run 60 FPS. You're likely
bound up for things that aren't raw GPU throughput, usually CPU throughput in
the game.
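(Ed: A toy model makes both cases concrete. It assumes that CPU and GPU work overlap perfectly, that GPU work splits evenly across cards, and that GPU cost is proportional to pixel count. The millisecond figures are invented for illustration and the fps function is ours.)

```python
def fps(cpu_ms, gpu_ms, num_gpus, pixel_scale=1.0):
    """Frame rate when the slower of the CPU and the GPUs sets the pace.

    gpu_ms is the single-GPU cost of one frame at the base resolution.
    pixel_scale multiplies the pixel count (4.0 means 4x the pixels).
    """
    gpu_time = gpu_ms * pixel_scale / num_gpus
    return 1000.0 / max(cpu_ms, gpu_time)


# GPU-bound game: four GPUs render 4x the pixels at the same frame rate.
base = fps(cpu_ms=10, gpu_ms=16, num_gpus=1)                   # 62.5
quad = fps(cpu_ms=10, gpu_ms=16, num_gpus=4, pixel_scale=4.0)  # 62.5

# CPU-bound game at 20 FPS: adding GPUs does not raise the frame rate.
slow_one = fps(cpu_ms=50, gpu_ms=30, num_gpus=1)               # 20.0
slow_four = fps(cpu_ms=50, gpu_ms=30, num_gpus=4)              # 20.0
```

In this model the extra cards only help while the GPU term is the larger of the two. Once the CPU time dominates, the frame rate stays where it is.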
Ryan Shrout:
You've had choice words for what AGEIA was trying to do with the hardware
physics add-in cards. Now that they are off the scene, having been purchased by
NVIDIA, what are your thoughts on that past situation?
John Carmack:
That was one of those things where it was a stupid plan from the start and I
really hope NVIDIA didn't pay too much because I found the whole thing
disingenuous. Many people from the very beginning said their entire business
strategy was to be acquired because it should have been obvious to everybody
that the market for an add-in physics card was just not there. And the market
proved not to be there. The whole thing about setting up a company and
essentially lying to consumers, that this is a good idea, in order to cash out
and be bought out by a big company, I saw the whole thing as pretty distasteful.
It's obvious, and we knew when AGEIA was starting, that a few generations down
the road we would have these general purpose compute resources on the GPU. And
what we have right now are things like CUDA that you can implement physics on;
you can't mix and match it very well right now, with such a heavy weight systems
change, but that's going to be getting better in future revisions. And
eventually you will be using a common set of resources that can run general data
parallel stuff versus very high efficiency rasterization work. As for the PhysX
hardware, while they would have a little bit of talk about how their
architecture was somehow much better suited for physics processing, and it might
have been somewhat better suited for it, they never told anyone how or why.
Ryan Shrout:
Do you think moving physics to a GPU is a benefit?
John Carmack:
Right now, to offload tasks like that you have to be able to go ahead and stick
them in a pretty deep pipeline so it doesn't fit the way people do physics
modeling in their games very well right now. But as people choose to either
change their architecture to allow a frame of latency in the reports of
collision detection in physics or we get much finer grain parallelization where
you don't have this really long latency and you can kind of force an immediate
mode call to GPU operations, then we start using that just the way we do SSE
instructions or something in our current code base. Then, yeah, we definitely
will wind up using compute resources for things like that or collision detection
physics.
Ryan Shrout:
NVIDIA has Novodex, Intel has Havok -- will that cause fragmentation in the
market? Do you think Microsoft would combine them into a physics API like they
did DirectX?
John Carmack:
It will be interesting to see how that plays out because while I was well known
for having certain issues with Microsoft on the graphics API side of things I
really think Microsoft did the industry a good favor by eventually getting to
the DX9 class of stuff, having a very intelligent standard that everyone was
forced to abide by. And it was a good thing. But of course I have worries as
we look towards this general compute topic, because if MS took 9 tries to get it
right...well. They probably have some accumulated wisdom about that whole
process now, but there is always a chance for MS to sort of overstep their
actual experience and lay down a standard that's no good. Their standards
almost always evolve into something good... it would be wonderful if they got it
right on the first step of DX compute, or whatever it's going to be. I wouldn't
hold my breath on that because really all of this is still research. With
graphics we were really, for a larger part, following the SGI model for a long
time and that gave the industry a real leg up. Right now this comes back to the
earlier point: everybody's still waving their hands about what wonderful stuff
we're going to do here but we really don't have the examples let alone the
applications. So it's sort of a dangerous time to go in and start making
specific standards when there's not actually all that much of an experience
base.
As far as the physics APIs, I do expect that for any API to wind up getting
broad game developer support, whether it's going to be Novodex or Havok, they are
going to have to have backends that at least function using any acceleration
technology available. It'll just be a matter of Intel obviously not trying to
make a CUDA implementation very fast but someone will wind up having a CUDA
implementation for it that is at least plug compatible. Maybe NVIDIA will end
up having wrappers for their APIs to do that. But that is just kind of the
reality with today's development; unless you are a Microsoft tech developer or
something that's tied to the Xbox 360 platform, developers aren't going to make
a choice where "well we're going to use Intel's stuff and not run on the current
console generation" or something. That's just not going to happen.