John Carmack

John Carmack on id Tech 6, Ray Tracing, Consoles, Physics and more

Overview

Date: Mar 12, 2008
Original URL: http://www.pcper.com/article.php?aid=532
Synopsis: Transcription of Ryan Shrout's phone interview with John Carmack

In recent months a lot of discussion has been circulating about the roles of ray tracing, accelerated physics and multiple-GPU configurations in the future of PC and console gaming. Here at PC Perspective we have heard from Intel on the subject many times and just recently sat down with NVIDIA's Chief Scientist David Kirk to discuss the ray tracing and rasterization debate.

Many of our readers, as well as comments across the web, asked for feedback from the developers. It makes sense - these are the people that are going to be spending their money and time developing games to sell on next-generation architecture so surely their opinions would be more grounded in reality than a hardware company trying to push their technological advantages. With that in mind, we spent some time talking with John Carmack, the legendary programmer at id Software famous for Wolfenstein, Doom, Quake and the various engines that power them. What started out as a simple Q&A about Intel's ray tracing plans turned into a discussion on the future of gaming hardware, both PC and console, possible software approaches to future rendering technology, multiple-GPU and multi-core CPU systems and even a possible insight into id Tech 6, the engine that will replace the id Tech 5 / Rage title.

The information that John discussed with us is very in-depth and you'll probably want to block off some time to fully digest the data. You might also want to refresh your knowledge of octrees and voxels. Also note that in some areas the language of this text might seem less refined than you might expect simply because we are using a transcription of a recorded conversation.

Sections

Questions

Ray tracing for more than rendering

Ryan Shrout: Let's just jump right into the issue at hand. What is your take on current ray tracing arguments floating around such as those featured in a couple of different articles here at PC Perspective? Have you been doing any work on ray tracing yourself?

John Carmack: I have my own personal hobby horse in this race and have some fairly firm opinions on the way things are going right now. I think that ray tracing in the classical sense, of analytically intersecting rays with conventionally defined geometry, whether they be triangle meshes or higher order primitives, I'm not really bullish on that taking over for primary rendering tasks which is essentially what Intel is pushing. (Ed: information about Intel's research is here.) There are large advantages to rasterization from a performance standpoint and many of the things that they argue as far as using efficient culling technologies to be able to avoid referencing a lot of geometry, those are really bogus arguments because you could do similar things with occlusion queries and conditional renders with rasterization. Head to head, rasterization is just a vastly more efficient use of whatever transistors you have available.

But, I do think that there is a very strong possibility as we move towards next generation technologies for a ray tracing architecture that uses a specific data structure, rather than just taking triangles like everybody uses and tracing rays against them and being really, really expensive. There is a specific format I have done some research on that I am starting to ramp back up on for some proof of concept work for next generation technologies. It involves ray tracing into a sparse voxel octree which is essentially a geometric evolution of the mega-texture technologies that we're doing today for uniquely texturing entire worlds. It's clear that what we want to do in the following generation is have unique geometry down to the equivalent of the texel across everything. There are different approaches that you could wind up and try to get that done that would involve tessellation and different levels of triangle meshes and you could conceivably make something like that work but rasterization architecture does really start falling apart when your typical triangle size is less than one pixel. At that point you really have lost much of the benefits of rasterization. Not necessarily all of them, because linearly walking through a list of primitives can still be much faster than randomly accessing them for tracing, but the wins are diminishing there.

In our current game title we are looking at shipping on two DVDs, and we are generating hundreds of gigs of data in our development before we work on compressing it down. It's interesting that if you look at representing this data in this particular sparse voxel octree format it winds up even being a more efficient way to store the 2D data as well as the 3D geometry data, because you don't have packing and bordering issues. So we have incredibly high numbers; billions of triangles of data that you store in a very efficient manner. Now what is different about this versus a conventional ray tracing architecture is that it is a specialized data structure that you can ray trace into quite efficiently and that data structure brings you some significant benefits that you wouldn't get from a triangular structure. It would be 50 or 100 times more data if you stored it out in a triangular mesh, which you couldn't actually do in practice.
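As an illustration of the kind of structure being described, here is a minimal Python sketch of a sparse voxel octree with a front-to-back ray cast. It is not Carmack's or id Software's implementation; the names `Node`, `insert`, `slab` and `cast` are hypothetical, and the sketch stores only occupancy (no color or normal data) to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    # Only occupied octants get an entry, which is what makes the tree sparse.
    children: Dict[int, "Node"] = field(default_factory=dict)
    leaf: bool = False


def insert(root, x, y, z, depth):
    """Mark integer voxel (x, y, z) as solid in a grid of side 2**depth."""
    node = root
    for level in range(depth - 1, -1, -1):
        octant = (((x >> level) & 1)
                  | (((y >> level) & 1) << 1)
                  | (((z >> level) & 1) << 2))
        node = node.children.setdefault(octant, Node())
    node.leaf = True


def slab(origin, direction, lo, size):
    """Ray versus axis-aligned cube: return (t_enter, t_exit) or None on a miss."""
    t0, t1 = 0.0, float("inf")
    for o, d, l in zip(origin, direction, lo):
        if abs(d) < 1e-12:
            # Ray is parallel to this axis: it must already lie inside the slab.
            if o < l or o > l + size:
                return None
            continue
        a, b = (l - o) / d, (l + size - o) / d
        if a > b:
            a, b = b, a
        t0, t1 = max(t0, a), min(t1, b)
        if t0 > t1:
            return None
    return t0, t1


def cast(node, origin, direction, lo, size):
    """Return the distance along the ray to the nearest solid voxel, or None."""
    span = slab(origin, direction, lo, size)
    if span is None:
        return None
    if node.leaf:
        return span[0]
    half = size / 2
    hits = []
    for octant, child in node.children.items():
        child_lo = (lo[0] + half * (octant & 1),
                    lo[1] + half * ((octant >> 1) & 1),
                    lo[2] + half * ((octant >> 2) & 1))
        child_span = slab(origin, direction, child_lo, half)
        if child_span is not None:
            hits.append((child_span[0], child, child_lo))
    # Visit children in order of entry distance so the first hit is the nearest.
    for _, child, child_lo in sorted(hits, key=lambda h: h[0]):
        t = cast(child, origin, direction, child_lo, half)
        if t is not None:
            return t
    return None


# Usage: an 8x8x8 grid with two solid voxels, and a ray travelling along +x.
root = Node()
insert(root, 5, 2, 2, depth=3)
insert(root, 7, 2, 2, depth=3)
nearest = cast(root, (-1.0, 2.5, 2.5), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 8.0)
# nearest == 6.0: the ray starts at x = -1 and first meets the voxel at x = 5
```

Empty space costs nothing in this layout because unoccupied octants have no node at all, and the ray skips them with a single box test per level.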

I've been pitching this idea to both NVIDIA and Intel and just everybody about directions as we look toward next generation technologies. But this is one of those aspects where changing the paradigm of rendering from rasterization based approach to a ray casting approach or any other approach is not out of the question but I do think that the direction that Intel is going about it as a conventional ray tracer is unlikely to win out. While you could start doing some real time things that look interesting it's always going to be a matter of a quarter the efficiency or a 10th of the efficiency or something like that. Intel of course hopes that they can win by having 4x the raw processing power on their Larrabee versus a conventional GPU, and as we look towards future generations that's one aspect of how the battle may shape up. Intel has always had process advantage over the GPU vendors and if they are able to have an architecture that has 3-4x the clock rate of the traditional GPU architectures they may be able to soak the significant software architecture deficit by clubbing it with processing power.

From the developer's standpoint there are pros and cons to that. We could certainly do interesting things with either direction. But literally just last week I was doing a little bit of research work on these things. The direction that everybody is looking at for next generation, both console and eventual graphics card stuff, is a "sea of processors" model, typified by Larrabee or enhanced CUDA and things like that, and everybody is sort of waving their hands and talking about "oh we'll do wonderful things with all this" but there is very little in the way of real proof-of-concept work going on. There's no one showing the demo of like, here this is what games are going to look like on the next generation when we have 10x more processing power - nothing compelling has actually been demonstrated and everyone is busy making these multi-billion dollar decisions about what things are going to be like 5 years from now in the gaming world. I have a direction in mind with this but until everybody can actually make movies of what this is going to be like at subscale speeds, it's distressing to me that there is so much effort going on without anybody showing exactly what the prize is that all of this is going to give us.

Ryan Shrout: So, because Intel's current demonstrations are using technology from two previous generations rather than showing off one or two generations AHEAD of today, there is little exciting to be drawn from it?

John Carmack: I wouldn't say there's anything that Intel has shown, even if they network a whole room full of PCs and say "we'll be able to stick all of this on a graphics card for you in the coming generation," I don't think they've shown the win. I don't think they've shown something people will say "my god that's 10x cooler" or "that makes me want to buy a new console".

It is tough in a research environment to do that because so much of the content battle now is media rather than algorithms. They've certainly been hacking on the Quake code bases to at least give them something that is not an ivory tower toy, but they're working with something that is previous generation technology and trying to make it look like something that is going to be a next-gen technology. You really can't stretch media over two generational gaps like that, so they're stuck. Which is why I'm hoping to be able to do my part and provide some proof of concept demo technology this year. We're working on our RAGE project and the id Tech 5 code base but I've been talking to all the relevant people about what we think might be going on and what our goals are for an id Tech 6 generation. Which may very well involve, I'm certainly hoping it involves, ray tracing in the "sparse voxel octree" because at least I think I can show a real win. I think I can show something that you don't see in current games today, or even in the current in-development worlds of unique surface detail. By following that out into the extra dimension of having complete geometric detail at that same density I think I can provide something that justifies the technological sea change.

Ryan Shrout: How dramatic would a hardware change have to be to take advantage of the structures you are discussing here?

John Carmack: It's interesting in that the algorithms would be something that, it's almost unfortunate in the aspect that these algorithms would take great advantage of simpler bit-level operations in many cases and they would wind up being implemented on this 32-bit floating point operation-based hardware. Hardware designed specifically for sparse voxel ray casting would be much smaller and simpler and faster than a general purpose solution but nobody in their right mind would want to make a bet like that and want to build specific hardware for technology that no one has developed content for. The idea would be that you have to have a general purpose solution that can approach all sorts of things and is at least capable of doing the algorithms necessary for this type of ray tracing operation at a decent speed. I think it's pretty clear that that's going to be there in the next generation. In fact, years and years ago I did an implementation of this with complete software based stuff and it was interesting; it was not competitive with what you could do with hardware, but it's likely that I'll be able to put something together this year probably using CUDA. If I can make something that renders a small window at a modest frame rate and we can run around some geometrically intricate sparse voxel octree world and make a 320x240 window at 10 fps and realize that on next-generation hardware that's optimized more for doing this we can go ahead and get 1080p 60 Hz on there.

That would be the justification that would make everybody sleep a whole lot better that there is going to be some win coming out of this.
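For a sense of scale, the gap between the prototype and the target can be worked out directly. The interview does not state a ratio; the short Python calculation below assumes that 1080p means 1920x1080 and that rendering cost scales linearly with pixels per second, which is a simplification.

```python
def pixels_per_second(width, height, fps):
    """Raw pixel throughput needed for a given resolution and frame rate."""
    return width * height * fps


prototype = pixels_per_second(320, 240, 10)   # 768,000 pixels per second
target = pixels_per_second(1920, 1080, 60)    # 124,416,000 pixels per second
speedup = target / prototype                  # 162.0
```

Under those assumptions, the proof of concept would need roughly a 162x throughput increase to reach 1080p at 60 Hz, to be covered by some combination of faster hardware and better-optimized code.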

Ryan Shrout: Is AMD's tessellation engine that they put in the R600 chips anywhere close to what you are looking for?

John Carmack: No, tessellation has been one of those things up there with procedural content generation where it's been five generations that we've been having people tell us it's going to be the next big thing and it never does turn out to be the case. I can go into long expositions about why that type of data amplification is not nearly as good as general data compression that gives you the data that you really want. But I don't think that's the world beater; I mean certainly you can do interesting things with displacement maps on top of conventional geometry with the tessellation engine, but you have lots of seaming problems and the editing architecture for it isn't nearly as obvious. What we want is something that you can carve up the world as continuously as you want without any respect to underlying geometry.

Hybrid rendering, graphics APIs and mobile ray tracing

Ryan Shrout: Based on your new data structure method using ray tracing, could you couple this with current rasterization methods for hybrid rendering?

John Carmack: I saw the quote from Intel about making no sense for a hybrid approach, and I disagree with that. I think that if you had basically a routine that ray traces this area of the screen in the sparse voxel octree it's going to spit out fragments, it's going to wind up having a depth value on there that you could intermix with anything else. Even if you had a ray trace against a conventional architecture you would still want to have a fragment program there that would look almost exactly like current fragment programs that we've got right now. I couldn't imagine wanting to do something that didn't have a back end like that. I mean you might even have vertex processors - the stuff that Intel is doing right now, ray tracing into the geometry, it's very likely that you would in the end want to be able to run the triangles in there that you are ray tracing against through vertex and fragment processors and you're just getting the barycentric coordinate of your ray trace stab. You have to know what you hit but then you have to know what you want to do there. You would want in addition some ability to send dependent rays out from there as extra elements.
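The point about fragments carrying a depth value can be sketched in a few lines of Python. This is only an illustration of depth-based merging, not an actual engine or GPU API: the function `composite` and the `(x, y, depth, color)` fragment tuples are hypothetical, with smaller depth meaning closer to the camera.

```python
def composite(width, height, *fragment_sources):
    """Merge fragments from any number of renderers with a shared depth buffer."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for source in fragment_sources:
        for x, y, z, c in source:
            # Standard depth test: keep whichever fragment is nearest.
            if z < depth[y][x]:
                depth[y][x] = z
                color[y][x] = c
    return color


# Usage: fragments from a voxel ray caster mixed with rasterized ones.
voxel_fragments = [(0, 0, 5.0, "world"), (1, 0, 2.0, "world")]
raster_fragments = [(0, 0, 3.0, "character"), (1, 0, 4.0, "character")]
image = composite(2, 1, voxel_fragments, raster_fragments)
# image == [["character", "world"]]
```

Because the merge only looks at depth, it does not matter which renderer produced a fragment or in what order the sources are processed.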

It's reasonably likely that if my little data structure direction pans out you'll probably still want to do characters as skinned and boned with traditional animation methods. While you could go ahead and work out a voxel method of characters using refraction skeletons around characters and you could do animation, you probably wouldn't want to because we can make characters that look pretty damn good with the existing stuff and if everything continues to get 10x faster without us doing anything you'll probably want to do characters conventionally. But if you can do the world and most of the static objects at this incredible level of detail that you would get with the sparse voxel octree approach that seems like a completely reasonable way to mix and match.

Now there are aspects where mixing and matching would work poorly. It would be nice to be able to solve the shadowing problem really directly by ray tracing. You could do that in a completely ray traced world, you just send the shadow rays out and jitter them and do all the nice things that let you solve the aliasing problem nicely. But if you rasterized characters traditionally with hardware skinning, the voxel ray tracer wouldn't find any intersections with the characters, and they wouldn't cast any shadows. So there are down sides to that but what I want to get out of ray tracing here is not a lot of what would be considered the traditional benefits of ray tracing: perfect shadows - shadowing would be damn nice to solve but we can live without that - things like refraction and multiple mirror bounces. Those just aren't that important and we have every evidence in the world about that because in the real world where people make production renderings, even if they have almost infinite resources for movie budgets, very little of it is ray traced. There are spectacular offline ray tracers but even when you have production companies that have rooms and rooms of servers they choose not to use ray tracing very often because in the vast majority of cases it doesn't matter. It doesn't matter for what they are trying to do and it's not worth the extra cost. And that's going to stay fairly similar throughout the next-generation gaming hardware models.

What I really want to get out of the ray tracing is this infinite geometry which is more driven by the data structure that you have to use ray tracing to access, rather than the fact that you're bouncing these multiple rays around. I could do something next generation with this and I hope that it pans out that way - we may not have dependent rays at all and may just use ray tracing to solve the geometry problem. Then you can also solve the aliasing problem by stochastically jittering all the sample centers which is something that I've been pushing to have integrated into current rasterization approaches. It's obvious how you do it in a ray tracing approach; you jitter all the samples and you have some dependent, refinement approach going on there.
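Jittering the sample centers can be illustrated with a small Python sketch. It uses stratified (one random sample per sub-cell) sampling, which is one common way to do this and is an assumption here, not something specified in the interview; `jittered_samples` and `coverage` are hypothetical names.

```python
import random


def jittered_samples(px, py, n, rng):
    """Return n*n sample positions inside pixel (px, py), one per sub-cell."""
    step = 1.0 / n
    return [(px + (i + rng.random()) * step, py + (j + rng.random()) * step)
            for j in range(n) for i in range(n)]


def coverage(px, py, inside, n=4, rng=None):
    """Estimate the fraction of the pixel covered by the shape `inside`."""
    rng = rng or random.Random(0)
    points = jittered_samples(px, py, n, rng)
    return sum(1 for x, y in points if inside(x, y)) / len(points)


# Usage: a vertical edge through the middle of pixel (0, 0).
half_covered = coverage(0, 0, lambda x, y: x < 0.5)
```

In a ray caster, each of these positions would become the origin of one primary ray through the pixel; randomizing them within their sub-cells trades regular aliasing patterns for less objectionable noise.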

I think that we can have huge benefits completely ignoring the traditional ray tracing demos of "look at these shiny reflective curved surfaces that make three bounces around and you can look up at yourself". That's neat but that's an artifact shader, something that you look at one 10th of 1% of the time in a game. And you can do a pretty damn good job of hacking that up just with a bunch of environment map effects. It won't be right, but it will look cool, and that's all that really matters when you're looking at something like that. We are not doing light transport simulation here, we are doing something that is supposed to look good.

Ryan Shrout: So current generation consoles and PC graphics cards aren't going to be capable of running this new type of sparse voxel octree based technology? And do you think vendors adding in support for it for next-generation hardware would be sacrificing any speed or benefits to rasterization?

John Carmack: Right, not at all. You could certainly do it (sparse voxel octree) but it's not going to be competitive. The number of pixels that you could generate with that would be less than a 10th of what you could do with a rasterization approach. But the hope would be that in the coming generation we might have the technology for it.

No matter who does what, the next generation is going to be really good at rasterization, that is a foregone conclusion. Intel is spending lots of effort to make sure Larrabee is a competitive rasterizer. And it's going to be ballpark competitive, we'll see how things work out, but a factor of 2 plus or minus is most likely. But everything is going to be a good rasterizer. We should have enough general purpose computational ability to also be able to do some of these other novel architectures and while everybody thinks it's going to be great I have to reiterate that nobody has actually shown exactly how it's going to be great. I have my ideas and I'm sure other people have their ideas but it's completely possible that the next generation of high end graphics is just going to be rasterizing like we do today with a little more flexibility and 10x the speed.

Ryan Shrout: Do you think DirectX or OpenGL will have to be modified for this?

John Carmack: They are almost irrelevant in a general purpose computation environment. They are clearly rasterization based APIs with their heritage but there is a lot of headroom left in the programming we can do with this. Almost any problem that you ask for can be decomposed into these data parallel sorts of approaches and it's not like we're capping out what we can do with rasterization based graphics. But when you get these general purpose computing things going, they will look like different environments for how you would program things. It could look like CUDA or with Larrabee you could just program them as a bunch of different computers with SIMD units.

Ryan Shrout: Intel has discussed the benefit of ray tracing's ability to scale with the hardware when they showed off the Q4: Ray Traced engine on a UMPC recently. What are your thoughts on that possible advantage?

John Carmack: Speaking as someone that is a mobile developer and a high end console developer, that's a ridiculous argument.

Ryan Shrout: Rasterization can scale just as easily?

John Carmack: Yeah. The idea of moving ray tracing onto the mobile platforms makes no sense at all.

Ryan Shrout: What are your thoughts on Intel's purchase of Havok and Project Offset? One theory is that Intel is going to be making a game engine either for demos or to sell. Do you think this is their hope in addressing the ability to "show a win" as you mentioned before?

John Carmack: That's what they have to do, that's always been my argument to Intel and to a lesser degree the other companies. The best way to evangelize your technology is to show somebody something. To show an existence proof for it, to kind of eat your own dog food, in terms of working with everything. Instead of just telling everyone you should be able to do great things with this, the right thing to do is for them to produce something that is spectacular and then say "ok everybody that wants this here's the code". That's the best way to lead anybody; it's by example. They'll learn the pros and cons of everything directly there and I very much endorse that direction for them.

Multi-GPU graphics and Conclusions

Ryan Shrout: What are your thoughts on the current climate of multi-GPU systems? Do you see that as a real benefit and do you think developers are able to take advantage of those kind of hardware configurations easily enough?

John Carmack: From a developer standpoint the uncomfortable truth is that the console capabilities really dominate the development decisions today. If you look at current titles and how they've done on the console, you know, high end action GPU based things, the consoles are so much the dominant factor that it's difficult to set things up so that you can do much to leverage the really extreme high end desktop settings. Traditionally you get more resolution, where a console game might be designed for 720p and the high end PC you go ahead and run at 1080p or even higher resolution, that's an obvious thing. You crank up the resolution. You turn off compression when you have 1GB of video memory available. And also normally you can go from a 30 Hz console game to a 60 Hz PC game. So there are a number of things you can crank up there on the PC, but it's difficult to try and justify any radically different algorithm, something you would really do with 4x the power you'd have with a high end PC system.

Ryan Shrout: Do you think NVIDIA and AMD are relying too heavily on the multi-GPU technology instead of pushing forward with true next-generation GPUs? Will multi-GPU systems continue to be an option at all?

John Carmack: I've always been a big proponent of these high end boutique systems - way back from the early days of 3dfx I always thought it was a real feather in their cap early on that they could pay more money and have a bigger system and have it double up and just go faster. I think it's a really good option and certainly companies like NVIDIA and AMD are throwing all the resources they possibly can at making the newer, next-generation cards. But to be able to have this ability to just pay more money and get more performance out of a current generation is a really useful thing to have. Whether it makes sense for gaming to have these thousand dollar graphics cards is quite debatable but it's really good for developers, to be able to target something high end that's going to come out three years from now by being able to pay more money today for 2x more power. Certainly the whole high end simulation business has benefited a lot from commoditization of scalable graphics.

Although on the down side it was clear that years back when everything was going in a fairly simple algorithmic approach as far as graphics engines where you just rendered to your frame buffer, it was easy for them to go ahead and chunk that frame buffer up into an arbitrary number of pieces. But now there is much more tight coupling between the graphics render and the GPUs where there are all sorts of feedbacks, rendering to sub buffers, going back and forth, getting dependent conditional query results, and it makes it a lot harder to just chunk the problem up like that. But that's the whole tale of multi-processing since the very beginning; we're fighting that with multiple CPUs. It's the primary issue with advancing performance in computing.

That is my big take away message for a lot of people about the upcoming generation of general purpose computation on GPUs; a lot of people don't seem to really appreciate how the vertex fragment rasterization approach to computer graphics has been unquestionably the most successful multi-processing solution ever. If you look back over 40 years of research and what people have done on trying to use multiple processors to solve problems, the fact that we can do so much so easily with the vertex fragment model, it's a real testament to its value. A lot of people just think "oh of course I want more flexibility I'd love to have multiple CPUs doing all these different things" and there's a lot of people that don't really appreciate what the suffering is going to be like as we move through that; and that's certainly going on right now as software tries to move things over, and it's not "oh just thread your application". Anyone that says that is basically an idiot, not appreciating the problems. There are depths of subtlety to all of this where it's been an ivory tower research project since the very beginning and it's by no means solved.

Ryan Shrout: NVIDIA and AMD driver teams have to hack up games to get them to work optimally on multi-GPU systems and that's more difficult for them today than in the past. Do you think developers' dependence on the console market, which is solely single-GPU today, is a cause of those headaches?

John Carmack: It's probably making it even harder for the PC card guys because as developers get more sophisticated with the low level access we get on the consoles, the rendering engines are becoming harder to kind of, behind our backs, automatically split across multiple GPUs. We are doing more sophisticated things on the single GPU - there is a lot more data transfer going back and forth, updated states that have to be replicated across multiple GPUs, dependent sections of the screen doing different things. It's still possible but it's kind of a hairy job and I definitely don't envy those driver writers or their task at all.

Ryan Shrout: Any thoughts on the 3-4 GPU systems from AMD and NVIDIA? Overkill?

John Carmack: For many applications, for the class of apps that just treat something like a dumb frame buffer, they really will go ahead and be 4x faster especially if you're just trying to be 4x the resolution on there, that's easy. There is no doubt that if you take a game that's playing at the frame rate you want at a certain resolution, a 4 GPU solution will usually be able to go ahead and render 4x the pixels, or very close to linear scaling.

But what it's unlikely to do is take a game that's running 20 FPS at a given nominal resolution and then make that game run 60 FPS. You're likely bound up by things that aren't raw GPU throughput, usually CPU throughput in the game.
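A toy model makes the distinction between the two cases concrete. The Python sketch below is a deliberate simplification and not a description of any real driver: it assumes CPU and GPU work overlap perfectly, that GPU work divides evenly across cards, and all the millisecond figures are made-up examples.

```python
def frames_per_second(cpu_ms, gpu_ms, gpus=1):
    """Frame rate when each frame waits on the slower of the CPU and the GPUs."""
    frame_ms = max(cpu_ms, gpu_ms / gpus)
    return 1000.0 / frame_ms


# GPU-bound case: 4x the pixels on 4 GPUs runs as fast as 1x on one GPU.
base = frames_per_second(cpu_ms=10, gpu_ms=16, gpus=1)       # 62.5
quad_res = frames_per_second(cpu_ms=10, gpu_ms=64, gpus=4)   # 62.5

# CPU-bound case: a 20 FPS game stays at 20 FPS no matter how many GPUs.
slow_one = frames_per_second(cpu_ms=50, gpu_ms=40, gpus=1)   # 20.0
slow_four = frames_per_second(cpu_ms=50, gpu_ms=40, gpus=4)  # 20.0
```

In the second case the extra GPUs shorten only the part of the frame that was not the bottleneck, which is why the frame rate does not move.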

Ryan Shrout: You've had choice words for what AGEIA was trying to do with the hardware physics add-in cards. Now that they are off the scene, having been purchased by NVIDIA, what are your thoughts on that past situation?

John Carmack: That was one of those things where it was a stupid plan from the start and I really hope NVIDIA didn't pay too much because I found the whole thing disingenuous. Many people from the very beginning said their entire business strategy was to be acquired because it should have been obvious to everybody that the market for an add-in physics card was just not there. And the market proved not to be there. The whole thing about setting up a company and essentially lying to consumers, that this is a good idea, in order to cash out and be bought out by a big company, I saw the whole thing as pretty distasteful. It's obvious, and we knew when AGEIA was starting, that a few generations down the road we would have these general purpose compute resources on the GPU. And what we have right now are things like CUDA that you can implement physics on; you can't mix and match it very well right now, with such a heavyweight systems change, but that's going to be getting better in future revisions. And eventually you will be using a common set of resources that can run general data parallel stuff versus very high efficiency rasterization work. As for the PhysX hardware, while they would have a little bit of talk about how their architecture was somehow much better suited for physics processing, and it might have been somewhat better suited for it, they never told anyone how or why.

Ryan Shrout: Do you think moving physics to a GPU is a benefit?

John Carmack: Right now, to offload tasks like that you have to be able to go ahead and stick them in a pretty deep pipeline so it doesn't fit the way people do physics modeling in their games very well right now. But as people choose to either change their architecture to allow a frame of latency in the reports of collision detection in physics or we get much finer grain parallelization where you don't have this really long latency and you can kind of force an immediate mode call to GPU operations, then we start using that just the way we do SSE instructions or something in our current code base. Then, yeah, we definitely will wind up using compute resources for things like that or collision detection physics.

Ryan Shrout: NVIDIA has Novodex, Intel has Havok -- will that cause fragmentation in the market? Do you think Microsoft would combine them into a physics API like they did DirectX?

John Carmack: It will be interesting to see how that plays out because while I was well known for having certain issues with Microsoft on the graphics API side of things I really think Microsoft did the industry a good favor by eventually getting to the DX9 class of stuff, having a very intelligent standard that everyone was forced to abide by. And it was a good thing. But of course I have worries as we look towards this general compute topic, because if MS took 9 tries to get it right...well. They probably have some accumulated wisdom about that whole process now, but there is always a chance for MS to sort of overstep their actual experience and lay down a standard that's no good. Their standards almost always evolve into something good... it would be wonderful if they got it right on the first step of DX compute, or whatever it's going to be. I wouldn't hold my breath on that because really all of this is still research. With graphics we were really, for a larger part, following the SGI model for a long time and that gave the industry a real leg up. Right now this comes back to the earlier point: everybody's still waving their hands about what wonderful stuff we're going to do here but we really don't have the examples let alone the applications. So it's sort of a dangerous time to go in and start making specific standards when there's not actually all that much of an experience base.

As far as the physics APIs, I do expect that for any API to wind up getting broad game developer support, whether it's going to be Novodex or Havok, they are going to have to have backends that at least function using any acceleration technology available. It'll just be a matter of Intel obviously not trying to make a CUDA implementation very fast but someone will wind up having a CUDA implementation for it that is at least plug compatible. Maybe NVIDIA will end up having wrappers for their APIs to do that. But that is just kind of the reality with today's development; unless you are a Microsoft tech developer or something that's tied to the Xbox 360 platform, developers aren't going to make a choice where "well we're going to use Intel's stuff and not run on the current console generation" or something. That's just not going to happen.