ATI Radeon HD 2900 XT: Calling a Spade a Spade
by Derek Wilson on May 14, 2007 12:04 PM EST
Posted in: GPUs
Next Up: NVIDIA's G80
NVIDIA has been more tight-lipped about their underlying architecture, but we will infer as much as possible from the block diagrams we've seen and conversations we've had.
The G80 shader core is a little different from the R600. It is built on eight SIMD units, each containing 16 SPs. The SIMD instructions are not VLIW, but single scalar instructions, and each SP within a SIMD unit executes that instruction on a different thread. While groups of 16 SPs share resources, NVIDIA's compiler doesn't need to build VLIW instructions to keep any of these SPs busy, and it would be quite difficult to create dependencies between SPs because they are running different threads.
The bottom line here is that up to eight distinct shader operations are running across 128 threads at one time. This means we could have 128 threads all complete a scalar operation every clock, or we could have 128 threads all complete a 4-wide vector operation one component at a time over four clocks.
On NVIDIA hardware, vertex threads are assigned to SIMD units in blocks of 16, while geometry and pixel threads are assigned in blocks of 32 (16 threads over two clocks). With smaller blocks, we see better branch performance but worse cache or prefetch utilization than we would with a more coarsely grained approach.
This implementation also means that we don't have to worry about dependencies in the shader code. Of course, it is also the case that we can't extract parallelism from the shader code itself. The upside is a steady rate of 128 operations per clock. This can actually go up in some special cases, but it shouldn't go lower under normal circumstances.
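To make the scalar issue model concrete, here is a minimal sketch of the arithmetic described above. It is a toy model built only from the unit counts quoted in the text, not a description of NVIDIA's actual scheduler, and the function names are our own.

```python
# Toy model of G80's scalar SIMD issue as described in the text.
# Unit counts come from the article; function names are illustrative only.
SIMD_UNITS = 8        # SIMD units in the G80 shader core
SPS_PER_SIMD = 16     # stream processors (SPs) per SIMD unit
THREADS_IN_FLIGHT = SIMD_UNITS * SPS_PER_SIMD  # one thread per SP


def clocks_for_op(components: int) -> int:
    """Clocks for every in-flight thread to finish one operation.

    Each SP retires one scalar component per clock, so an n-wide
    vector operation takes n clocks.
    """
    return components


def component_ops_per_clock(components: int) -> float:
    """Scalar component operations retired per clock across the chip."""
    total = THREADS_IN_FLIGHT * components
    return total / clocks_for_op(components)


for width in (1, 2, 3, 4):
    print(width, clocks_for_op(width), component_ops_per_clock(width))
```

Whatever the vector width, the rate in this model stays at 128 component operations per clock, which is the steady rate the text refers to. The special cases that push it higher are not modeled.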
Comparing Shader Architectures: R600 vs. G80
The key to the architecture comparison is to realize that nothing is straight up apples to apples here. We need to look at how much work can be done per clock, how much work is likely to be done per clock, and how much work we can get done per unit time.
First, G80 can process more threads in parallel: 128 as opposed to R600's 64. Performing work on more threads at a time is one very good way of extracting overall parallelism from the problem of graphics. There are millions of pixels in every frame that need to be processed, and if we had hardware large enough we could process them all at once.
However, more work (up to 5x) is potentially getting done on each of those 64 threads than on NVIDIA's 128 threads. This is because R600 can execute up to five parallel operations per thread while NVIDIA hardware is only able to handle one operation at a time per SP (in most cases). But maximizing throughput on the AMD hardware will be much more difficult, and we won't always see peak performance from real code. In the best case, R600 is able to do 2.5x the work of G80 per clock (320 operations on R600 and 128 on G80). In the worst case, where dependencies in the code limit R600 to a single operation per thread, G80 has a 2x advantage over R600 per clock (64 operations on R600 versus 128 on G80).
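The best and worst cases fall straight out of the thread and issue counts. The short sketch below reproduces that arithmetic; the `ops_packed` parameter is our own shorthand for how many of R600's five slots the compiler manages to fill, and the model ignores everything except issue width.

```python
# Per-clock operation bounds from the thread and issue counts in the text.
R600_THREADS = 64      # threads processed in parallel on R600
R600_MAX_ISSUE = 5     # up to five parallel operations per thread (VLIW)
G80_THREADS = 128      # threads processed in parallel on G80
G80_ISSUE = 1          # one scalar operation per SP per clock (most cases)


def r600_ops_per_clock(ops_packed: int) -> int:
    """R600 throughput when `ops_packed` of the five slots are filled."""
    if not 1 <= ops_packed <= R600_MAX_ISSUE:
        raise ValueError("ops_packed must be between 1 and 5")
    return R600_THREADS * ops_packed


g80 = G80_THREADS * G80_ISSUE
for packed in range(1, R600_MAX_ISSUE + 1):
    r600 = r600_ops_per_clock(packed)
    print(f"{packed} slots filled: R600 {r600:3d} vs G80 {g80} "
          f"({r600 / g80:.2f}x)")
```

In this simple model the two architectures draw level per clock when R600 averages two filled slots per thread (64 x 2 = 128).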
The real difference is in where parallelism is extracted. Both architectures make use of the fact that threads are independent of each other by using multiple SIMD units. While NVIDIA focused on maximizing parallelism in this area of graphics, AMD decided to try to extract parallelism inside the instruction stream by using a VLIW approach. AMD's average case will vary with the code being run, but because so many operations are vector based, high utilization can generally be expected.
However, even if we expect high utilization on AMD hardware, the fact remains that G80 has a large clock speed advantage. With the shader core on G80 pushed up to 1.5 GHz, we could still see some cases where R600 is faster, but the majority of the time G80 should be able to best R600 on a pure compute basis.
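Folding clock speed into the per-clock numbers gives a rough sense of where the break-even point sits. This is only a sketch: the 1.5 GHz G80 shader clock is the figure quoted above, but the R600 clock used here (742 MHz) is an assumption that does not appear in this section, so substitute the appropriate value before reading anything into the result.

```python
# Rough operations-per-second comparison. Only issue width and clock are
# modeled; latency hiding, texturing, and scheduling are all ignored.
G80_SHADER_CLOCK_GHZ = 1.5   # shader clock quoted in the text
R600_CLOCK_GHZ = 0.742       # ASSUMPTION: not stated in this section
G80_OPS_PER_CLOCK = 128      # 128 SPs, one scalar op each
R600_THREADS = 64            # threads processed in parallel on R600
R600_MAX_ISSUE = 5           # up to five operations per thread


def gops(ops_per_clock: float, clock_ghz: float) -> float:
    """Billions of operations per second."""
    return ops_per_clock * clock_ghz


g80_rate = gops(G80_OPS_PER_CLOCK, G80_SHADER_CLOCK_GHZ)
r600_peak = gops(R600_THREADS * R600_MAX_ISSUE, R600_CLOCK_GHZ)

# Average operations per thread per clock R600 needs to match G80.
break_even = g80_rate / (R600_THREADS * R600_CLOCK_GHZ)

print(f"G80 steady rate: {g80_rate:.1f} Gops/s")
print(f"R600 peak rate:  {r600_peak:.1f} Gops/s")
print(f"R600 break-even: {break_even:.2f} of {R600_MAX_ISSUE} slots filled")
```

Under the assumed clock, R600 would have to keep slightly more than four of its five slots filled on average to match G80, which fits the expectation that R600 wins only in some cases.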
This overview still isn't the bottom line in performance. Efficient latency hiding, good scheduling, high cache utilization, high availability of texture data, good branching, and fast and efficient Z/stencil and color processing all contribute as well. Where possible, let's explore those areas a bit more.
86 Comments
johnsonx - Monday, May 14, 2007 - link
and to which are you going to admit to?
What was that old saying about glass houses and throwing stones? Shouldn't throw them in one? Definitely shouldn't them if you ARE one!
Puddleglum - Monday, May 14, 2007 - link
You mean, while it does compete performance-wise?
johnsonx - Monday, May 14, 2007 - link
No, I'm pretty sure they mean DOESN'T. That is, the card can't compete with a GTX, yet still uses more power.
INTC - Monday, May 14, 2007 - link
Chadder007 - Monday, May 14, 2007 - link
When will we have the 2600's out in review?? Thats the card im waiting for.
TA152H - Monday, May 14, 2007 - link
Derek,
I like the fact you weren't mincing your words, except for a little on the last page, but I'll give you a perspective of why it might be a little better than some people will think.
There are some of us, and I am one, that will never buy NVIDIA. I bought one, had nothing but trouble with it, and have been buying ATI for 20 years. ATI has been around for so long, there is brand loyalty, and as long as they come out with something that is competent, we'll consider it against their other products without respect to NVIDIA. I'd rather give up the performance to work with something I'm a lot more comfortable with.
The power though is damning, I agree with you 100% on this. Any idea if these beasts are being made by AMD now, or still whoever ATI contracted out? AMD is typically really poor in their first iteration of a product on a process technology, but tend to improve quite a bit in succeeding ones. I wonder how much they'll push this product initially. It might be they just get it out to have it out, and the next one will be what is really a worthwhile product. That only makes sense, of course, if AMD is now manufacturing this product. I hope they are, they surely don't need to make anymore of their processors that aren't selling well.
One last thing I noticed is the 2400 Pro had no fan! It had a heatsink from Hell, but that will still make this a really attractive product for a growing market segment. Any chance of you guys doing a review on the best fanless cards?
DerekWilson - Wednesday, May 16, 2007 - link
TSMC is manufacturing the R600 GPUs, not AMD.
AnnonymousCoward - Tuesday, May 15, 2007 - link
"I bought one, had nothing but trouble with it, and have been buying ATI for 20 years."
That made me laugh. If one bad experience was all it took to stop you from using a computer component, you'd be left with a PS/2 keyboard at best.
"...to work with something I'm a lot more comfortable with."
Are you more comfortable having 4:3 resolutions stretched on a widescreen? Maybe you're also more comfortable with having crappier performance than nvidia has offered for the last 6 months and counting? This kind of brand loyalty is silly.
MadBoris - Monday, May 14, 2007 - link
As far as your brand loyalty, ATI doesn't exist anymore. Furthermore AMD executives will got the staff so you can't call it the same.
Secondly, Nvidia has been a stellar company providing stellar products. Everyone has some ups and downs. Unfortunately with the hardware and drivers this is ATI's (er AMD's) downs.
This card should do ok in comparison to the GTS, especially as drivers mature. Some reviews show it doing better than GTS640 in most tests, so I am not sure where or how discrepencies are coming about. Maybe hardware compatibility, maybe settings.
rADo2 - Monday, May 14, 2007 - link
Many NVIDIA 8600GT/GTS cards do not have a fan, are available on the market now, and are (probably; different league) much more powerful than 2400 ;) But as you are a fanboy, you are not interested, right?