Fiji’s Architecture: The Grandest of GCN 1.2

We’ll start off our in-depth look at the R9 Fury X with a look at the Fiji GPU underneath.

Like the Hawaii GPU before it, from a release standpoint Fiji is not really the pathfinder chip for its architecture, but rather the largest version of it. Fiji itself is based on what we unofficially call Graphics Core Next 1.2 (aka GEN3), and ignoring HBM for the moment, Fiji incorporates a few smaller changes but otherwise remains nearly identical to the previous GCN 1.2 chips. The pathfinder for GCN 1.2 in turn was Tonga, which was released back in September of 2014 as the Radeon R9 285.

So what does GCN 1.2 bring to the table over Hawaii and the other GCN 1.1 chips? Certainly the best-known, marquee GCN 1.2 feature is AMD’s latest generation delta color compression technology. Tied in to Fiji’s ROPs, delta color compression augments AMD’s existing color compression capabilities with additional compression modes that are based around the patterns of pixels within a tile and the differences between them (i.e. the delta), increasing how frequently and by how much frame buffers (and render targets) can be compressed.
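
To make the idea of "the delta" a bit more concrete, here is a minimal sketch of delta-encoding a tile. This is not AMD's hardware scheme, whose modes aren't documented at this level; it assumes single-channel 8-bit pixels, one anchor value per tile, and deltas packed at the smallest signed width that fits. All function names are hypothetical.

```python
def delta_encode_tile(pixels):
    """Encode a tile as one anchor value plus per-pixel differences from it."""
    anchor = pixels[0]
    deltas = [p - anchor for p in pixels[1:]]
    return anchor, deltas


def delta_decode_tile(anchor, deltas):
    """Rebuild the original pixels exactly; the scheme is lossless."""
    return [anchor] + [anchor + d for d in deltas]


def bits_per_delta(deltas):
    """Smallest signed bit width that can hold every delta in the tile."""
    if not deltas:
        return 0
    lo, hi = min(deltas), max(deltas)
    width = 1
    # A signed field of `width` bits covers [-2**(width-1), 2**(width-1) - 1].
    while lo < -(1 << (width - 1)) or hi > (1 << (width - 1)) - 1:
        width += 1
    return width


def compressed_bits(pixels, bits_per_pixel=8):
    """Size of the encoded tile: a full-width anchor plus the packed deltas."""
    anchor, deltas = delta_encode_tile(pixels)
    return bits_per_pixel + bits_per_delta(deltas) * len(deltas)


smooth = [100, 101, 102, 101, 100, 99, 100, 101]  # gentle gradient
noisy = [0, 255] * 4                              # worst case for deltas
print(compressed_bits(smooth), "bits vs", 8 * len(smooth), "raw")
print(compressed_bits(noisy), "bits vs", 8 * len(noisy), "raw")
```

In this toy model the smooth tile shrinks to well under half its raw size, while the noisy tile actually comes out larger than raw, which is why a real implementation would need to fall back to storing such tiles uncompressed.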

Frame buffer operations are among the most bandwidth intensive in a GPU – it’s a lot of pixels that need to be resolved and written to a buffer – so reducing the amount of memory bandwidth these operations draw on can significantly increase the effective memory bandwidth of a GPU. In AMD’s case, GCN 1.2’s delta color compression improvements are designed to deliver up to a 40% increase in memory bandwidth efficiency, with individual tiles being compressible at up to an 8:1 ratio. Overall, while the lossless nature of this compression means that the exact amount of compression taking place changes frame by frame, tile by tile, it is at the end of the day one of the most significant improvements to GCN 1.2. For Radeon R9 285 it allowed AMD to deliver similar memory performance on a 256-bit memory bus (33% smaller than R9 280’s), and for Fiji it goes hand-in-hand with HBM to give Fiji an immense amount of effective memory bandwidth to play with.
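
As a rough back-of-the-envelope check on the R9 285 claim, the sketch below combines the narrower bus with the best-case efficiency figure. It assumes equal per-pin data rates on both cards and treats the "up to 40%" figure as a flat multiplier, which real workloads will not consistently hit; the function name is hypothetical.

```python
def relative_effective_bandwidth(bus_bits, baseline_bus_bits, efficiency_gain):
    """Effective bandwidth relative to a baseline, at equal per-pin data rate."""
    raw_ratio = bus_bits / baseline_bus_bits
    return raw_ratio * (1.0 + efficiency_gain)


# 256-bit bus (R9 285) vs 384-bit bus (R9 280), best-case 40% efficiency gain.
raw = 256 / 384
best_case = relative_effective_bandwidth(256, 384, 0.40)
print(f"raw bus ratio:             {raw:.3f}")
print(f"best-case effective ratio: {best_case:.3f}")
```

Under those assumptions the narrower bus recovers most of the gap in the best case, which is consistent with "similar" rather than identical memory performance.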

Moving on, AMD has also made some changes under the hood at the ALU/shader level for GCN 1.2. Many of these changes are primarily for AMD’s Carrizo APU, where task scheduling improvements go hand-in-hand with AMD’s Heterogeneous System Architecture initiative and deliver improvements to allow the CPU and GPU to more easily deliver work to each other. Similarly, 16-bit instructions are intended to save on power consumption in mobile devices that use lower precision math for basic rendering.

More applicable to Fiji and its derivatives are the improvements to data-parallel processing. GCN 1.2 now has the ability for data to be shared between SIMD lanes in a limited fashion, beyond existing swizzling and other data organization methods. This is one of those low-level tweaks I’m actually a bit surprised AMD even mentioned (though I’m glad they did) as it’s a little tweak that’s going to be very algorithm specific. For non-programmers there’s not much to see, but for programmers – particularly OpenCL programmers – this will enable newer, more efficient algorithms when the nature of the work requires working with data in adjacent lanes.
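
For a sense of the kind of algorithm that benefits from reading a neighboring lane's data, here is a small Python simulation of a prefix sum built from lane shifts. It only models the general idea; the list stands in for the lanes of a SIMD, and `lane_shift_right` is a hypothetical primitive, not an actual GCN instruction.

```python
def lane_shift_right(lanes, offset, fill=0):
    """Each lane i reads the value held by lane i - offset (fill if out of range)."""
    n = len(lanes)
    return [lanes[i - offset] if i - offset >= 0 else fill for i in range(n)]


def inclusive_scan(lanes):
    """Prefix sum across lanes using about log2(n) shift-and-add steps."""
    values = list(lanes)
    offset = 1
    while offset < len(values):
        shifted = lane_shift_right(values, offset)
        values = [v + s for v, s in zip(values, shifted)]
        offset *= 2  # double the reach each step
    return values


print(inclusive_scan([1, 2, 3, 4, 5, 6, 7, 8]))
```

Each step has every lane add in a value from a lane further to its left, so eight lanes finish in three steps instead of seven sequential additions. Without a direct lane-to-lane path, that exchange would typically have to bounce through shared memory.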

But for gamers, perhaps the most significant architectural improvements to GCN 1.2 and thereby Fiji are the changes made to tessellation and geometry processing. There is no single silver bullet here – after going with a 4-wide geometry front-end in Hawaii, AMD hasn’t changed it for Tonga or Fiji – but AMD has put quite a bit of effort into improving how geometry data moves around within the chip and how it’s used, on the basis that at this point the limitations aren’t in raw geometry performance, but rather the difficulties in achieving that performance.

Much of this effort has been invested in better handling small geometry, whether it’s large quantities of small batches, or even small quantities of small batches. The inclusion of small instance caching, for example, allows the GPU to better keep small batches of draw calls in cache, allowing them to be referenced and/or reused in the future without having to go to off-cache memory. Similarly, AMD can now store certain cases of vertex inputs for the geometry shader in shared memory, which like small instance caching allows for processing to take place more frequently on-chip, improving performance and cutting down on DRAM traffic.

More specific to Fiji’s incarnation of GCN is how distribution is handled. Load balancing and distribution among the geometry frontends is improved overall, including some low-level optimizations to how primitives generated from tessellation are distributed. Generally speaking, distribution is a means to improve performance by removing bottlenecks. However, AMD is now catching a specific edge case where small amplification factors don’t generate a lot of primitives; in those cases they’re now skipping distribution, since the gains are minimal and more likely than not the cost of the bus traffic is greater than the benefits of distribution.

Finally, AMD has also expanded the vertex reuse window on GCN 1.2. As in the general case of reuse windows, the vertex reuse window is a cache of sorts for vertex data, allowing old results to be held in waiting in case they are needed again (as is often the case in graphics). Though they aren’t telling us just how large the window now is, GCN 1.2 now features a larger window, which increases the hit rate for vertex data and as a result further edges geometry performance up since that data no longer needs to be regenerated.
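
The effect of window size on hit rate can be illustrated with a toy model. The sketch below assumes a simple FIFO window over an indexed triangle list for a grid mesh; AMD hasn't disclosed the real window size or replacement policy, so the sizes used here and both function names are purely illustrative.

```python
from collections import deque


def grid_indices(cols, rows):
    """Index list for a cols x rows grid of quads, two triangles per quad."""
    indices = []
    stride = cols + 1  # vertices per row
    for y in range(rows):
        for x in range(cols):
            v0 = y * stride + x
            v1 = v0 + 1
            v2 = v0 + stride
            v3 = v2 + 1
            indices += [v0, v1, v2, v1, v3, v2]
    return indices


def hit_rate(indices, window_size):
    """Fraction of vertex references served from a FIFO reuse window."""
    window = deque(maxlen=window_size)  # oldest entry drops out when full
    hits = 0
    for idx in indices:
        if idx in window:
            hits += 1
        else:
            window.append(idx)  # miss: shade the vertex and remember the result
    return hits / len(indices)


mesh = grid_indices(2, 1)
for size in (2, 4):
    print(f"window of {size}: hit rate {hit_rate(mesh, size):.2f}")
```

Even on this tiny mesh, doubling the window lets more shared vertices survive until their next reference, so fewer of them have to be shaded a second time.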

As with our R9 285 review, I took the time to quickly run TessMark across the x8/x16/x32/x64 tessellation factors just to see how tessellation and geometry performance scales on AMD’s cards as the tessellation factor increases. Keeping in mind that all of the parts here have a 4-wide geometry front-end, the R9 285, R9 290X, and R9 Fury X all have the same geometry throughput on paper, give or take 10% for clock speeds. What we find is that Fury X shows significant performance improvements at all levels, beating not only the Hawaii based R9 290X, but even the Tonga based R9 285. Tessellation performance is consistently 33% ahead of the R9 290X, while against Tonga the lead ranges from 33% at high factors to 130% at low tessellation factors, showing the influence of AMD’s changes to how tessellation is handled with low factors.

458 Comments

  • D. Lister - Thursday, July 2, 2015 - link

    "AMD had tessellation years before nVidia, but it went unused until DX11, by which time nVidia knew AMD's capabilities and intentionally designed a way to stay ahead in tessellation. AMD's own technology being used against it only because it released it so early. HBM, I fear, will be another example of this. AMD helped to develop HBM and interposer technologies and used them first, but I bet nVidia will benefit most from them."

    AMD is often first at announcing features. Nvidia is often first at implementing them properly. It is clever marketing vs clever engineering. At the end of the day, one gets more customers than the other.
  • sabrewings - Thursday, July 2, 2015 - link

    While you're right that Nvidia paid for the chips used in 980 Tis, they're still most likely not fit for Titan X use and are cut to remove the underperforming sections. Without really knowing what their GM200 yields are like, I'd be willing to bet the $1000 price of the Titan X was already paying for the 980 Ti chips. So, Nvidia gets to play with binned chips to sell at $650 while AMD has to rely on fully up chips added to an expensive interposer with more expensive memory and a more expensive cooling solution to meet the same price point for performance. Nvidia definitely forced AMD into a corner here, so as I said I would say they won.

    Though, I don't necessarily say that AMD lost, they just make it look much harder to do what Nvidia was already doing and making bookoo cash at that. This only makes AMD's problems worse as they won't get the volume to gain marketshare and they're not hitting the margins needed to heavily reinvest in R&D for the next round.
  • Kutark - Friday, July 3, 2015 - link

    So basically what you're saying is Nvidia is a better run company with smarter people working there.
  • squngy - Friday, July 3, 2015 - link

    "and they cost more per chip to produce than AMD's Fiji GPU."

    Unless AMD has a genie making it for them that's impossible.
    Not only is fiji larger, it also uses a totally new technology (HBM).
  • JumpingJack - Saturday, July 4, 2015 - link

    "AMD had tessellation years before nVidia, but it went unused until DX11, by which time nVidia knew AMD's capabilities and intentionally designed a way to stay ahead in tessellation. AMD's own technology being used against it only because it released it so early. HBM, I fear, will be another example of this. AMD helped to develop HBM and interposer technologies and used them first, but I bet nVidia will benefit most from them."

    AMD fanboys make it sound like AMD can actually walk on water. AMD did work with Hynix, but the magic of HBM comes in the density from die stacking, which AMD did nothing (they are no longer the actual chipmaker as you probably know). As for interposers, this is not new technology, interposers are well established techniques for condensing an array of devices into one package.

    AMD deserves credit for bringing the technology to market, no doubt, but their actual IP contribution is quite small.
  • ianmills - Thursday, July 2, 2015 - link

    Good that you are feeling better Ryan and thanks for the review :)
    That being said Anandtech needs to keep us better informed when things come up.... The way this site handled it though is gonna lose this site readers...
  • Kristian Vättö - Thursday, July 2, 2015 - link

    Ryan tweeted about the Fiji schedule several times and we were also open about it in the comments whenever someone asked, even though it wasn't relevant to the article in question. It's not like we were secretive about it and I think a full article of an article delay would be a little overkill.
  • sabrewings - Thursday, July 2, 2015 - link

    Those tweets are even featured on the site in the side bar. Not sure how much clearer it could get without an article about a delayed article.
  • testbug00 - Sunday, July 5, 2015 - link

    Pipeline story... Dunno title, but, for text, explain it there. Have a link to THG as owned by same company now if readers want to read a review immediately.

    Twitter is non-ideal.
  • funkforce - Monday, July 6, 2015 - link

    The problem isn't only with the delays, it is that since Ryan took over as Editor in Chief I suspect his workload is too large.
    Because this also happened with the Nvidia GTX 960 review. He told 5-6 people (including me) for 5 weeks that it would come, and then it didn’t and he stopped responding to inquiries about it.
    Now in what way is that a good way to build a good relationship and trust between you and your readers?
    I love Ryan's writing, this article was one of the best I've read in a long time. But not everyone is good at everything, maybe Ryan needs to focus on only GPU reviews and not running the site or whatever his other responsibilities are as Edit. in Chief.

    Because the Reviews are what most ppl. come here for and what built this site. You guys are amazing, but AT never used to miss releasing articles the same day NDA was lifted in the past that I can remember. And promising things and then not delivering, sticking your head in the sand and not even apologizing isn’t a way to build up trust and uphold and strengthen the large following this site has.

    I love this site, been reading it since the 1st year it came out, and that's why I care and I want you to continue and prosper.
    Since a lot of ppl. can’t read the twitter feed then what you did here: http://www.anandtech.com/show/8923/nvidia-launches...
    Is the way to go if something comes up, but then you have to deliver on your promises.
