After several requests and a week’s break from our initial DirectX 12 article, we’re back with an investigation into Star Swarm DirectX 12 performance scaling on AMD APUs. Since our initial article covered various Intel CPU configurations, this time we’re looking at how performance scales on AMD’s Kaveri APUs, including whether DX12 is much help for the iGPU and whether it can close the single-threaded performance gap between Kaveri and Intel’s Core i3 family.

To keep things simple, this time we’re running everything on either the iGPU or a GeForce GTX 770. Last week we saw how quickly the GPU becomes the bottleneck under Star Swarm when using the DirectX 12 rendering path, and how difficult it is to shift that back to the CPU. And as a reminder, this is an early driver on an early OS running an early DirectX 12 application, so everything here is subject to change.

CPU: AMD A10-7800, AMD A8-7600, Intel Core i3-4330
Motherboard: GIGABYTE F2A88X-UP4 (AMD), ASUS Maximus VII Impact (Intel)
Power Supply: Rosewill Silent Night 500W Platinum
Hard Disk: OCZ Vertex 3 256GB OS SSD
Memory: G.Skill 2x4GB DDR3-2133 9-11-10 (AMD); G.Skill 2x4GB DDR3-1866 9-10-9, run at DDR3-1600 (Intel)
Video Cards: MSI GTX 770 Lightning, AMD APU iGPU
Video Drivers: NVIDIA Release 349.56 Beta, AMD Catalyst 15.200 Beta
OS: Windows 10 Technical Preview 2 (Build 9926)


Star Swarm CPU Scaling - Extreme Quality - GeForce GTX 770


Star Swarm CPU Scaling - Mid Quality - GeForce GTX 770

Star Swarm CPU Scaling - Low Quality - GeForce GTX 770

To get right down to business, then: are AMD’s APUs able to shift the performance bottleneck onto the GPU under DirectX 12? The short answer is yes. Under DirectX 11 the single-threaded performance disparity between Intel and AMD is stark, with the Core i3 holding a clear 50%+ lead at Extreme and Mid quality; under DirectX 12 that becomes a dead heat, as all 3 CPUs are able to keep the GPU fully fed. DirectX 12 provides just the kick the AMD APU setups need to overcome DirectX 11’s CPU submission bottleneck and push the bottleneck onto the GPU. Consequently, at Extreme quality we see a 64% performance increase for the Core i3, but a 170%+ increase for the AMD APUs.
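To make those percentages concrete: a percentage gain maps to a frame-rate multiplier of 1 + gain/100, so the 64% gain means the Core i3 runs at 1.64x its DirectX 11 frame rate, while a 170% gain means the APUs run at roughly 2.7x theirs (at least, since 170% is a lower bound). A minimal Python sketch of that conversion; the helper names are our own and purely illustrative:

```python
def gain_to_multiplier(percent_gain: float) -> float:
    """Convert a percentage increase into a frame-rate multiplier."""
    return 1.0 + percent_gain / 100.0


def percent_gain(before: float, after: float) -> float:
    """Percentage increase when going from `before` to `after` (e.g. DX11 -> DX12 fps)."""
    return (after / before - 1.0) * 100.0


# Extreme-quality gains quoted above (170% is a lower bound for the APUs).
print(f"Core i3:  {gain_to_multiplier(64):.2f}x")
print(f"AMD APUs: at least {gain_to_multiplier(170):.2f}x")
```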

The one exception is Low quality, where the Core i3 retains its lead. Though initially unexpected, this is explained by the difference in batch counts between the quality settings: Low pushes relatively few batches. Extreme averages 90K batches and Mid 55K, but Low averages only 20K. At this batch count the benefits of DirectX 12 are still present but diminished; the CPU is no longer choking on batch submission, and the bottleneck shifts elsewhere (likely the simulation itself).
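Put in relative terms, Low submits roughly a third of Mid’s batches and under a quarter of Extreme’s, which is why DirectX 12’s per-batch savings matter less there. A small Python sketch of those ratios, using only the average batch counts quoted above (the `batch_ratio` helper is hypothetical, for illustration):

```python
# Average batch counts per frame quoted above for each quality setting.
batches = {"Extreme": 90_000, "Mid": 55_000, "Low": 20_000}


def batch_ratio(a: str, b: str) -> float:
    """Fraction of quality `b`'s batch load that quality `a` submits."""
    return batches[a] / batches[b]


for ref in ("Mid", "Extreme"):
    print(f"Low submits {batch_ratio('Low', ref):.0%} of {ref}'s batches")
```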

Star Swarm CPU Batch Submission Time - Extreme - GeForce GTX 770

Meanwhile, batch submission times are consistent across all 3 CPUs, with each dropping from 30ms+ under DirectX 11 to around 6ms under DirectX 12. That AMD no longer lags Intel in batch submission time is very important for AMD, as it means its CPUs are not struggling with per-thread performance nearly as much under DirectX 12 as they were under DirectX 11.
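A rough back-of-the-envelope reading of those numbers: dividing the Extreme-quality submission times by the ~90K average batches gives a per-batch CPU cost of about 333ns under DirectX 11 versus about 67ns under DirectX 12, a roughly 5x reduction. The Python sketch below assumes submission time scales linearly with batch count, which is our simplification rather than something these tests measured; since 30ms is a lower bound, the DX11 figures are if anything underestimates. Under that assumption, DX12 submission at Low’s ~20K batches would cost only around a millisecond per frame:

```python
# Back-of-the-envelope model. ASSUMES submission time scales linearly with
# batch count. Inputs are the approximate Extreme-quality figures quoted above
# (30 ms+ under DX11, ~6 ms under DX12, ~90K batches per frame).
EXTREME_BATCHES = 90_000
SUBMIT_MS = {"DX11": 30.0, "DX12": 6.0}


def per_batch_ns(api: str) -> float:
    """Average CPU time spent submitting one batch, in nanoseconds."""
    return SUBMIT_MS[api] * 1e6 / EXTREME_BATCHES


def estimated_submit_ms(api: str, batch_count: int) -> float:
    """Linear extrapolation of submission time to another batch count (model, not measurement)."""
    return per_batch_ns(api) * batch_count / 1e6


speedup = per_batch_ns("DX11") / per_batch_ns("DX12")
print(f"DX11: {per_batch_ns('DX11'):.0f} ns/batch, "
      f"DX12: {per_batch_ns('DX12'):.0f} ns/batch ({speedup:.1f}x)")

# Low quality averages ~20K batches per frame.
for api in SUBMIT_MS:
    print(f"{api} at 20K batches: ~{estimated_submit_ms(api, 20_000):.1f} ms")
```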

Star Swarm GPU Scaling - Mid Quality

Star Swarm GPU Scaling - Low Quality

Finally, looking at how performance scales with our GPUs, the results are unsurprising but nonetheless positive for AMD. Aside from the GTX 770, which has the most GPU headroom to spare in the first place, both AMD APUs still see significant performance gains from DirectX 12 despite quickly running into a GPU bottleneck. This simple API switch is still enough to get another 44% out of the A10-7800 and 25% out of the A8-7600. So although DirectX 12 is not going to bring the same kind of massive performance improvements to iGPUs that we’ve seen with dGPUs, in extreme cases such as this it can still be highly beneficial. And this is before counting some of the API’s potential fringe benefits, such as shifting the TDP balance from CPU to GPU in TDP-constrained mobile devices.
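A simplified model helps explain why the iGPU gains are smaller. If CPU and GPU work overlap perfectly, the slower of the two sets the frame time, so once DirectX 12 pulls CPU time below GPU time, further CPU savings stop showing up in the frame rate. The Python sketch below assumes that idealized overlap (real drivers and engines add queuing and synchronization), and the frame times in it are hypothetical, not measurements from these tests:

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frame rate under an idealized pipeline where CPU and GPU work fully
    overlap, so the slower of the two sets the frame time."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def dx12_gain(cpu_dx11_ms: float, cpu_dx12_ms: float, gpu_ms: float) -> float:
    """Percentage gain from cutting CPU frame time while GPU time stays fixed."""
    return (fps(cpu_dx12_ms, gpu_ms) / fps(cpu_dx11_ms, gpu_ms) - 1.0) * 100.0


# HYPOTHETICAL frame times, chosen only to illustrate the shape of the result:
# the API switch cuts CPU time from 40 ms to 15 ms in both cases.
print(f"fast dGPU (10 ms/frame): {dx12_gain(40, 15, 10):.0f}% gain")  # stays CPU-bound
print(f"slow iGPU (30 ms/frame): {dx12_gain(40, 15, 30):.0f}% gain")  # becomes GPU-bound
```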

Looking at the overall picture, just as with our initial article, it’s important not to read too much into these results right now. Star Swarm is first and foremost a best-case scenario and a demonstration of DirectX 12’s batch submission benefits. Games will still benefit from DirectX 12, but they are unlikely to benefit as greatly as Star Swarm does, in part because of the much greater share of non-rendering work (simulation, AI, audio, etc.) a CPU must handle in a real game.
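Amdahl’s law captures why. If batch submission is only a fraction f of a frame’s CPU work and DirectX 12 speeds up just that fraction by a factor s, the overall CPU-side speedup is 1 / ((1 - f) + f / s). A short Python sketch, taking s = 5 from the roughly 30ms-to-6ms drop in submission time reported in this article and two hypothetical values of f (the article does not measure f, and this ignores any GPU limit):

```python
def amdahl_speedup(submit_share: float, submit_speedup: float) -> float:
    """Overall CPU-side speedup when only the batch-submission share of
    frame time gets faster (Amdahl's law)."""
    return 1.0 / ((1.0 - submit_share) + submit_share / submit_speedup)


SUBMIT_SPEEDUP = 30 / 6  # approximate 30 ms -> 6 ms drop in submission time

# HYPOTHETICAL shares of CPU frame time spent on batch submission.
for label, share in [("submission-heavy (90%)", 0.9), ("game-like (30%)", 0.3)]:
    print(f"{label}: {amdahl_speedup(share, SUBMIT_SPEEDUP):.2f}x")
```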

But with that in mind, our results from bottlenecking AMD’s APUs point to a clear conclusion: thanks to DirectX 12’s greatly improved threading capabilities, the new API can substantially close the gap between Intel and AMD CPUs, at least so long as you’re bottlenecked at batch submission.


152 Comments


  • D. Lister - Saturday, February 14, 2015

    "Any chance you can do just intel vs amd apu graphics....?"

    No, because it has already been done:

    http://www.anandtech.com/show/8291/amd-a10-7800-re...
  • silverblue - Sunday, February 15, 2015

    ...but not in this test.
  • D. Lister - Saturday, February 14, 2015

    It would be interesting to see how Intel's iGPUs fare with DX12 compared to DX11. I mean there has to be something substantial in it for Intel to be a part of DX12 development from the start, otherwise they just spent time and resources to benefit their competition.
  • NightAntilli - Sunday, February 15, 2015

    Ok. We got some AMD APUs in the mix now :) I still want to know whether the scaling is limited to 4 threads or not. Please test it again using the data of the A10 7850 and compare it to an FX-6xxx and FX-8xxx.
  • silverblue - Monday, February 16, 2015

    I wouldn't hold out much hope for a significant performance boost as regards Bulldozer-based CPUs. We should've seen it from Mantle before now, though admittedly AMD does stand to benefit a little more here.

    I'd be interested in seeing if the G3258 benefits less than its APU/Athlon competition the more we look at Mantle/DirectX 12.
  • NightAntilli - Monday, February 16, 2015

    It's not really a hope, but simply whether it makes sense to have more than 4 threads. Mantle has been shown to reduce CPU overhead, but I haven't seen it actually divide the CPU tasks across multiple threads as well as DX12. I might be wrong though, but it would still be good to see it. Theoretically they could've tested this with the Intel i7 CPUs, but their single cores are too strong, so the bottleneck is removed before more than 4 threads can be utilized. With the FX's weaker cores, we can see whether DX12 actually scales or not.
  • akamateau - Monday, March 6, 2017

    Anand disabled Async Compute, which natively supports multi-core processing and non-serial data streams to the GPU.

    They went to a horserace and broke legs!
  • abianand - Monday, February 16, 2015

    Good article. Please allow me to offer my unsolicited critique anyway.

    Is there any reason the core-i3 is missing out in the last two charts?? Its exclusion in the last two charts makes the article incomplete in a sense. There is a reason I'm asking this. A better way to lay all the charts out in this article would have been to have the items in the same consistent order in all the charts. (e.g. i3, a8, a10)

    E.g. the first 3 charts have i3, a10 and a8 in that order.
    The 4th chart has i3, a8 and a10.
    The 5th and the 6th charts have the most damaging (meaning most misrepresentative) change in my opinion. These two charts have GTX 770, A10 and A8 in that order. And it looks exactly the same (black and red bars) as the 4th chart that has i3, a8, a10 in that order !!!

    Now I guess most readers would see the first few charts and assume that in the 5th and 6th charts, too, i3 is listed first and the two AMD APUs are listed after that. That is, they will assume that the first item in the 5th and 6th charts is i3, whereas it is actually a GTX 770. They will think incorrectly that i3 takes a lead over the APUs, when actually it is the GTX 770 that is taking the lead.

    Maybe I'm a doofus, but this is exactly what I did on my first read, and I thought "whoa, core-i3 is equivalent to the apus in all other tests but so much faster in the last two charts?". Only later did I read the last two charts again and realise that it is the GTX 770 that is taking the naturally massive lead over the others.

    Sorry for the long text, but it is the only way I know to convey what I had in mind. Otherwise, a good article and looking forward to more consistent graphs in the future !
  • Ryan Smith - Monday, February 16, 2015

    "Is there any reason the core-i3 is missing out in the last two charts??"

    Intel WDDM 2.0 drivers are not available at this time. So we can test for the CPU but not the GPU.
  • abianand - Tuesday, February 17, 2015

    thanks, Ryan !

    the other things still stand, though ;-)
