CPU Performance & Efficiency: SPEC2006

We’re moving on to SPEC2006, analysing the new single-threaded performance of the new Cortex-A77 cores. As the new CPU is running at the same clock as the A76-derived design of the Snapdragon 855, any improvements we’ll be seeing today are likely due to the IPC improvements of the core, the doubled L3 cache, as well as the enhancements to the memory controllers and memory subsystem of the chip.

Disclaimer About Power Figures Today:

The power figures presented today were captured using the same methodology we generally use on commercial devices, however this year we’ve noted a large discrepancy between figures reported by the QRD865’s fuel-gauge and the actual power consumption of the device. Generally, we’ve noted that there’s a discrepancy factor of roughly 3x. We’ve reached out to Qualcomm and they confirmed in a very quick testing that there’s a discrepancy of >2.5x. Furthermore, the QRD865 phones this year again suffered from excessive idle power figures of >1.3W.

I’ve attempted to compensate the data as best I could, however the figures published today are merely preliminary and of lower confidence than usual. For what it’s worth, last year, the QRD855 data was within 5% of the commercial phones’ measurements. We’ll be naturally re-testing everything once we get our hands on final commercial devices.

In the SPECint2006 suite, we’re seeing some noticeable performance improvements across the board, with some benchmarks posting some larger than expected increases. The biggest improvements are seen in the memory intensive workloads. 429.mcf is DRAM latency bound and sees a massive improvement of up to 46% compared to the Snapdragon 855.

What’s interesting to see is that some execution bound benchmarks such as 456.hmmer seeing a 28% upgrade. The A77 has an added 4th ALU which represents a 33% throughput increase in simple integer operations, which I don’t doubt is a major reason for the improvements seen here.

The improvements aren’t across the board, with 400.perlbench in particular seeing even a slight degradation for some reason. 403.gcc also saw a smaller 12% increase – it’s likely these benchmarks are bound by other aspects of the microarchitecture.

The power consumption and energy efficiency, if the numbers are correct, roughly match our expectations of the microarchitecture. Power has gone up with performance, but because of the higher performance and smaller runtime of the workloads, energy usage has remained roughly flat. Actually in several tests it’s actually improved in terms of efficiency when compared to the Snapdragon 855, but we’ll have to wait on commercial devices in order to make some definitive conclusions here.

In the SPECfp2006 suite, we’re seeing also seeing some very varied improvements. The biggest change happened to 470.lbm which has a very big hot loop and is memory bandwidth hungry. I think the A77’s new MOP-cache here would help a lot in regards to the instruction throughput, and the improved memory subsystem makes the massive 65% performance jump possible.

Arm actually had advertised IPC improvements of ~25% and ~35% for the int and FP suite of SPEC2006. On the int side, we’re indeed hitting 25% on the Snapdragon 865, compared to the S855, however on the FP side we’re a bit short as the increase falls in at around 29%. The performance increases here strongly depend on the SoC and particular on the memory subsystem, compared to the Kirin 990’s A76 implementation the increases here are only 20% and 24%, but HiSilicon’s chip also has a stronger memory subsystem which allows it to gain quite more performance over the A76’s in the S855.

The overall results for SPEC2006 are very good for the Snapdragon 865. Performance is exactly where Qualcomm advertised it would land at, and we’re seeing a 25% increase in SPECint2006 and a 29% in SPECfp2006. On the integer side, the A77 still trails Apple’s Monsoon cores in the A11, but the new Arm design now has been able to trounce it in the FP suite. We’re still a bit far away from the microarchitectures catching up to Apple’s latest designs, but if Arm keeps up this 25-30% yearly improvement rate, we should be getting there in a few more iterations.

The power and energy efficiency figures, again, taken with a grain of salt, are also very much in line with expectations. Power has slightly increased with performance this generation, however due to the performance increase, energy efficiency has remained relatively flat, or has even seen a slight improvement.

Introduction & Specifications System Performance
POST A COMMENT

178 Comments

View All Comments

  • quadrivial - Monday, December 16, 2019 - link

    I think there could be some possibility of AMD striking that deal with some stipulations. They have the semi-custom experience to make it happen and they don't have much to lose in mobile. AMD already included a small arm chip on their processors. They already use AMD GPUs too. A multi-chip package with be great here.

    I've given some thought to the idea of 8 Zen cores, 8 core ARM complex, 24CU Navi, 32GB HBM2, and a semi-custom IO die to the it together. You could bin all of these out for lower-spec'd devices. The size of this complex would be much smaller than a normal dedicated GPU, CPU, and RAM while using a bit less power. Most lower end devices would probably only need 2 x86 cores and 8-11CU with 8GB of RAM.
    Reply
  • zanon - Wednesday, December 18, 2019 - link

    >"I wonder if it's in the cards for Apple to ever include both an Intel processor as well as a full fledged mobile chip in the future, working in the same way as integrated/discrete graphics - the system would primarily run on the A13x, with the Intel chip firing up for Intel-binary apps as needed."

    Doubt it, if only because x64 is already coming out of patent protection, and with each passing year newer feature revisions will have the same thing happen. By 2025 or 2026 or so, Apple (or anyone else) will just flat out be able to implement x86-64 all the way up to Core 2 at least however they like (be it hardware, software, or some combo with code morphing or the like). That would probably be enough to cover most BC, sure stuff wouldn't run as fast but it would run. And there'd be a lot of power efficiency to be gained as well.
    Reply
  • Midwayman - Monday, December 16, 2019 - link

    OSX on arm seems a given soon. That would allow them to really blur the line between their ipad pro and the lower end laptops. Even if they are still technically different OSes it would make getting real pro apps onto the ipad pro a ton easier. MS tried this of course but didn't have the clout or tablet market to really make it happen. Apple is in a position to force the issue and has switch architectures in the past. Reply
  • levizx - Tuesday, December 17, 2019 - link

    Nope, Apple still support AArch32, and Apple 64bit is only ahead of ARM by 1 year max, actual S810 silicon by Qualcomm was only 15 months later than A7, you can't possibly say Apple started earlier AND took 2-3 years LESS than ARM's partners to design silicon. That would mean Apple has to beat A57 by at least 3 year. Reality says otherwise. Reply
  • quadrivial - Tuesday, December 17, 2019 - link

    Apple dropped aarch32 starting with A11.

    ARM announced their 64-bit ISA on 27 October 2011. The A7 launched 19 September 2013 -- less than two years later. Anandtech's first review of a finished A53 and A57 product was 10 Feb 2015 -- almost 3.5 years later and their product was obviously rushed with new revision coming out after and A57 being entirely replaced and forgotten.

    Qualcomm and others were shocked because they only had 2 years to do their designs and they weren't anywhere near complete. A ground-up new design in 23 months with a band new ISA isn't possible under and circumstances.

    https://www.google.com/amp/s/appleinsider.com/arti...
    Reply
  • ksec - Monday, December 16, 2019 - link

    Apple SoC uses more Die Space for CPU Core, it is as simple as that, so they are not a fair comparison. For roughly the same die size, Qualcomm has to fit in the Modem, while Apple has the modem external. Reply
  • rpg1966 - Monday, December 16, 2019 - link

    I'm not sure I understand the "fair" bit? The other chip makers are free to design a larger-core variant if they so choose. And, the 865 has the modem external, just like the Apple chips. Also, generally speaking, the SoC + external modem approach should require more power, yet Apple seems to do very well on those benchmarks.

    Maybe it's more as per another reply, i.e. Apple just optimises everything, one example being throwing out a32.
    Reply
  • generalako - Monday, December 16, 2019 - link

    That's not an argument -- the modem costs money for both parties either way at the end of the day. Also, Cortex Cores are pretty great, with still bigger year-on-year improvements than Apple (which seems to have stagnated), so it is closing the gap, albeit slowly. The big complaint however is in things like Qualcomm's complacency in GPU, or in ARM doing shit-all to give us a new efficiency core architecture, after 3 years.

    Apple has surpassed them hugely here, to the point that their efficiency cores perform more than 2x as much with half the power. Now, if you want bring price into here, think about how much that costs OEMs. It costs them by forcing them to use mid-range SoCs that use expensive performance cores, when they could make due with only efficiency cores that performed better. It costs them, as well as flagship phones, in a lot of power efficiency, forcing them to do hardware compromises, or spend more on larger batteries, to compete.
    Reply
  • generalako - Monday, December 16, 2019 - link

    ARM has been catching up, though. The IPC increases since A11 have been pretty meagre, whereas A76 was a pretty sizeable jump (cutting a lot of the gap), and A77 is doing a 25% IPC jump, whereas the A13 did what, half that? Of course Apple still has a huge foothold, but the gap has been getting smaller...

    ARM's issue right now, though, is in efficiency cores. The fact that their Cambrdige team hasn't developed anything for 3 straight years now (going into the 4th), whereas Apple's yearly architecture improvement has given them efficiency cores that is monumentally better in both performance and efficiency, is getting embarrassing at this points. It's hurting Android phones a lot and getting kind of ridiculous at this point. No less frustrating that none of the SoC actors are bothering to make any dedicated architectures themselves to make up for it. Qualcomm is complacent in even their GPUs, which have been on the same architecture for 3 straight years and has in this time completely lost its crown to Apple--even ARM's Mali has caught up!
    Reply
  • FunBunny2 - Tuesday, December 17, 2019 - link

    "How is Apple so far ahead in some/many respects, given that Arm is dedicated to designing these microarchitectures?"

    based on what I've read in public reporting, Apple appears to mostly thrown hardware at the ISA. Apple has the full-boat ISA license, so they can take the abstract spec and lay it out on the silicon anyway they want. but what it appears is that all that godzilla transistor budget has gone to caches(s) and such, rather than a smarter ALU, fur instance. may haps AT has done an analysis just exactly what Apple did to the spec or RD to make their versions? did they actually 'innovate' the (micro-?) architecture, or did they, in fact, just bulk up the various parts with more transistors?
    Reply

Log in

Don't have an account? Sign up now