Earlier this month we had the pleasure of attending Qualcomm’s Maui launch event for the new Snapdragon 865 and 765 mobile platforms. The new chipsets promise a lot of upgrades in terms of performance and features, and will undoubtedly be the silicon upon which the vast majority of 2020 flagship devices base their designs. We’ve covered the improvements and changes of the new chipset in our dedicated launch article, so be sure to read that piece if you’re not yet familiar with the Snapdragon 865.

As has seemingly become a tradition with Qualcomm, following the launch event we’ve been given the opportunity to have some hands-on time with the company’s reference devices, and had the chance to run the phones through our benchmark suite. The QRD865 is a reference phone made by Qualcomm and integrates the new flagship chip. The device offers insight into what we should be expecting from commercial devices in 2020, and today’s piece particularly focuses on the performance improvements of the new generation.

A quick recap of the Snapdragon 865 if you haven’t read the more thorough examination of the changes:

Qualcomm Snapdragon Flagship SoCs 2019-2020

| SoC | Snapdragon 865 | Snapdragon 855 |
|---|---|---|
| CPU | 1x Cortex A77 @ 2.84GHz, 1x512KB pL2; 3x Cortex A77 @ 2.42GHz, 3x256KB pL2; 4x Cortex A55 @ 1.80GHz, 4x128KB pL2; 4MB sL3 @ ?MHz | 1x Kryo 485 Gold (A76 derivative) @ 2.84GHz, 1x512KB pL2; 3x Kryo 485 Gold (A76 derivative) @ 2.42GHz, 3x256KB pL2; 4x Kryo 485 Silver (A55 derivative) @ 1.80GHz, 4x128KB pL2; 2MB sL3 @ 1612MHz |
| GPU | Adreno 650 @ 587 MHz; +25% perf, +50% ALUs, +50% pixel/clock, +0% texels/clock | Adreno 640 @ 585 MHz |
| DSP / NPU | Hexagon 698; 15 TOPS AI (Total CPU+GPU+HVX+Tensor) | Hexagon 690; 7 TOPS AI (Total CPU+GPU+HVX+Tensor) |
| Memory Controller | 4x 16-bit CH; @ 2133MHz LPDDR4X / 33.4GB/s or @ 2750MHz LPDDR5 / 44.0GB/s; 3MB system level cache | 4x 16-bit CH; @ 1866MHz LPDDR4X / 29.9GB/s; 3MB system level cache |
| ISP/Camera | Dual 14-bit Spectra 480 ISP; 1x 200MP; 64MP ZSL or 2x 25MP ZSL; 4K video & 64MP burst capture | Dual 14-bit Spectra 380 ISP; 1x 192MP; 1x 48MP ZSL or 2x 22MP ZSL |
| Encode/Decode | 8K30 / 4K120 10-bit H.265; Dolby Vision, HDR10+, HDR10, HLG; 720p960 infinite recording | 4K60 10-bit H.265; HDR10, HDR10+, HLG; 720p480 |
| Integrated Modem | none (Paired with external X55 only); LTE Category 24/22: DL = 2500 Mbps (7x20MHz CA, 1024-QAM), UL = 316 Mbps (3x20MHz CA, 256-QAM); 5G NR Sub-6 + mmWave: DL = 7000 Mbps, UL = 3000 Mbps | Snapdragon X24 LTE (Category 20); DL = 2000Mbps (7x20MHz CA, 256-QAM, 4x4); UL = 316Mbps (3x20MHz CA, 256-QAM) |
| Mfc. Process | TSMC 7nm (N7P) | TSMC 7nm (N7) |
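
The memory bandwidth figures in the table can be roughly reproduced from the channel configuration. The sketch below is only an illustration: it assumes the quoted MHz value is the I/O clock with two transfers per clock (double data rate) and decimal gigabytes, and the function name is made up for this example. Under these assumptions the 1866MHz and 2750MHz entries match the table, while the 2133MHz entry comes out closer to 34.1GB/s than the listed 33.4GB/s; the table value is left as published.

```python
def peak_bandwidth_gbs(clock_mhz: float, channels: int = 4, channel_bits: int = 16) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    Assumes a double-data-rate interface (two transfers per clock),
    which is an assumption of this sketch, not a figure from the article.
    """
    transfers_per_second = clock_mhz * 1e6 * 2      # DDR: two transfers per clock
    bytes_per_transfer = channels * channel_bits / 8  # 4 x 16-bit = 8 bytes
    return transfers_per_second * bytes_per_transfer / 1e9


for label, mhz in [("LPDDR4X @ 1866MHz", 1866),
                   ("LPDDR4X @ 2133MHz", 2133),
                   ("LPDDR5  @ 2750MHz", 2750)]:
    print(f"{label}: {peak_bandwidth_gbs(mhz):.1f} GB/s")
```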

The Snapdragon 865 is the successor to last year’s Snapdragon 855, and thus represents Qualcomm’s latest flagship chipset offering the newest IP and technologies. On the CPU side, Qualcomm has integrated Arm’s newest Cortex-A77 CPU cores, replacing the A76-based IP from last year. This year Qualcomm has decided against requesting any microarchitectural changes to the IP, so unlike the semi-custom Kryo 485 / A76-based CPUs which had some differing aspects to the design, the new A77 in the Snapdragon 865 represents the default IP configuration that Arm offers.

Clock frequencies and core cache configurations haven’t changed this year: there’s still a single “Prime” A77 CPU core with a 512KB cache running at a higher 2.84GHz, and three “Performance” or “Gold” cores with reduced 256KB caches at a lower 2.42GHz. The four little cores remain A55s, with the same cache configuration and the same 1.8GHz clock. The L3 cache of the CPU cluster has been doubled from 2MB to 4MB. In general, Qualcomm’s advertised 25% performance uplift on the CPU side comes solely from the IPC increases of the new A77 cores.

The GPU this year features an updated Adreno 650 design which increases the number of ALUs and pixel rendering units by 50%. The end result in terms of performance is a promised 25% uplift. It’s likely that the company is running the new block at a lower frequency than what we’ve seen on the Snapdragon 855, although we won’t be able to confirm this until we have access to commercial devices early next year.
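
The reasoning behind the lower-frequency guess can be put into numbers. The sketch below assumes, purely for illustration, that GPU performance scales linearly with execution units multiplied by clock; real GPUs do not scale this cleanly, and the function name is hypothetical. With 50% more units and a 25% performance gain, this naive model would imply a clock well below the Adreno 640’s 585MHz.

```python
def implied_clock_mhz(base_clock_mhz: float, unit_scale: float, perf_scale: float) -> float:
    """Clock a wider GPU would need to hit a given performance ratio.

    Naive model (an assumption of this sketch): perf ~ units * clock.
    """
    return base_clock_mhz * perf_scale / unit_scale


# Adreno 640 baseline clock, +50% ALUs, +25% promised performance
print(implied_clock_mhz(585, unit_scale=1.5, perf_scale=1.25))
```

The same 25% figure could equally come from lower per-unit utilisation at an unchanged clock, which is why the estimate cannot stand in for a measurement on commercial devices.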

A big performance upgrade on the new chip is the quadrupling of the processing power of the new Tensor cores in the Hexagon 698. Qualcomm advertises 15 TOPS throughput for all computing blocks on the SoC and we estimate that the new Tensor cores roughly represent 10 TOPS out of that figure.

In general, the Snapdragon 865 promises to be a very versatile chip and comes with a lot of new improvements – particularly 5G connectivity and new camera capabilities are promised to be the key features of the new SoC. Today’s focus lies solely on the performance of the chip, so let’s move on to our first test results and analysis.

New Memory Controllers & LPDDR5: A Big Improvement

One of the larger changes in the SoC this generation was the integration of a new hybrid LPDDR5 and LPDDR4X memory controller. The QRD865 device we tested was naturally equipped with the new LP5 standard. Qualcomm was actually downplaying the importance of LP5 itself: the new standard does bring higher memory speeds and thus better bandwidth, however latency should be the same, and the power efficiency benefits, while there, shouldn’t be overplayed. Nevertheless, Qualcomm did claim they focused more on improving their memory controllers, and this year we’re finally seeing the new chip address one of the weaknesses exhibited by the past two generations: memory latency.

We had criticised Qualcomm’s Snapdragon 845 and 855 for having quite bad memory latency. Ever since the company introduced their system level cache architecture to the designs, this aspect of the memory subsystem had shown some rather mediocre characteristics. There have been a lot of arguments about how much this actually affected performance, with Qualcomm themselves naturally downplaying the differences. Arm generally notes a 1% performance difference for each 5ns of latency to DRAM, so if the latency gap is big, it can add up to a noticeable difference.



Looking at the new Snapdragon 865, the first thing that pops up when comparing the two latency charts is the doubled L3 cache of the new chip. It’s to be noted that it does look like there’s still some sort of logical partitioning going on and that 512KB of the cache may be dedicated to the little cores, as random-access latencies start going up at 1.5MB for the S855 and 3.5MB for the S865.

Further down in the deeper memory regions, we’re seeing some very big changes in latency. Qualcomm has been able to shave off around 35ns in the full random-access test, and we’re estimating that the structural latency of the chip now falls in at ~109ns, a 20ns improvement over its predecessor. While it’s a very good improvement in itself, it’s still slightly behind the designs of HiSilicon, Apple and Samsung. So, while Qualcomm is still the last of the bunch in regards to its memory subsystem, it’s no longer trailing behind by such a large margin. Keep in mind the results of the Kirin 990 here as we go into more detailed analysis of memory-intensive workloads in SPEC on the next page.
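
To get a feel for what these latency reductions might be worth, Arm’s rule of thumb of roughly 1% performance per 5ns of DRAM latency can be applied to the two figures above. This is a back-of-the-envelope sketch only: it extrapolates the rule linearly, ignores how memory-bound a given workload is, and the function name is invented for this example.

```python
def estimated_uplift_pct(latency_saved_ns: float,
                         pct_per_step: float = 1.0,
                         step_ns: float = 5.0) -> float:
    """Linear extrapolation of the ~1% per 5ns rule of thumb (illustrative only)."""
    return latency_saved_ns / step_ns * pct_per_step


for label, saved_ns in [("structural latency", 20), ("full random-access test", 35)]:
    print(f"{label}: -{saved_ns}ns -> ~{estimated_uplift_pct(saved_ns):.0f}% estimated uplift")
```

Actual gains depend heavily on the workload, which is what the memory-intensive SPEC tests are meant to show.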

Furthermore, what’s very interesting about Qualcomm’s results in the DRAM region is the behaviour of the TLB+CLR Trash test. This test always hits the same cache line within a page across different pages, forcing a cache line replacement. The oddity is that the Snapdragon 865 behaves very differently to the 855 here, with the results showcasing a separate “step” between 4MB and ~32MB. This result is more of an artefact of the test only hitting a single cache line per page rather than the chip actually having some sort of hidden 32MB cache. My theory is that Qualcomm has done some sort of optimisation to the cache-line replacement policy at the memory controller level, and instead of the test hitting DRAM, the accessed lines are actually residing in the SLC. It’s a very interesting result and so far, it’s the first and only chipset to exhibit such behaviour. If it’s indeed the SLC, the latency would fall in at around 25-35ns, with the non-uniform latency likely being a result of the four cache slices dedicated to the four memory controllers.
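
The access pattern described above can be sketched to show why the nominal test depth overstates how much data is actually touched. The code below is not the benchmark itself: the 4KB page size and 64-byte cache line size are assumptions, the helper names are hypothetical, and it ignores access order, set mapping, replacement policy and TLB effects.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
LINE_SIZE = 64     # assumed cache line size in bytes
MB = 1024 * 1024


def same_line_per_page_addresses(test_depth_bytes: int, line_offset: int = 0) -> list[int]:
    """One address per page, always at the same offset within the page."""
    pages = test_depth_bytes // PAGE_SIZE
    return [page * PAGE_SIZE + line_offset for page in range(pages)]


def unique_line_footprint_bytes(addresses: list[int]) -> int:
    """Bytes of distinct cache lines touched by the given addresses."""
    lines = {addr // LINE_SIZE for addr in addresses}
    return len(lines) * LINE_SIZE


for depth_mb in (4, 32, 128):
    addrs = same_line_per_page_addresses(depth_mb * MB)
    footprint_kb = unique_line_footprint_bytes(addrs) // 1024
    print(f"{depth_mb}MB test depth -> {len(addrs)} pages, {footprint_kb}KB of unique lines")
```

Under these assumptions, a 32MB test depth touches only 512KB worth of distinct cache lines, which is at least consistent with the idea that the data could be served from a cache far smaller than the nominal depth. It does not by itself explain where the step ends.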

Overall, it looks like Qualcomm has made rather big changes to the memory subsystem this year, and we’re looking forward to seeing the impact on performance.


  • quadrivial - Monday, December 16, 2019 - link

    I think there could be some possibility of AMD striking that deal with some stipulations. They have the semi-custom experience to make it happen and they don't have much to lose in mobile. AMD already included a small arm chip on their processors. They already use AMD GPUs too. A multi-chip package would be great here.

    I've given some thought to the idea of 8 Zen cores, 8 core ARM complex, 24CU Navi, 32GB HBM2, and a semi-custom IO die to tie it together. You could bin all of these out for lower-spec'd devices. The size of this complex would be much smaller than a normal dedicated GPU, CPU, and RAM while using a bit less power. Most lower end devices would probably only need 2 x86 cores and 8-11CU with 8GB of RAM.
  • zanon - Wednesday, December 18, 2019 - link

    >"I wonder if it's in the cards for Apple to ever include both an Intel processor as well as a full fledged mobile chip in the future, working in the same way as integrated/discrete graphics - the system would primarily run on the A13x, with the Intel chip firing up for Intel-binary apps as needed."

    Doubt it, if only because x64 is already coming out of patent protection, and with each passing year newer feature revisions will have the same thing happen. By 2025 or 2026 or so, Apple (or anyone else) will just flat out be able to implement x86-64 all the way up to Core 2 at least however they like (be it hardware, software, or some combo with code morphing or the like). That would probably be enough to cover most BC, sure stuff wouldn't run as fast but it would run. And there'd be a lot of power efficiency to be gained as well.
  • Midwayman - Monday, December 16, 2019 - link

    OSX on arm seems a given soon. That would allow them to really blur the line between their ipad pro and the lower end laptops. Even if they are still technically different OSes it would make getting real pro apps onto the ipad pro a ton easier. MS tried this of course but didn't have the clout or tablet market to really make it happen. Apple is in a position to force the issue and has switched architectures in the past.
  • levizx - Tuesday, December 17, 2019 - link

    Nope, Apple still supports AArch32, and Apple 64bit is only ahead of ARM by 1 year max, actual S810 silicon by Qualcomm was only 15 months later than A7, you can't possibly say Apple started earlier AND took 2-3 years LESS than ARM's partners to design silicon. That would mean Apple has to beat A57 by at least 3 years. Reality says otherwise.
  • quadrivial - Tuesday, December 17, 2019 - link

    Apple dropped aarch32 starting with A11.

    ARM announced their 64-bit ISA on 27 October 2011. The A7 launched 19 September 2013 -- less than two years later. Anandtech's first review of a finished A53 and A57 product was 10 Feb 2015 -- almost 3.5 years later and their product was obviously rushed with new revision coming out after and A57 being entirely replaced and forgotten.

    Qualcomm and others were shocked because they only had 2 years to do their designs and they weren't anywhere near complete. A ground-up new design in 23 months with a brand new ISA isn't possible under any circumstances.

    https://www.google.com/amp/s/appleinsider.com/arti...
  • ksec - Monday, December 16, 2019 - link

    Apple SoC uses more Die Space for CPU Core, it is as simple as that, so they are not a fair comparison. For roughly the same die size, Qualcomm has to fit in the Modem, while Apple has the modem external.
  • rpg1966 - Monday, December 16, 2019 - link

    I'm not sure I understand the "fair" bit? The other chip makers are free to design a larger-core variant if they so choose. And, the 865 has the modem external, just like the Apple chips. Also, generally speaking, the SoC + external modem approach should require more power, yet Apple seems to do very well on those benchmarks.

    Maybe it's more as per another reply, i.e. Apple just optimises everything, one example being throwing out a32.
  • generalako - Monday, December 16, 2019 - link

    That's not an argument -- the modem costs money for both parties either way at the end of the day. Also, Cortex Cores are pretty great, with still bigger year-on-year improvements than Apple (which seems to have stagnated), so it is closing the gap, albeit slowly. The big complaint however is in things like Qualcomm's complacency in GPU, or in ARM doing shit-all to give us a new efficiency core architecture, after 3 years.

    Apple has surpassed them hugely here, to the point that their efficiency cores perform more than 2x as much with half the power. Now, if you want to bring price into this, think about how much that costs OEMs. It costs them by forcing them to use mid-range SoCs that use expensive performance cores, when they could make do with only efficiency cores that performed better. It costs them, as well as flagship phones, in a lot of power efficiency, forcing them to do hardware compromises, or spend more on larger batteries, to compete.
  • generalako - Monday, December 16, 2019 - link

    ARM has been catching up, though. The IPC increases since A11 have been pretty meagre, whereas A76 was a pretty sizeable jump (cutting a lot of the gap), and A77 is doing a 25% IPC jump, whereas the A13 did what, half that? Of course Apple still has a huge foothold, but the gap has been getting smaller...

    ARM's issue right now, though, is in efficiency cores. The fact that their Cambridge team hasn't developed anything for 3 straight years now (going into the 4th), whereas Apple's yearly architecture improvement has given them efficiency cores that are monumentally better in both performance and efficiency, is getting embarrassing at this point. It's hurting Android phones a lot and getting kind of ridiculous at this point. No less frustrating is that none of the SoC actors are bothering to make any dedicated architectures themselves to make up for it. Qualcomm is complacent in even their GPUs, which have been on the same architecture for 3 straight years and have in this time completely lost their crown to Apple--even ARM's Mali has caught up!
  • FunBunny2 - Tuesday, December 17, 2019 - link

    "How is Apple so far ahead in some/many respects, given that Arm is dedicated to designing these microarchitectures?"

    based on what I've read in public reporting, Apple appears to have mostly thrown hardware at the ISA. Apple has the full-boat ISA license, so they can take the abstract spec and lay it out on the silicon any way they want. but what it appears is that all that godzilla transistor budget has gone to cache(s) and such, rather than a smarter ALU, fur instance. may haps AT has done an analysis of just exactly what Apple did to the spec or RD to make their versions? did they actually 'innovate' the (micro-?) architecture, or did they, in fact, just bulk up the various parts with more transistors?
