At the recent Qualcomm Snapdragon Tech Summit, the company announced its new flagship smartphone processor, the Snapdragon 8 Gen 1. Replacing the Snapdragon 888, the new chip is set to appear in a number of high-performance flagship smartphones in 2022. It is Qualcomm's first chip to use Arm v9 CPU cores, as well as its first built on Samsung's 4nm process node. Ahead of devices arriving in Q1, we attended a benchmarking session using Qualcomm's reference design and had a couple of hours to run tests focused on the new performance core, which is based on Arm's Cortex-X2 core IP.

The Snapdragon 8 Gen 1

Rather than continue with the 800-series naming scheme, Qualcomm is renaming its smartphone processor portfolio to make it easier for consumers to understand and easier to market. The Snapdragon 8 Gen 1 (hereafter S8g1 or 8g1) will head the portfolio, and we expect Qualcomm to announce other processors in the family as we move into 2022. The S8g1 uses Arm's latest core IP, along with updated Adreno, Hexagon, and connectivity IP, including an integrated X65 modem that supports both mmWave and sub-6 GHz bands for a worldwide solution in a single chip.

While Qualcomm hasn't given any additional insight into the Adreno graphics part of the hardware, not even a three-digit model number, we have been told that it is a new ground-up design. Qualcomm has also told us that the new GPU family is designed to look very similar to previous Adreno GPUs from a feature/API standpoint, which should give existing games and other apps a smooth transition with better performance. We had time to run a few traditional gaming tests for this piece.

On the DSP side, Qualcomm's headline figures are that the chip can process 3.2 gigapixels/sec from the cameras with an 18-bit pipeline, suitable for a single 200MP camera, 64MP burst capture, or 8K HDR video. The encode/decode engines support 8K30 or 4K120 10-bit H.265 encode, as well as 720p960 infinite recording. There is no AV1 decode engine in this chip; Qualcomm's VPs stated that the timing of the relevant IP block did not line up with this chip.
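As a rough sanity check on that throughput figure, the sketch below divides the quoted 3.2 gigapixels/sec by the pixel count of each capture mode. It assumes raw pixel rate is the only limit (single exposure, no bit-depth, memory-bandwidth, or processing overhead), so the results are ceilings rather than anything Qualcomm has specified. The 7680x4320 frame size used for 8K is the standard 8K UHD resolution, not a number Qualcomm gave.

```python
# Upper-bound frame rates implied by the quoted 3.2 gigapixels/sec ISP figure.
# ASSUMPTION: pixel rate is the only limit (single exposure, no bit-depth,
# memory-bandwidth, or processing overhead), so these are ceilings, not specs.

ISP_PIXELS_PER_SEC = 3.2e9  # Qualcomm's quoted throughput


def max_fps(width: int, height: int, budget: float = ISP_PIXELS_PER_SEC) -> float:
    """Frames/sec the pixel budget allows for a width x height frame."""
    return budget / (width * height)


def max_fps_megapixels(megapixels: float, budget: float = ISP_PIXELS_PER_SEC) -> float:
    """Same calculation when only the sensor's megapixel count is known."""
    return budget / (megapixels * 1e6)


if __name__ == "__main__":
    print(f"200 MP stills:  {max_fps_megapixels(200):.1f} fps")  # 16.0
    print(f"64 MP burst:    {max_fps_megapixels(64):.1f} fps")   # 50.0
    print(f"8K (7680x4320): {max_fps(7680, 4320):.1f} fps")      # 96.5
```

On these assumptions, a 200MP sensor tops out around 16 frames/sec and 64MP bursts around 50 frames/sec, while single-exposure 8K sits near 96 frames/sec, comfortably above the 8K30 encode limit.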

(Image: Qualcomm's Alex Katouzian)

Qualcomm also claims that AI inference performance has quadrupled: 2x from architectural updates and 2x from software. We ran a couple of AI tests for this piece.

As usual with these benchmarking sessions, we're most interested in what the CPU part of the chip can do. The new S8g1 features a 1+3+4 configuration, similar to the Snapdragon 888, but using Arm's newest v9 architecture cores:

  1. The single big core is a Cortex-X2, running at 3.0 GHz with 1 MiB of private L2 cache.
  2. The middle cores are Cortex-A710, running at 2.5 GHz with 512 KiB of private L2 cache.
  3. The four efficiency cores are Cortex-A510, running at 1.8 GHz with an undisclosed amount of L2 cache. These four cores are arranged in pairs, with each L2 cache shared by one pair.
  4. On top of these cores is an additional 6 MiB of shared L3 cache and 4 MiB of system-level cache at the memory controller, which is a 64-bit LPDDR5-3200 interface for 51.2 GB/s theoretical peak bandwidth.
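The 51.2 GB/s figure follows from the bus width and the data rate. A minimal sketch, assuming "LPDDR5-3200" refers to a 3200 MHz clock with two transfers per clock (6400 MT/s), which is the reading consistent with the quoted peak; treating 3200 as the transfer rate instead would give only 25.6 GB/s.

```python
# Theoretical peak DRAM bandwidth = bus width in bytes x transfers per second.
# ASSUMPTION: "LPDDR5-3200" denotes a 3200 MHz clock with two transfers per
# clock (6400 MT/s); that is the reading that reproduces the 51.2 GB/s figure.


def peak_bandwidth_gbs(bus_width_bits: int, clock_mhz: float,
                       transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in decimal GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_sec = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_sec / 1e9


if __name__ == "__main__":
    print(f"{peak_bandwidth_gbs(64, 3200):.1f} GB/s")                         # 51.2
    print(f"{peak_bandwidth_gbs(64, 3200, transfers_per_clock=1):.1f} GB/s")  # 25.6
```

This is a theoretical ceiling only; real sustained bandwidth will be lower once refresh, bank conflicts, and controller overhead are taken into account.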

Compared to the Snapdragon 888, the X2 is clocked around 5% higher than the X1 and has additional architectural improvements on top of that. Qualcomm is claiming +20% performance or +30% power efficiency for the new X2 core over the X1. That last figure goes beyond the +16% power efficiency Samsung quotes for moving from 5nm to 4nm, so Qualcomm is implementing additional efficiencies in silicon to get there. Unfortunately, Qualcomm would not detail what those are, nor how the voltage rails are separated, or whether that arrangement is the same as on the S888. Arm has stated that the X2 core could draw less power than the X1, and if the X2 is on its own voltage rail, that could support Qualcomm's claims.
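The percentages in that paragraph can be reproduced with simple ratios. This sketch assumes the Snapdragon 888's Cortex-X1 ran at 2.84 GHz, and treats the +30% and +16% efficiency figures as performance-per-watt ratios that compound multiplicatively; that split is our simplification, not a breakdown Qualcomm or Samsung has provided.

```python
# Ratios behind the X2-vs-X1 comparison.
# ASSUMPTIONS: the Snapdragon 888's Cortex-X1 runs at 2.84 GHz, and the
# +30% (Qualcomm) and +16% (Samsung) efficiency figures are perf-per-watt
# ratios that compound multiplicatively. Neither company published this split.

X1_GHZ = 2.84  # Snapdragon 888 prime core
X2_GHZ = 3.00  # Snapdragon 8 Gen 1 prime core


def pct_gain(new: float, old: float) -> float:
    """Percentage increase from old to new."""
    return (new / old - 1) * 100


def residual_gain(total: float, known: float) -> float:
    """Gain left over once a known multiplicative gain is factored out."""
    return ((1 + total) / (1 + known) - 1) * 100


clock_uplift = pct_gain(X2_GHZ, X1_GHZ)        # about 5.6%
non_process_eff = residual_gain(0.30, 0.16)    # about 12.1%

if __name__ == "__main__":
    print(f"X2 clock uplift over X1: {clock_uplift:.1f}%")
    print(f"efficiency gain not from 5nm -> 4nm: {non_process_eff:.1f}%")
```

Under those assumptions, the clock uplift is about 5.6%, and roughly 12% of the claimed efficiency gain would have to come from something other than the process node. A different reading of "+30% power efficiency" (for example, 30% less power at the same performance) would change the second number.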

The middle A710 cores are also Arm v9, with an 80 MHz bump over the previous generation, likely enabled by process node improvements. The smaller A510 efficiency cores are built as two complexes of two cores each, with a shared L2 cache in each complex. This layout is meant to improve area efficiency, although Qualcomm did not disclose how much L2 cache is in each complex; normally they do, but for whatever reason it wasn't detailed this generation. We didn't probe the number in our testing due to limited time, but no doubt we'll find out when devices come to market.

On top of the cores is a 6 MiB L3 cache as part of the DSU, and a 4 MiB system cache with the memory controllers. As with last year's chip, the CPU cores do not have direct access to this 4 MiB system cache. Qualcomm's main high-end competitor for next year, MediaTek, has showcased 14 MiB of combined L3 and system cache, with the cores able to access all of it, so it will be interesting to see how the two compare when we have the MediaTek chip to test.
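To make the comparison concrete, here is a small tally of the cache capacities Qualcomm disclosed. It is a sketch: the A510 L2 size is unknown and is therefore excluded, the `cpu_direct` flag simply encodes the statement above that the cores cannot directly use the 4 MiB system cache, and the 14 MiB MediaTek figure is the combined number it has shown, not something we have measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cache:
    name: str
    size_kib: Optional[int]  # None = size not disclosed by Qualcomm
    shared: bool             # shared across clusters (L3 / system cache)
    cpu_direct: bool         # CPU cores can use it directly


S8G1_CACHES = [
    Cache("Cortex-X2 L2", 1024, False, True),
    Cache("Cortex-A710 L2 (3 cores)", 3 * 512, False, True),
    Cache("Cortex-A510 L2 (2 pairs)", None, False, True),
    Cache("DSU L3", 6 * 1024, True, True),
    Cache("System level cache", 4 * 1024, True, False),
]

MEDIATEK_SHARED_MIB = 14  # quoted L3 + system cache, all CPU-accessible


def total_mib(caches, shared_only: bool = False, cpu_direct_only: bool = False) -> float:
    """Sum disclosed sizes in MiB, skipping caches whose size is unknown."""
    kib = sum(
        c.size_kib
        for c in caches
        if c.size_kib is not None
        and (not shared_only or c.shared)
        and (not cpu_direct_only or c.cpu_direct)
    )
    return kib / 1024


if __name__ == "__main__":
    print("disclosed total:", total_mib(S8G1_CACHES), "MiB")
    print("L3 + system cache:", total_mib(S8G1_CACHES, shared_only=True), "MiB")
    print("shared, CPU-direct:",
          total_mib(S8G1_CACHES, shared_only=True, cpu_direct_only=True), "MiB")
    print("MediaTek (quoted):", MEDIATEK_SHARED_MIB, "MiB")
```

On these numbers, the S8g1 carries 10 MiB of L3 plus system cache, of which 6 MiB is directly usable by the CPU cores, against MediaTek's quoted 14 MiB; the disclosed total including private L2s is 12.5 MiB.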

Benchmarking Session: How It Works

For our benchmarking session, we were given a 'Qualcomm Reference Device' (QRD), which Qualcomm builds to show how a flagship featuring the processor might look. It looks very similar to modern smartphones, with the goal of mirroring something that might come to market in both software and hardware. The software part is important: partner devices are likely a couple of months from launch, so we recognize that not everything here is final. These devices also tend to be thermally similar to a future retail example, and anything odd in the thermals would be pretty obvious during testing.

These benchmark sessions usually involve 20-40 members of the press, each with a device, for 2-4 hours as needed. Qualcomm preloads the device with a number of common benchmarking applications, as well as a data sheet of the results to expect. Any member of the press who wants to sideload new applications has to at least ask one of the reps or engineers in the room. In our usual workflow, we sideload power monitoring tools and SPEC2017, along with our other microarchitecture tests, and Qualcomm has never had any issue with us using these.

As with previous QRD testing, there are two performance presets on the device: a baseline preset expected to showcase normal operation, and a high-performance preset that opportunistically puts threads onto the X2 core even when power and thermals are quite high, giving the best score regardless. The debate in smartphone benchmarking over initial runs vs. sustained performance is a long one that we won't go into here (most notably because 4 hours is too short for any extensive sustained testing); however, the performance mode is meant to deliver a 'first run' score every time.
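For readers reproducing per-core tests on their own Linux or Android hardware, a common alternative to relying on a vendor preset is to pin the workload to specific cores with CPU affinity. The sketch below uses Python's `os.sched_setaffinity`, which exists only on Linux (including Android environments that can run Python). It is an illustration, not the method used in this session, and the choice of the highest-numbered CPU as the prime core is an assumption that must be checked against the device's CPU topology; `busy_loop` is a hypothetical stand-in workload.

```python
import os
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_pinned(cpus: Iterable[int], fn: Callable[[], T]) -> T:
    """Run fn() with the caller restricted to `cpus`, then restore the old mask.

    Linux-only: os.sched_setaffinity / os.sched_getaffinity are not available
    on macOS or Windows.
    """
    original = os.sched_getaffinity(0)  # 0 = the calling process/thread
    os.sched_setaffinity(0, set(cpus))
    try:
        return fn()
    finally:
        os.sched_setaffinity(0, original)


def busy_loop() -> int:
    """Stand-in CPU-bound workload (hypothetical; substitute a real benchmark)."""
    return sum(i * i for i in range(2_000_000))


if __name__ == "__main__":
    # ASSUMPTION: the highest-numbered CPU is the prime core. This is common
    # on Snapdragon parts but must be verified for a given device.
    target = max(os.sched_getaffinity(0))
    print(f"pinned to cpu{target}:", run_pinned([target], busy_loop))
```

Affinity only controls where the thread runs; frequency scaling and thermal throttling still apply, so a pinned run is not equivalent to a vendor performance preset.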

Comments

  • nucc1 - Thursday, December 16, 2021

    I have a desktop and laptop, I don't need a phone that can do desktop duties.
  • michael2k - Tuesday, December 14, 2021

    You do realize that's exactly what Apple does with its CPUs, right? Use them for desktop/laptop parts?
  • eastcoast_pete - Thursday, December 16, 2021

    Actually, what Apple is doing is both annoying (for iPhone owners) and logical (from Apple's bottom-line POV). This and the prior generation iPhone certainly have the hardware oomph to drive a desktop setup akin to DeX, but that would, of course, mean fewer iPad Pro and Mac mini sales. The ability to run a desktop-type setup on an iPhone used to be limited by the smaller amount of RAM in older generations, but that has changed. Being able to run a desktop environment on a $1,500 iPhone would really add value.
  • Raqia - Tuesday, December 14, 2021

    That said, the lagging performance of Apple's CPU+GPU in AI benchmarks proves most sites overstate the usefulness of CPUs in phones use cases when headlining with CPU specific performance metrics. Yes, it's an apples to oranges comparison, but it's proof that you should care about more than CPU benchmarks (particularly the consumer-oriented Geekbench suite), even for Apple products, when comparing mobile phones.

    CPU performance for notebook form factors will matter a lot more, but on phones, CPU-bottlenecked use cases are typically web browsers and apps running JavaScript, plus app compilation, and even in most of those cases your bottleneck will be connectivity rather than local processing. Heavy lifting is much more often done by the ISP and various DSPs, which are harder to benchmark.

    As Andrei stated in his introduction to the S8G1:

    "Qualcomm gave examples such as concurrent processing optimizations that are meant to give large boosts in performance to real-world workloads that might not directly show up in benchmarks."

    This seems to be borne out by a reviewer of an anonymous device here:

    despite some seeming inefficiencies for the other IP blocks when individually pinned by a benchmark. It also seems like SPEC17 is showing better efficiency whilst Geekbench is showing worse, which indicates that Geekbench may need better optimization for this year's ARMv9 implementations. Still a modest improvement for CPU this year though, when all's considered.
  • name99 - Tuesday, December 14, 2021

    "That said, the lagging performance of Apple's CPU+GPU in AI benchmarks proves most sites overstate the usefulness of CPUs in phones use cases when headlining with CPU specific performance metrics. "

    Uh, no!
    It proves that a dedicated NPU does better than a CPU for these tasks.
    The point is that the Android tests go through Android APIs; the Apple tests are probably raw C that goes on the CPU (perhaps the GPU, but that's unlikely in the absence of using special APIs).
    Your complaint is as silly as comparing 3D SW running on a GPU vs emulated 3D running on the CPU.

    But if you prefer to compare browser benchmarks, go right ahead:

    A much better complaint is that I'm guessing all these tests were not compiled with SVE2 -- which could have substantial effects.
    But of course that requires the dev tools and OS to catch up, which means we have to wait for the official release.
  • Raqia - Tuesday, December 14, 2021

    And other companies have decided to dedicate more of their die area to NPUs and other processing blocks than the CPU. This is usually neglected in cursory reviews of the SoC and the pinned-to-the-CPU benches are overemphasized by some reviewers with PC gaming hardware review pedigrees fixated on what's easy to benchmark and what they know rather than what's impactful to actual phone use cases.

    All that said, the CPU on the iPhones is a step ahead of the competition for now, but Apple has consciously used more die area for this and emphasizes it in their marketing. Note that Apple could market just the performance of the phones and devices themselves, but they unusually (for a consumer electronics oriented company) market the SoC separately, on a slide in presentations and in product specs. They de-emphasize the modem they use, however, in favor of stating what seem like phone-level performance metrics. This is notable given that this is the same company with the marketing chops to morph "Made in China" into "Designed in Cupertino. Assembled in China." (Glances at own iPhone. Hmm really...)
  • ChrisGX - Thursday, December 16, 2021

    >> Qualcomm gave examples such as concurrent processing optimizations that are meant to give large boosts in performance to real-world workloads that might not directly show up in benchmarks.

    This seems to be borne out by a reviewer of an anonymous device here: <<

    That video review can't be used to make a case for hidden virtues of the Snapdragon 8 Gen 1 that for some reason have failed to show up in benchmark results. The reviewer castigated Qualcomm for producing a poor chip with many serious shortcomings. His comments about the X2 core suggest that he saw it as little better than a joke - a power hog that barely improves on earlier generation performance cores.

    The reviewer did acknowledge that the new GPU was fast, but he underscored that the performance gain came at a high cost in terms of power consumption. On those occasions when the SD8 Gen 1 showed any substantial performance advantage over the Apple A15 in the reviewer's tests (some games), the advantage had disappeared after 10 minutes as progressive throttling took its toll.

    The reviewer did look at modem performance (I'm not sure whether I understood the full context of the test) and once again the conclusion is the modem is fast and power hungry.

    I don't think the reviewer conducted any AI tests, which I suspect would have been the place that the SD8 Gen 1 excelled.
  • Meteor2 - Friday, December 17, 2021

    What's mind-boggling is the performance Apple has achieved using the Arm ISA. It has taken an equally mind-boggling amount of money to do it, more than anyone else can afford.
  • defaultluser - Tuesday, December 14, 2021

    Decent little preview, but if I had to pick a "best core test" given the two-hour limit, I would have chosen the little cores!
  • Ian Cutress - Tuesday, December 14, 2021

    I tried running SPEC on the little cores. After 30 mins we were less than 10% complete.
