Closing Remarks: Pushing Forward on 3 nm For 2024

Having attended Arm's Client Technology Day, my initial impression is that Arm has opted to refine and hone its IP for 2024 rather than redefine it with groundbreaking changes. Following last year's introduction of the Armv9.2 family of cores, Arm has made some notable architectural changes to the latest Cortex series for 2024, along with a clear and deliberate switch to the more advanced 3 nm process node, with both Samsung and TSMC 3 nm serving as the basis of the client CSS for the 2024 platform.

The Cortex-X925, Cortex-A725, and Cortex-A520 cores have been optimized for the 3 nm process, and Arm touts significant performance and power efficiency improvements for all three. The Cortex-X925, with its wider 10-wide decode and dispatch and clock speeds of up to 3.8 GHz, looks set to raise the bar for single-threaded performance and IPC. Arm's updated v9.2 platform looks well suited to high-performance applications, including AI workloads and high-end gaming, both in the mobile space and within Microsoft's Windows on Arm ecosystem.

Looking at Arm's in-house performance comparisons between the new CSS platform and last year's TCS2023, Arm claims performance gains of between 30% and 60%, depending on the task and workload. If those figures are taken at face value, the improvements are impressive, although the transition to 3 nm is likely the primary driver of the gains rather than the underlying architectural improvements.
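To put the node-versus-architecture question in rough numbers, here is a minimal sketch. It assumes the common first-order model in which single-threaded performance scales as IPC multiplied by clock frequency, which ignores memory and cache effects. The function name and the 3.4 GHz baseline clock are hypothetical and are not figures from Arm; only the 3.8 GHz peak clock and the 30% to 60% range come from the text above.

```python
def implied_ipc_gain(total_speedup: float, clock_ratio: float) -> float:
    """Return the IPC ratio implied by a total speedup and a clock ratio.

    First-order model (an assumption): performance = IPC * clock,
    so total_speedup = ipc_ratio * clock_ratio.
    """
    if clock_ratio <= 0:
        raise ValueError("clock_ratio must be positive")
    return total_speedup / clock_ratio


if __name__ == "__main__":
    new_clock_ghz = 3.8  # peak Cortex-X925 clock quoted in the article
    old_clock_ghz = 3.4  # HYPOTHETICAL baseline clock, for illustration only
    clock_ratio = new_clock_ghz / old_clock_ghz

    for claimed in (1.30, 1.60):  # Arm's claimed 30% and 60% gains
        ipc_ratio = implied_ipc_gain(claimed, clock_ratio)
        print(f"claimed {claimed:.2f}x -> implied IPC ratio {ipc_ratio:.2f}x")
```

Under this model, whatever share of a claimed gain is not explained by the clock ratio has to come from IPC. The split is only as good as the assumed baseline clock, and IPC gains can themselves depend on the larger transistor budget a new node provides.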

The Cortex-A725 balances performance and efficiency, making it suitable for a range of mid-range devices. Thanks to architectural enhancements such as increased cache sizes and expanded reorder buffers, Arm claims up to a 35% improvement in performance efficiency over the previous generation. The refreshed Cortex-A520 is primarily an optimization for the 3 nm node that aims to remain unmatched in power efficiency, with Arm claiming a 15% energy saving compared to its predecessor. This core is optimized for low-intensity workloads, making it ideal for power-sensitive applications like IoT devices and lower-cost smartphones.
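For a sense of how an efficiency claim translates into energy, here is a small sketch. It assumes that "performance efficiency" means performance per watt and that the workload is a fixed amount of work; both are my assumptions rather than Arm's stated definitions, and the function names are hypothetical.

```python
def relative_energy(perf_per_watt_gain: float) -> float:
    """Energy for a fixed amount of work, relative to the older core.

    energy = work / (performance per watt), so a gain of g in
    performance per watt scales the energy used by 1 / g.
    """
    if perf_per_watt_gain <= 0:
        raise ValueError("perf_per_watt_gain must be positive")
    return 1.0 / perf_per_watt_gain


def energy_saving_percent(perf_per_watt_gain: float) -> float:
    """Percentage of energy saved for the same work at the given gain."""
    return (1.0 - relative_energy(perf_per_watt_gain)) * 100.0


if __name__ == "__main__":
    gain = 1.35  # Arm's "up to 35%" claim, read here as performance per watt
    print(f"relative energy: {relative_energy(gain):.3f}")
    print(f"energy saving:   {energy_saving_percent(gain):.1f}%")
```

Read this way, a 35% gain in performance per watt works out to roughly 26% less energy for the same work, not 35% less, which is why efficiency and energy-saving percentages should not be compared directly.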

AI capabilities have been a significant focus in Arm's latest offerings. The Cortex-X925 and Cortex-A725 cores integrate dedicated AI accelerators and allow access to optimized software libraries, such as KleidiAI and KleidiCV, for efficient AI processing. These enhancements are crucial for applications such as neural language models and LLMs.

Arm also continues to support its latest Core Cluster with a comprehensive ecosystem built around the new CSS platform, Arm Performance Studio, and the KleidiAI and KleidiCV libraries. These tools give developers a robust foundation to fully leverage the new architecture's capabilities, which should reduce time-to-market and foster innovation in areas such as content creation and on-device AI inferencing. The CSS platform's integration with operating systems such as Android, Linux, and Windows (Windows on Arm) should also broaden adoption and widen development, making software and applications available on more devices than in previous generations.

In summary, Arm's move of all its latest CPU designs onto 3 nm process technology, together with the refinements in the Cortex-X925 and Cortex-A725 cores, demonstrates a strategic focus on optimizing existing architectures rather than making radical changes. These refinements include increased cache sizes per core, a wider pipeline, and a bolstered DSU-120 Core Cluster for 2024, which deliver substantial performance and power efficiency gains on paper.

While these cores will enable new devices capable of handling demanding applications, most of the efficiency and performance improvements stem from the jump to the more advanced, yet more challenging, 3 nm node. As Arm continues to push the boundaries of its IP, these technologies should pave the way for more powerful, efficient, and intelligent mobile devices. Whether for the new generation of AI-capable devices or for mobile gaming, Arm is looking to offer it all.

View All Comments

  • mode_13h - Thursday, May 30, 2024 - link

    > Also, amazing increases in performance per watt doesn't mean less power draw.

    ARM provided power/performance curves, the point of which is to show how much more efficient the new cores can be at ISO performance, or how much more performance you can get at the same power, or what tradeoffs you can make anywhere in between.

    I know their unitless graphs and lack of details about the workload used to produce them can stretch their credibility, but it's not as if they aren't aware that these cores often won't be clocked to the max.
  • vegemeister - Friday, May 31, 2024 - link

    In most client applications, you always do 1x the work, and the only difference is how long it takes / what the CPU utilization % is.

    So the SoC will indeed use 1/1.33 as much energy.
  • eastcoast_pete - Sunday, June 2, 2024 - link

    It does overall if the OS and the SoC do "hurry up and get to idle" really well. This is something Apple's mobile SoCs have excelled at in recent times; it helps that their "Little" (efficiency) cores are strong performers that use out-of-order execution and other features to allow the SoC to stay on the efficiency cores for far longer. Android smartphones based on stock ARM cores don't have that option as much, and seem to end up running their larger cores more often and longer. It would also be interesting to know how much of that efficiency penalty can be attributed to Android OS, but ARM has been very stubborn in sticking to in-order execution for its Little cores. Which is puzzling, but good for Apple.
  • mode_13h - Thursday, May 30, 2024 - link

    I'm disappointed that ARM seems to have deviated from their practice of releasing ISO-power and ISO-performance figures.

    Also, I noticed they swapped the axes in their power/performance graph so that it curves upwards rather than leveling off. I guess some marketing goon decided graphs look more impressive if they curve upward. And, as usual, we get the unitless graphs that don't start at zero.

    Hey, does anyone know if the A520 still potentially shares vector FP units between a pair of cores, or did that gem of an idea begin and end with the A510?
  • GeoffreyA - Thursday, May 30, 2024 - link

    "shares vector FP units between a pair of cores"

    That was a Bulldozer principle, if I remember rightly.
  • kkilobyte - Thursday, May 30, 2024 - link

    Ok, sorry to remind you and maybe sound a little 'pushy' about it, but what about the i9-14900KS test redo with Intel Default settings? You told us 20 days ago that you'd redo them:

    Gavin Bonshor - Friday, May 10, 2024 - link
    Don't worry; I will be testing Intel Default settings, too. I'm testing over the weekend and adding them in.

    So, will this promise ever be fulfilled?
  • mode_13h - Thursday, May 30, 2024 - link


    Please deliver the promised update to the i9-14900KS review! The people deserve to know how much performance is being lost with Intel's new recommended defaults!
  • watersb - Friday, May 31, 2024 - link

    Pronunciation of their new software branding, 'Kleidi', is not completely clear to me.

    One way is to make it sound like the name of a girl, rhymes with 'Heidi'. So a single syllable.

    The other way is to infer that the Arm Marketing people wished to evoke a colorful collection of myriad bits that can combine to form interesting patterns. A toy, A Kaleidoscope.

    Unfortunately, that sounds like "Collide-y" to me: a product that tends to bang into other pieces.

    Which would be an unfortunate name for automotive applications.
  • EthiaW - Sunday, June 2, 2024 - link

    ARM routinely claims the fruit of TSMC node improvement as its own achievement; you'll get familiar with the cliche after following it for a few years.🙄 As for competition, we are already seeing Apple's 9-core M4, the 1.5 times bigger brother of the A18. Halve the memory score and lower its frequency to a more phone-friendly 3.7 GHz, and it's still scoring at least 3200 in Geekbench single core, which the X925 is certainly not going to catch up with. By the time new ARM laptops hit the market, Zen5 mobile and Lunar Lake will be prevalent and based on available data a single core improvement of at least 20% is expected, so the X925 will not have an easy time. I'd say this generation of ARM cores is mostly incremental and nothing revolutionary.
  • mode_13h - Sunday, June 2, 2024 - link

    > ARM routinely claims the fruit of TSMC node improvement as its own achievement,

    There's nothing automatic about an IPC improvement. You have to actually make design changes to take advantage of the larger transistor budget and timing margins, in order to achieve that. Otherwise, the only way CPUs would get faster from shrinking nodes is just by increasing clockspeeds, which incurs a high cost in additional power.

    Plus, how is this any different than what Intel and AMD do, when they announce new CPU microarchitectures? They don't usually separate out how much improvement is from the node, if ever.

    > Zen5 mobile and Lunar Lake will be prevalent and based on available data
    > a single core improvement of at least 20% is expected

    At what power level? It's not 20% IPC, so there's some additional clockspeed in that figure, which might not be entirely applicable to laptops.
