Closing Remarks: Pushing Forward on 3 nm For 2024

Having attended Arm's Client Technology Day, my initial impression is that Arm has opted to refine and hone its IP for 2024 rather than completely redefine it. Following on from last year's introduction of the Armv9.2 family of cores, Arm has made some notable architectural changes within the latest Cortex series for 2024, along with a clear and deliberate switch to the more advanced 3 nm process node, with both Samsung and TSMC 3 nm serving as the basis of the client CSS platform for 2024.

The Cortex-X925, Cortex-A725, and Cortex-A520 cores have all been optimized for the 3 nm process, with Arm touting significant performance and power efficiency improvements. The Cortex-X925, with its wider 10-wide decode and dispatch stage and higher clock speeds reaching up to 3.8 GHz, looks to set a new standard for single-threaded performance. Arm's updated v9.2 platform looks ideal for high-performance applications, including AI workloads and high-end gaming, both in the mobile space and within Microsoft's Windows on Arm ecosystem.

In the grand scheme of things, and going by Arm's in-house comparisons between the new CSS platform and last year's TCS23 version, Arm claims gains of between 30 and 60% in performance, depending on the task and workload. Taken at face value, the performance improvements are impressive, although the transition to 3 nm is likely the primary driver of those gains rather than the underlying architectural improvements.

The Cortex-A725 balances performance and efficiency, making it suitable for mid-range devices. Thanks to architectural enhancements such as larger caches and an expanded reorder buffer, Arm claims up to a 35% improvement in performance efficiency over the previous generation. The refreshed Cortex-A520 focuses primarily on being optimized for the 3 nm node while aiming to stay unmatched in power efficiency, with a claimed 15% energy saving compared to its predecessor. The core remains targeted at low-intensity workloads, making it ideal for power-sensitive applications such as IoT devices and lower-cost smartphones.

AI capabilities have been a significant focus in Arm's latest offerings. The Cortex-X925 and Cortex-A725 cores are designed with AI workloads in mind, with access to optimized software libraries, such as KleidiAI and KleidiCV, ensuring efficient AI processing. These enhancements are crucial for applications ranging from on-device computer vision to large language models (LLMs).

Arm also continues to support its latest core cluster with a comprehensive ecosystem driven by the new CSS platform, coupled with Arm Performance Studio and the KleidiAI and KleidiCV libraries. These tools give developers a robust foundation to fully leverage the new architecture's capabilities, effectively reducing time-to-market and fostering innovation across industries such as content creation and on-device AI inferencing. The CSS platform's integration with operating systems such as Android, Linux, and Windows (Windows on Arm) ensures a broader reach in adoption and wider development, making software and applications available on more devices than in previous generations.

In summary, Arm's move of all its latest CPU designs onto 3 nm process technology and the refinements in the Cortex-X925 and Cortex-A725 cores demonstrate a strategic focus on optimizing existing architectures rather than making radical changes. These refinements, including larger per-core caches, a wider pipeline, and a bolstered DSU-120 core cluster for 2024, certainly deliver substantial performance and power efficiency gains on paper.

While these cores should enable new devices capable of handling demanding applications, most of the improvements in efficiency and performance stem from the switch to the more advanced, and more challenging, 3 nm node. As Arm continues to push the boundaries of what's possible with its IP, these technologies should pave the way for more powerful, efficient, and intelligent devices, shaping the future of what a mobile device is capable of. Whether that's the new generation of AI-capable devices or mobile gaming, Arm is looking to offer it all.


55 Comments


  • StormyParis - Wednesday, May 29, 2024 - link

    Do these processors have anything to prevent exploits such as RowHammer, etc.? Those and their variants were a big story, then disappeared, but we were never told about an actual solution?
  • GeoffreyA - Wednesday, May 29, 2024 - link

    Are these companies so lame with their AI desperation?
  • abufrejoval - Wednesday, May 29, 2024 - link

    Memory tagging extensions have been around since ARM 8.5. When you say ARM 9.2 MTE, does that mean they have been significantly upgraded e.g. in the direction of what CHERI does for RISC-V?

    I've been trying to find out if ARM has an "AVX-512" issue with their different big/middle/small designs, too. That is if these distinct cores might actually differ in the range of instruction set extensions they support. And I can't get a clear picture, either.

    So if say the big cores support some clever new vector formats for AI and the middle or small cores won't, how will apps and OS deal with the issue?
  • GeoffreyA - Wednesday, May 29, 2024 - link

    I don't know enough about ARM to comment, but should think that there are compatibility issues, with instructions, spanning different models and generations. Perhaps there's a feature-level type of method?
  • Findecanor - Wednesday, May 29, 2024 - link

    ARM MTE is much cruder than CHERI. It can be described as "memory colouring": Every allocation in memory is tagged with one of 16 colours. Two adjacent allocations can't have the same colour. When you use a pointer the colour bits in otherwise unused top bits of the pointer have to match the colour of the allocation it points into.

    With SVE, both E and P cores need to have the same vector length, yes. The vector length is usually no larger than a cache line, which has to be the same size anyway.

    I don't know specifically about SME but many extensions have to first be enabled by the OS on a core to be available to user-mode programs. If not all cores have an extension, the OS may choose to not enable it on any.
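[Ed: Findecanor's description of MTE as "memory colouring" can be sketched as a toy model in Python. This is purely illustrative, with names and structure of my own invention, not Arm's actual API; real MTE uses 4-bit hardware tags on 16-byte granules and faults on a mismatched access.]

```python
# Toy model of Arm MTE-style "memory colouring" (illustrative only).
# Each allocation gets one of 16 colours; a pointer carries its colour
# in otherwise-unused top bits (modelled here as a (colour, addr) pair),
# and every access checks pointer colour against allocation colour.

class TaggedHeap:
    NUM_COLOURS = 16  # MTE uses a 4-bit tag -> 16 possible colours

    def __init__(self, size):
        self.tags = [None] * size  # colour assigned to each location
        self.last_colour = -1

    def malloc(self, addr, length):
        # Pick a colour different from the previous allocation, so two
        # adjacent allocations never share a tag.
        colour = (self.last_colour + 1) % self.NUM_COLOURS
        self.last_colour = colour
        for i in range(addr, addr + length):
            self.tags[i] = colour
        return (colour, addr)  # "pointer" = colour + address

    def load(self, ptr):
        colour, addr = ptr
        # Tag check: pointer colour must match the allocation it points into.
        if self.tags[addr] != colour:
            raise MemoryError(f"tag mismatch at {addr}")
        return addr  # stand-in for the loaded value

heap = TaggedHeap(64)
a = heap.malloc(0, 16)
b = heap.malloc(16, 16)   # adjacent allocation gets a different colour

heap.load(a)              # in-bounds access: tags match, OK
colour_a, _ = a
try:
    heap.load((colour_a, 16))  # overflow from a into b: tag mismatch
except MemoryError as e:
    print("caught:", e)
```

The point of the scheme is exactly what the mismatch case shows: a linear overflow from one allocation into its neighbour is caught probabilistically by the tag check, which is far cheaper than CHERI's full capability bounds but correspondingly cruder.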
  • mode_13h - Thursday, May 30, 2024 - link

    > The vector length is usually no larger than a cache line which have to be the same size anyway.

    Cache lines are usually 64 bytes, which is 512 bits. Presumably, the A520 has only SVE2 @ 128 bits. So, don't let ARM off the hook *that* easily!
  • eastcoast_pete - Wednesday, May 29, 2024 - link

    That is very much something I am wondering, too. Reports/rumors have it that, for example, Qualcomm chose not to enable SVE in the big cores of their SD 8 Gen3. Qualcomm isn't exactly forthcoming with information about that, not that I would expect them to comment.
  • name99 - Wednesday, May 29, 2024 - link

    That's a silly response. It's like being present at the birth of Mac or Windows and saying "why are these stupid hardware companies trying so hard to make their chips run graphics fast?"

    The hope of LLMs is that they will provide a substantial augmentation to existing UI. So instead of having to understand a complicated set of Photoshop commands, you'll be able to say something like "Highlight the subject of the photo. Now move it about an inch left. Now remove that power line in the background".
    This is not a trivial task; it requires substantial replumbing on existing apps, along with a fair degree of rethinking app architecture. Well, no-one said it would be easy to convert Visicalc to Excel...

    But that is where things are headed. And because ARM (and Apple, and QC, and MS) are not controlled by idiots who think only in terms of tweets and snark, each of these companies is moving heaven and earth to ensure that they will not be irrelevant during this shift.

    (Oh, you thought the entire world consisted of LLMs answering questions did you? Strange how the QUESTION-ANSWERING COMPANY, ie Google, has created that impression...
    Try thinking independently for once. Everything in the world happens along a dozen dimensions at once. All it takes to be a genius is to be able to hold *more than one* dimension in your head simultaneously.)
  • FunBunny2 - Wednesday, May 29, 2024 - link

    > Well, no-one said it would be easy to convert Visicalc to Excel...

    well... Mitch did it, in assembler at first, and called it Lotus 1-2-3
  • GeoffreyA - Thursday, May 30, 2024 - link

    You can go on with your ad hominem and air of superiority; it won't change that these companies are tripping over themselves, in insecurity and desperation, to grab dollars from the AI pot or not be left behind.

    You make assumptions about my ideas based on one sentence. In fact, AI is quite interesting to me. Not how it's going to help someone in Photoshop or Visual Studio, but where LLMs eventually lead; whether they end up being the language faculty in strong AI, or more; what's missing from today's LLMs (state, being trained in real-time, connecting to the sense modalities, using little power in a small space, etc.); and whether consciousness, the last great riddle, will ever be solved, how, and the moral implications. That's of interest to me.

    But when one sees Microsoft and Intel making an "AI PC," or AMD calling their CPU "Ryzen AI," and so on, it is little about true AI and more about money, checklists, and the bandwagon. Independence of thought is about seeing past fashion and the in-thing. And no thank you: I have got no desire, nor the insecurity, to want to be a genius.
