Anybody following the industry over the last decade will have heard of Arm. We best know the company for being the enabler and providing the architecture as well as CPU designs that power essentially all of today’s mobile devices. The last 7-5 years in particular we’ve seen meteoric advances in silicon performance of the mobile SoCs found in our smartphones and tablets.

However Arm's ambition goes widely beyond just mobile and embedded devices. The market for compute in general is a lot larger than that, and looking at things in a business sense, high-end devices like servers and related infrastructure carry far greater profit margins. So for a successful CPU designer like Arm who is still on the rise, it's a very lucrative market to aim for, as current leader Intel can profess.

To that end, while Arm has been wildly successful in mobile and embedded, anything requiring more performance has to date been out of reach or has come with significant drawbacks. Over the last decade we’ve heard of numerous prophecies how products based on the architecture will take the server and infrastructure market by storm “any moment now”. In the last couple of years in particular  we’ve seen various vendors attempt to bring this goal to fruition: Unfortunately, the results of the first generation of products were less than successful, and as such, even though some did better than others, the Arm server ecosystem has seen a quite a bit of hardship in its first years.

A New Focus On Performance

While Arm has been successful in mobile for quite some time, the overall performance of their designs has often left something to be desired. As a result, the company has been undertaking a new focus on performance that is spanning everything from mobile to servers. Working towards this goal, 2018 was an important year for Arm as the company had introduced its brand-new Cortex A76 microarchitecture design: Representing a clean-sheet endeavor, learning from the experience gained in previous generations, the company has put high hopes in the brand-new Austin-family of microarchitectures. In fact, Arm is so confident on its upcoming designs that the company has publicly shared its client compute CPU roadmap through 2020 and proclaiming it will take Intel head on in PC laptop space.

While we’ll have to wait a bit longer for products such as the Snapdragon 8CX to come to market, we’ve already had our hands on the first mobile devices with the Cortex A76, and very much independently verified all of Arm’s performance and efficiency claims.

And then of course, there's Neoverse, the star of today's Arm announcements. With Neoverse Arm is looking to do for servers and infrastructure what it's already doing for its mobile business, by greatly ramping up their performance and improving their competitiveness with a new generation of processor designs. We'll get into Neoverse in much deeper detail in a moment, but in context, it's one piece of a much larger effort for Arm.

All of these new microarchitectures are important to Arm because they represent an inflection point in the market: Performance is now nearing that of the high-end players such as Intel and AMD, and Arm is confident in its ability to sustain significant annual improvements of 25-30% - vastly exceeding the rate at which the incumbent vendors are able to iterate.

The Server Inflection Point: An Eventful Last Few Months Indeed

The last couple of months have been quite exciting for the Arm server ecosystem. At last year’s Hotchips we’ve covered Fujitsu’s session of their brand-new A64FX HPC (High performance compute) processor, representing not only the company shift from SPARC to ARMv8, but also delivering the first chip to implement the new SVE (Scalable Vector Extensions) addition to the Arm architecture.

Cavium’s ThunderX2 saw some very impressive performance leaps, making its new processor among the first to be able to compete with Intel and AMD – with partners such as GIGABYTE offering whole server systems solutions based on the new SoC.

Most recently, we saw Huawei unveiled their new Kunspeng 920 server chip promising to be the industry’s highest performing Arm server CPU.

The big commonality between the above mentioned three products is the fact that each represents individual vendor’s efforts at implementing a custom microarchitecture based on an ARMv8 architectural license. This in fact begs the question: what are Arm’s own plans for the server and infrastructure market? Well for those following closely, today’s coverage of the new Neoverse line-up shouldn’t come as a complete surprise as the company had first announced the branding and road-map back in October.

Introducing the Neoverse N1 & E1 platforms: Enabling the Ecosystem

Today’s announcement is all about enabling the ecosystem; we’ll be covering in more detail two new “platforms” that will be at the core of Arm’s infrastructure strategy for the next few years, the Neoverse N1 and E1 platforms:

Particularly today’s announcement of the Neoverse N1 platform sheds light onto what Arm had teased back in the initial October release, detailing what exactly “Ares” is and how the server/infrastructure counter-part to the Cortex A76 µarchitecture will be bringing major performance boosts to the Arm infrastructure ecosystem.

The Neoverse N1 CPU: No-Compromise Performance
POST A COMMENT

102 Comments

View All Comments

  • Antony Newman - Thursday, February 21, 2019 - link

    (Arbitrary example)

    If a SoC can run at 5GHz when 8 cores single core, but throttles down to 2.5GHz when 16 cores are active - then it cannot scale (due to the TDP limit).

    If ARM are designing their CPUs so that 128 (ie all) of them can run flat out without requiring throttling, then ARMs single core performance is indicative of the overall performance.

    If ARM increase their single core performance by 1.7 times in two years - and keep this same MO (of no Throttling cause to keep within the TDP) - it will be more than just data centres that want to buy into this new architecture.

    AJ
    Reply
  • wumpus - Thursday, February 21, 2019 - link

    Very few problems scale without penalty. Having high single core performance (for each core in a multichip server CPU, obviously. The Intel result using all of its cache on one core is obviously irrelevant and why it was so anomalous vs. AMD) means far less cores are needed, when scaling up. Also adding more and more cores require as much cache or more. If not, your bandwidth will scale even worse.

    Single core is absolutely critical for servers, and why it is taking ARM so long to break in. IBM is the exception that proves the rule: but they rely on weird licensing rules and making sure all the threads can access the same cache.
    Reply
  • eastcoast_pete - Thursday, February 21, 2019 - link

    I actually think we are in agreement. While this borders on semantics, per core performance is, of course, very important for servers, while high single (one) core is not. As you point out, Intel getting really high one core performance from a 18 core Xeon by running a strictly single core/thread test while allocating all the cache and much of the thermal envelope to that one core is an artificial situation for a server. Reply
  • The_Assimilator - Wednesday, February 20, 2019 - link

    Remember when "system on chip" meant IO too? Apparently Arm doesn't.

    Remember when Arm chips didn't need HSFs to run? Pepperidge Farm remembers.

    I'm going to enjoy it when this, like all of Arm's previous attempts at the high-end, fails once again. Or when Lakefield eats Arm's lunch, whichever comes first.
    Reply
  • wumpus - Wednesday, February 20, 2019 - link

    When your volume is 1400 chips (not all the same design) over 4 years, you use FPGA for anything you can. Doing anything else is pretty dumb. I'm surprised they bothered with an actual layout, but I suspect that they've been bitten by tiny details in FPGA simulation that never quite worked the same at speed.

    HSF? You want the MIPS, you burn the Watts. Presumably this is your "tell" in your troll.

    When has ARM made a previous attempt at the high-end? Certainly more than a few of their architectural licensees have, but there's a huge difference between a server architecture backed by ARM and even one backed by Qualcom. For one thing, they pretty much need to standardize remote adminstration to Intel levels (possibly circa ~2008ish to get off the ground). That's a lot of pesky little details, but something they absolutely need standardized to allow server use in the datacenter (yes, the Big Boys can roll their own, but everybody else needs a common server definition.
    Reply
  • Antony Newman - Wednesday, February 20, 2019 - link

    Fascinating article.

    Do you think Ampere, Huawei, Cavium and Amazon will all switch to the Neoverse?

    In terms of IPC - do you have a view on if ARM have Caught up with Apples Vortex yet?

    Is there any reason why a mobile phone (or Tablet) maker would’t use the ARM ‘server’ chip in a fondleslab?

    AJ
    Reply
  • ballsystemlord - Wednesday, February 20, 2019 - link

    Spelling and grammar corrections:
    ...the actual real-life performance improvements will higher due other SoC-level improvements as well as software improvements that aren't available in existing actual A72 silicon products.
    Missing be:
    ...the actual real-life performance improvements will behigher due other SoC-level improvements as well as software improvements that aren't available in existing actual A72 silicon products.

    The figured weren't run actual silicon but rather estimated on Arm's server farm in an emulation environment with RTL.
    Miswritten sentence:
    The figures weren't calculated on actual silicon but rather estimated on Arm's server farm in an emulation environment with RTL.

    The E1's CPU pipeline actually represents a brand new-design which (besides the A65) haven't seen employed before.
    Missing we:
    The E1's CPU pipeline actually represents a brand new-design which (besides the A65) we haven't seen employed before.

    Here we have to clusters of 8 cores in a small CMN-600 2x4 mesh network, ...
    Wrong 2:
    Here we have two clusters of 8 cores in a small CMN-600 2x4 mesh network, ...

    I was half asleep when I read it so there might be more.
    Reply
  • sohntech43 - Wednesday, February 20, 2019 - link

    Could someone help me understand why the Spec CPU2006 results are so different from those recorded for the AMD 7601 (1000 - 1200 vs. 690.63) and Xeon Platinum results (1300+ vs 730) in the Spec data base?

    https://www.spec.org/cpu2006/results/cpu2006.html

    They are also different from what AMD was boasting at the time of the original EPYC launch:

    https://www.microway.com/download/whitepaper/AMD-E...

    I'm probably missing something obvious...
    Reply
  • Wilco1 - Wednesday, February 20, 2019 - link

    Yes you're missing the fact these are GCC8 scores using -Ofast as mentioned in the article - ie. like when you build code yourself.

    Official SPEC scores are quite different and use special trick compilers to get the highest score. For example libquantum shows a completely unrealistic result in most SPEC submissions which artificially inflates the integer score by 30+%.
    Reply
  • sohntech43 - Wednesday, February 20, 2019 - link

    Thanks - was surprised by the sheer magnitude of the delta caused by the compilers. Impressive results for N1 and will be interesting to see when silicon is available. Reply

Log in

Don't have an account? Sign up now