NVIDIA Details DRIVE AGX Orin: A Herculean Arm Automotive SoC For 2022
by Ryan Smith on December 18, 2019 8:30 AM EST
While NVIDIA’s SoC efforts haven’t gone entirely to plan since the company first started on them over a decade ago, NVIDIA has been able to find a niche that works in the automotive field. Backing the company’s powerful DRIVE hardware, these SoCs have become increasingly specialized as the DRIVE platform itself evolves to meet the needs of the slowly maturing market for the brains behind self-driving cars. And now, NVIDIA’s family of automotive SoCs is growing once again, with the formal unveiling of the Orin SoC.
The chip was first outlined as part of NVIDIA’s DRIVE roadmap at GTC 2018, and this morning NVIDIA CEO Jensen Huang took the stage at GTC China to properly introduce it as the part that will power the next generation of the DRIVE platform. Officially dubbed the NVIDIA DRIVE AGX Orin, the new chip will eventually succeed NVIDIA’s currently shipping Xavier SoC, which has been available for about the last year now. In fact, as has been the case with previous NVIDIA DRIVE unveils, NVIDIA is announcing the chip well in advance: the company isn't expecting the chip to be fully ready for automakers until 2022.
What lies beneath Orin then is a lot of hardware, with NVIDIA going into some high-level details on certain parts, but skimming over others. Overall, Orin is a 17 billion transistor chip, almost double the transistor count of Xavier and continuing the trend of very large, very powerful automotive SoCs. NVIDIA is not disclosing the manufacturing process being used at this time, but given their timeframe, some sort of 7nm or 5nm process (or derivative) is pretty much a given. And NVIDIA will definitely need a smaller manufacturing process – to put things in perspective, the company’s top-end Turing GPU, TU102, takes up 754mm² for 18.6B transistors, so Orin will pack in almost as many transistors as one of NVIDIA’s best GPUs today.
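To make the die size argument concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the TU102 and Orin figures quoted above; the density gains it loops over are hypothetical placeholders rather than numbers from NVIDIA or any foundry, and comparing a GPU's average transistor density with an SoC's is only an approximation.

```python
# Figures quoted in the article.
TU102_TRANSISTORS = 18.6e9   # NVIDIA TU102 (Turing)
TU102_AREA_MM2 = 754.0
ORIN_TRANSISTORS = 17e9


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average density in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


def die_area_mm2(transistors: float, density_mtr: float) -> float:
    """Die area needed to hold `transistors` at the given average density."""
    return transistors / (density_mtr * 1e6)


tu102_density = density_mtr_per_mm2(TU102_TRANSISTORS, TU102_AREA_MM2)
print(f"TU102 average density: {tu102_density:.1f} MTr/mm^2")

# HYPOTHETICAL density gains over the TU102 baseline (assumptions for
# illustration only, not disclosed by NVIDIA).
for gain in (1.0, 2.0, 3.0):
    area = die_area_mm2(ORIN_TRANSISTORS, tu102_density * gain)
    print(f"{gain:.0f}x TU102 density -> roughly {area:.0f} mm^2 for Orin")
```

At TU102's density, a 17 billion transistor chip would still need close to 700mm² of silicon, which is why a denser process looks unavoidable for an automotive SoC.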
|NVIDIA ARM SoC Specification Comparison|
| |Orin|Xavier|Parker|
|CPU Cores|12x Arm "Hercules"|8x NVIDIA Custom ARM "Carmel"|2x NVIDIA Denver + 4x Arm Cortex-A57|
|GPU Cores|"Next-Generation" NVIDIA iGPU|Xavier Volta iGPU (512 CUDA Cores)|Parker Pascal iGPU (256 CUDA Cores)|
|INT8 DL TOPS|200 TOPS|30 TOPS|N/A|
|FP32 TFLOPS|?|1.3 TFLOPs|0.7 TFLOPs|
|Manufacturing Process|7nm?|TSMC 12nm FFN|TSMC 16nm FinFET|
Those transistors, in turn, will be driving several elements. Surprisingly, for today’s announcement NVIDIA has confirmed what CPU core they’ll be using. And even more surprisingly, it isn’t theirs. After flirting with both Arm and NVIDIA-designed CPU cores for several years now, NVIDIA has seemingly settled down with Arm. Orin will include a dozen of Arm’s upcoming Hercules CPU cores, which are from Arm’s client device line of CPU cores. Hercules, in turn, succeeds today’s Cortex-A77 CPU cores, with customers recently receiving the first IP for the core. For the moment we have very little information on Hercules itself, but Arm has previously disclosed that it will be a further refinement of the A76/A77 cores.
I won’t spend too much time dwelling on NVIDIA’s decision to go with Arm’s Cortex-A cores after using their own CPU cores for their last couple of SoCs, but it’s consistent with the direction we’ve seen most of Arm’s other high-end customers take. Developing a fast, high-performance CPU core only gets harder and harder every generation. And with Arm taking a serious stab at the subject, there’s a lot of sense in backing Arm’s efforts by licensing their cores as opposed to investing even more money in further improving NVIDIA’s Project Denver-based designs. It does remove one area where NVIDIA could make a unique offering, but on the flip side it does mean they can focus more on their GPU and accelerator efforts.
Speaking of GPUs, Jensen revealed very little about the GPU technology that Orin will integrate. Besides confirming that it’s a “next generation” architecture that offers all of the CUDA core and tensor functionality that NVIDIA has become known for, nothing else was stated. This isn’t wholly surprising since NVIDIA hasn’t disclosed anything about their forthcoming GPU architectures – we haven’t seen a roadmap there in a while – but it means the GPU side is a bit of a blank slate. Given the large gap between now and Orin’s launch, it’s not even clear if the architecture will be NVIDIA’s next immediate GPU architecture or the one after that. However, given how Xavier’s development went and the extensive validation required for automotive, NVIDIA’s 2020(ish) GPU architecture seems like a safe bet.
Meanwhile NVIDIA’s Deep Learning Accelerator (DLA) blocks will also be making a return. These blocks don’t get too much attention since they’re unique to NVIDIA’s DRIVE SoCs, but these are hardware blocks to further offload neural network inference, above and beyond what NVIDIA’s tensor cores already do. On the programmable/fixed-function scale they’re closer to the latter, with the task-specific hardware being a good fit for the power and energy-efficiency needs NVIDIA is shooting for.
All told, NVIDIA expects Orin to deliver 7x the 30 INT8 TOPS performance of Xavier, with the combination of the GPU and DLA pushing 200 TOPS. It goes without saying that NVIDIA is still heavily invested in neural networks as the solution to self-driving systems, so they are similarly heavily investing in hardware to execute those neural nets.
Rounding out the Orin package, NVIDIA’s announcement also confirms that the chip will offer plenty of hardware for supporting features. The chip will offer 4x 10 Gigabit Ethernet hosts for sensors and in-vehicle communication, and while the company hasn’t disclosed how many camera inputs the SoC can field, it will offer 4Kp60 video stream encoding and 8Kp30 decoding for H.264/HEVC/VP9. The company has also set a goal for 200GB/sec of memory bandwidth. Given the timeframe for Orin and what NVIDIA does for Xavier today, a 256-bit memory bus with LPDDR5 support sounds like a shoo-in, but of course this remains to be confirmed.
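As a quick sanity check on the memory guess, the peak bandwidth of a DRAM interface is just the bus width in bytes multiplied by the transfer rate. The sketch below assumes a 256-bit bus running at LPDDR5's top standard speed grade of 6400 MT/s; both the bus width and the speed grade are assumptions for illustration, not confirmed Orin specifications.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s / 1000


def required_rate_mt_s(bus_width_bits: int, target_gb_s: float) -> float:
    """Transfer rate needed to reach a target bandwidth on a given bus."""
    return target_gb_s * 1000 / (bus_width_bits / 8)


# ASSUMPTION: 256-bit bus with LPDDR5-6400; not confirmed by NVIDIA.
ASSUMED_BUS_BITS = 256
ASSUMED_LPDDR5_MT_S = 6400

orin_guess = peak_bandwidth_gb_s(ASSUMED_BUS_BITS, ASSUMED_LPDDR5_MT_S)
needed = required_rate_mt_s(ASSUMED_BUS_BITS, 200)

print(f"256-bit LPDDR5-6400 peak: {orin_guess:.1f} GB/s")
print(f"Rate needed for 200 GB/s on a 256-bit bus: {needed:.0f} MT/s")
```

Under those assumptions the interface lands just above NVIDIA's 200GB/sec goal, which is what makes the 256-bit LPDDR5 configuration such a natural guess.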
Finally, while NVIDIA hasn’t disclosed any official figures for power consumption, it’s clear that overall power usage is going up relative to Xavier. While Orin is expected to be 7x faster than Xavier, NVIDIA is only claiming it’s 3x as power efficient. Assuming NVIDIA is basing all of this on INT8 TOPS as they usually do, then the 1 TOPS/Watt Xavier would be replaced by the 3 TOPS/Watt Orin, putting the 200 TOPS chip at around 65-70 Watts. Which is admittedly still fairly low for a single chip at a company that sells 400 Watt GPUs, but it could add up if NVIDIA builds another multi-processor board like the DRIVE Pegasus.
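The arithmetic behind that estimate is short enough to spell out. This sketch reads NVIDIA's 3x efficiency claim as TOPS per watt, as assumed above, and takes Xavier at 1 TOPS/Watt; the resulting wattage is an inference from those claims, not an official power figure.

```python
# Figures from the article; the efficiency numbers are the article's reading
# of NVIDIA's claims, not official power specifications.
XAVIER_TOPS = 30
XAVIER_TOPS_PER_WATT = 1.0
ORIN_TOPS = 200
ORIN_EFFICIENCY_GAIN = 3.0   # "3x as power efficient"

xavier_watts = XAVIER_TOPS / XAVIER_TOPS_PER_WATT
orin_tops_per_watt = XAVIER_TOPS_PER_WATT * ORIN_EFFICIENCY_GAIN
orin_watts = ORIN_TOPS / orin_tops_per_watt

speedup = ORIN_TOPS / XAVIER_TOPS
power_ratio = orin_watts / xavier_watts

print(f"Implied Xavier power: {xavier_watts:.0f} W")
print(f"Implied Orin power:   {orin_watts:.1f} W")
print(f"Throughput gain:      {speedup:.2f}x")
print(f"Power increase:       {power_ratio:.2f}x")
```

The result sits at roughly 67 Watts, in the middle of the 65-70 Watt range. It also shows that 200 TOPS is closer to 6.7x Xavier's 30 TOPS than a full 7x, so the implied power increase is a bit over 2.2x.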
Overall, NVIDIA certainly has some lofty expectations for Orin. Like Xavier before it, NVIDIA intends for various forms of Orin to power everything from level 2 autonomous cars right up to full self-driving level 5 systems. And, of course, it will do so while being able to provide the necessary ASIL-D level system integrity that will be expected for self-driving cars.
But as always, NVIDIA is far from the only silicon vendor with such lofty goals. The company will be competing with a number of other companies all providing their own silicon for self-driving cars – ranging from start-ups to the likes of Intel – and while Orin will be a big step forward in single-chip performance for the company, it’s still very much the early days for the market as a whole. So NVIDIA has their work cut out for them across hardware, software, and customer relations.
Raqia - Wednesday, December 18, 2019
It looks like nVidia is also starting to leave the custom ARMv8 core space in favor of ARM's designs.
Yojimbo - Wednesday, December 18, 2019
Probably, although there is always the possibility they moved their ARM design team from designing SoC cores to server chip cores. They are in the process of porting their entire software stack to have full ARM support in order to target edge servers such as 5G radio access networks. It's possible that with the addition of Mellanox they want to offer most of the high-margin hardware for that space: the CPU, the GPU, and the interconnect. Just a possibility...
blazeoptimus - Wednesday, December 18, 2019
What I'm most interested to see is how this will play into Nintendo's future device strategy. While I think choosing the Tegra X1 for their current console made sense, I see this as causing them long-term issues. I feel fairly certain Nvidia gave them a discount on the X1 since it was designed for mobile devices, but had no takers. As Nvidia has designed subsequent generations of Tegra chips, they've moved further and further away from chips that would work well in a future mobile console. I fear that this puts Nintendo in a situation similar to where they were with the Wii U: working with a vendor that will do the bare minimum to increase performance, because it's no longer in their interests to develop the line. As their technology falls further behind, they get into an untenable position. The Wii U used processors designed in the late 90s, and I firmly believe that was part of the console's failure. I don't believe you have to be the fastest, but I do believe you need to be current.
Raqia - Wednesday, December 18, 2019
The PowerPC 750 derivative they used is a sexy little beast and a poster child of efficiency for its time: it had OoOE, branch prediction, and caches, and came in at under 7 million transistors! I think it was about as simple as it could have been for the features it had, which I think is great.
Nintendo is always leading with more interesting interfaces and form factors rather than processing horsepower. With that in mind, I think Qualcomm would actually be the most natural partner for them to work with for their next-gen console given their investment in VR and AR. They could probably get a deal on some long-in-the-tooth 835 derivatives and release a proper Virtual Boy 2... :)
Raqia - Wednesday, December 18, 2019
To sing more praises of the 750, the radiation-hardened versions (RAD750) are commonly found on space probes like the Curiosity rover. The part is well proven and loved for a reason.
blazeoptimus - Thursday, December 19, 2019
I don't disagree that the 750 was a good chip for its time. I'm also aware of the RAD750. I'm also aware that it was used because 1. The specs for the Curiosity rover were finalized years before its launch (testing began in the 2004-2005 timeframe), putting it much closer to the original PPC750 release. And 2. Developing a new radiation-hardened chip is not economically advantageous for the companies that sell them (complex tooling and design process for literally just a few chips). They are finally designing a replacement for the RAD750 specifically because it's so old.
As to it being the poster child of efficiency, I think you hit the nail on the head when you said 'for its time', which was 1998. The Wii U was released in 2012. By that time there were processors, from a myriad of vendors and price points, that exceeded the Wii U's triple-core PPC750 (which was never designed for multiprocessing). They were bolting features onto an outdated architecture to try to eke out enough performance to not bottleneck their low-end graphics solution, and they were making compromises to do it. I love classic computing and I actually think it was pretty cool that they were able to take the 750 that far, but I don't think it was a wise business decision.
As to using Qualcomm, I don't think it's a good fit. Qualcomm tailors their chip designs for the much higher volume of mobile phone sales. At least with the Switch, they can claim that at the time of the Switch's original release it was competitive with the other mobile platforms graphics-wise. They wouldn't be able to do that with a Qualcomm solution. Long term, if AMD were more committed to the ARM platform, they would be an ideal choice. They have quite a bit of experience with semi-custom and can provide top-tier graphics capabilities. Or perhaps I'm wrong about Nvidia and an adapted Orin chip would do just the trick. I suppose we'll see. I just know that historically, Nintendo has failed to capitalize on newer technology in the way that they should. I agree that they primarily rely on gaming experience, but at some point the hardware needs to move forward as well. You could easily make the N64 into a mobile gaming device and add a couple of Nunchuk controllers, but that doesn't mean that people would buy it.
Yojimbo - Friday, December 20, 2019
The Orin chip is huge and expensive. No way it finds its way anywhere near a Switch or Switch follow-on.
Alistair - Sunday, March 21, 2021
Not true. It is coming out two years later, most likely on 8nm or 5nm EUV, so it won't be that large two years later: just twice as many GPU cores as we already have in Xavier.
levizx - Sunday, December 22, 2019
Qualcomm still have the likes of 8cx and its successors. And they did a customized version for Microsoft. I don't see why they can't/won't do it again for Nintendo.
blazeoptimus - Thursday, December 19, 2019
As an additional counterpoint, I could sing the praises of the Z80, which still has modern incarnations in production and use. It's likewise a model of efficiency that utilizes only about 500k transistors (that's a rough guess) :). It was also used in video game consoles. All that being said, I don't think it'd be a good decision to use it as a console's primary compute processor.