Intel's new Atom Microarchitecture: The Tremont Core in Lakefield
by Dr. Ian Cutress on October 24, 2019 1:30 PM EST
While Intel has talked a lot about its mainstream Core microarchitecture, it is easy to forget that its lower power Atom designs are still prevalent in many commercial verticals. Last year at Intel’s Architecture Summit, the company unveiled an extended roadmap showing the next three generations of Atom following Goldmont Plus: Tremont, Gracemont, and ‘Future Mont’. Tremont is set to be launched this year, coming first in a low powered hybrid x86 design for notebooks called Lakefield, which uses a new stacking technology called Foveros and is built on 10+ nm. At the Linley Processor Conference today, Intel unveiled more about the microarchitecture behind Tremont.
For the sake of clarity, a pre-note on ‘Core’ vs ‘core’:
- ‘Core’ and ‘Atom’ are Intel’s two main x86 microarchitecture families
- A ‘core’ is a single designated CPU capable of processing instructions, and can be built by Intel with either ‘Core’ or ‘Atom’ microarchitectures
A Brief History of Atom
Intel’s lower powered Atom microarchitecture has been used for a variety of solutions: embedded platforms, networking, smartphones, tablets, netbooks, NAS devices, control hubs, and a wide array of things we don’t even know about. The positioning of Atom compared to Core was meant to be that Atom was the smaller core design, taking up less silicon die area and being lower performance, but ultimately lower power in a time where the Core microarchitecture was focused more towards high performance designs.
The last few generations of Atom are readily quantified: Silvermont based on 22nm was a big product for the company, which has evolved into Airmont, Goldmont, Goldmont Plus, and now Tremont.
|Intel's Atom History|
|Clover Trail||Cedar Trail|
|Bay Trail-T||Bay Trail-M
The Atom family lines get a little confusing with Intel playing in all these spaces. The Atom core within a given family is usually identical (the L2 configuration might change), but depending on the SoC it sits in, it might get a different name based on the market it was headed for. Intel scrapped the smartphone program back with Broxton in 2016, and the tablet type of SoC has also gone away. With Lakefield, combining Core and Atom, it could be used in tablets again for 2019/2020, but we will see it in notebooks with the Surface Neo and in networking/embedded markets as Snow Ridge.
Lakefield - 12mm x 12mm, 2mW Standby Power
It is worth noting that as Intel expanded the scope of its Core microarchitecture, from 1.5W per core to 20W+ per core, it has edged Atom further into niche products. Atom still had that super-low-power advantage, with a much smaller die area, but has also been super low performance, sitting a quantifiable step below what Core can provide. With Tremont, Intel’s primary focus was bringing the single thread performance of the Atom design to parity with Core at the lower end of performance, with a sizeable overlap between the performance of a single Core design and a single Atom design. Intel published this graph to demonstrate what this looks like on early silicon:
Now, Intel’s Atom platforms haven’t had the greatest press over the last few years. Aside from providing some really nice notebooks around the $200 range on the consumer side, the enterprise side has been dealing with a clock degradation issue that ultimately leaves Atom systems built on C2000 processors unable to boot, which was bad news for embedded Atom systems designed to run for 10-20 years. Intel has since fixed that bug with a silicon update, but the point of that silicon was for it not to be touched for a generation.
With that aside, Intel is looking to revive its Atom fortunes with the new Tremont design, and looking forward to Gracemont and beyond. More performance, crossing over with Core, and with hardware built on Intel’s latest 10+ process, should afford a number of opportunities. Until we get our hands on the hardware, we’re going to examine the design.
Design Goals for Tremont
The odd quirk about CPU design is that for engineers that have been embedded in this space for 20 years, when they were taught about processor design, the main focus was all about performance. Little attention was paid to power. Fast forward to today, and power is the often talked about point when it comes to battery powered devices, and learning to design for both performance and power becomes an intense balancing act for all the engineers involved. We’ve spoken to companies that only allow performance enhancements if the power increase is at most equal in percentage, or perhaps a 2:1 ratio of performance/power. It’s a difficult pie to bake at any rate.
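As a rough illustration of the kind of gating rule described above, the check can be written as a simple ratio test. This is only a sketch: the function name `accept_change`, the `min_ratio` parameter, and the handling of edge cases are our own assumptions, not any company's actual process.

```python
def accept_change(perf_gain_pct, power_gain_pct, min_ratio=1.0):
    """Hypothetical gate for a proposed design change.

    perf_gain_pct  -- performance increase in percent
    power_gain_pct -- power increase in percent
    min_ratio      -- required performance/power ratio:
                      1.0 means power may rise at most as much as performance,
                      2.0 means performance must rise twice as fast as power.
    """
    if perf_gain_pct <= 0:
        # No performance benefit, so there is nothing to trade power for.
        return False
    if power_gain_pct <= 0:
        # Faster at equal or lower power always passes (our assumption).
        return True
    return perf_gain_pct / power_gain_pct >= min_ratio
```

Under the 1:1 rule, a change worth 3% performance for 3% power passes; under a 2:1 rule, the same change would need to cost no more than 1.5% power.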
The interesting thing here in our briefing with Intel is that they specifically stated that Tremont was built with performance in mind, and the aim was for a sizeable uptick in the raw clock-for-clock throughput compared to the previous generation Atom, Goldmont Plus. Based on Intel’s own metrics, namely using SPEC, Intel is going to claim an average 30% iso-frequency performance uplift in core performance for Tremont over Goldmont Plus.
It’s worth noting here that we were told this data is from an early Tremont design, and should represent minimum uplifts. The graph is somewhat skewed at the top end, with three of the SPEC tests getting 65%+ uplifts, and at the time of discussion Intel did not have to hand exactly which tests these were (likely libquantum, lbm). We weren’t told how the code was compiled, however Intel did state that the same compiled binaries were used on both Tremont and Goldmont Plus. Intel didn’t state whether they actually adjusted the clock of each core to match the other, or did a performance per clock analysis using the frequency as a division factor. These results have to be taken at face value.
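To make the second of those two methods concrete, here is a minimal sketch of a performance per clock comparison, along with a look at how a few large outliers pull up an arithmetic average. Every number and function name here is hypothetical; Intel did not disclose its scores, frequencies, or averaging method.

```python
from statistics import geometric_mean, mean


def per_clock(score, freq_ghz):
    """Benchmark score divided by clock frequency (score per GHz)."""
    return score / freq_ghz


def iso_frequency_uplift(new_score, new_ghz, old_score, old_ghz):
    """Fractional per-clock uplift of the new core over the old (0.30 = 30%)."""
    return per_clock(new_score, new_ghz) / per_clock(old_score, old_ghz) - 1.0


def summarize(uplifts):
    """Return (arithmetic mean, geometric mean) of fractional uplifts.

    The geometric mean is taken over the ratios (1 + uplift), which is the
    usual way to average benchmark speedups.
    """
    ratios = [1.0 + u for u in uplifts]
    return mean(uplifts), geometric_mean(ratios) - 1.0


# Made-up example: a 3.0 GHz part scoring 39 against a 2.0 GHz part scoring 20
# works out to a 30% per-clock uplift, despite a 95% raw score difference.
example_uplift = iso_frequency_uplift(39.0, 3.0, 20.0, 2.0)

# Made-up example: two tests at +10% and one at +70% average to +30%
# arithmetically, but the geometric mean comes out lower.
arith, geo = summarize([0.10, 0.10, 0.70])
```

The point of the second example is that a handful of 65%+ results can carry an average a long way, which is why knowing the individual tests matters.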
A 30% average jump in performance is sizeable for any generation-to-generation cadence. Just taking it as-is feels premature: aside from microarchitectural advancements and a jump to 10nm, there has to be something at play here – either the power budget of Atom has ballooned, or the die area has. With Intel stating explicitly out of the gate that they’re focusing on performance, a cynic is going to suggest that something else has paid that price, and to that end Intel wasn’t prepared to talk about power windows or die area, though they did point to the already announced Lakefield CPU, which has a 1 x Core + 4 x Tremont design and gets compared to 7 W CPUs.
Comparing 14nm Goldmont Plus (that’s standard 14nm, not 14+ or 14++) to a 10+ Tremont core is going to be difficult: the Tremont core has more in it to drive that performance, but what is not known is how much space was saved moving from 14nm to 10+, and whether the extra parts make the core bigger or smaller overall. We’ll cover what those extra parts are in the next few pages.
azazel1024 - Tuesday, October 29, 2019
Sure in some cases, but most not super cheap Atom implementations from even the Cherry Trail era weren't all on the USB2 bus, at least not the eMMC. Most typical performance I saw was >100MB/sec reads and 30 or so MB/sec writes on slower implementations. Some of the better eMMC implementations were hitting ~180MB/sec reads and 70MB/sec writes and 6-7k IOPS.
Not SSD performance, but storage performance isn't the issue with HEVC playback. HEVC support is. My Cherry Trail doesn't support H265 decode. I can play back a 1080p HEVC file, but the processor is running between 70-90% utilized when doing it. For an H264 encoded 1080p file it typically runs about 15% utilization to do it.
It can't handle 4k decode.
My biggest issue has been networking performance on the one that I have. Some are better setup, but not all of them. My first generation Cherry was an Asus T100. Max storage performance was 110MB/sec reads, 37MB/sec writes, 5k IOPS. The microSD card slot maxed out at 20MB/sec read and writes. The Wireless was 1:1 802.11n and maxed at about 10MB/sec down and 8MB/sec up (obviously not concurrently) and it was only 40MHz on 5GHz, not 2.4GHz (20MHz only on that).
My current one is a T100ha after my T100 died. Some improvements, some backslides. The read/write speed is up to 170MB/sec and 48MB/sec with 7k max IOPS. The microSD card reader can hit about 80MB/sec reads and 30MB/sec writes (in a card reader in my desktop the same microSD card can hit 80MB/sec reads and 50MB/sec writes). The wireless though is WAY slower. It hits 6MB/sec down and 3MB/sec up max. Supposedly it can do the same 40MHz on 5GHz and 20MHz on 2.4GHz, but I don't see anything like real 1:1 40MHz performance on 5GHz (which should be in the ballpark of 10-12MB/sec, 80-100Mbps).
That is honestly my biggest complaint is the wireless on it is just horrendous. I often use an 802.11ac nano dongle in the keyboard dock USB3 port as that easily pushes 20MB/sec up and down. Even simple website loading using it is significantly faster than the embedded wireless. I know it is a cheap tablet/2-in-1, but it is one of those probably springing an extra $1-2 on BOM for a nicer even 802.11n 1:1 solution would have gone a long way. Let alone at the time it was released, 1:1 802.11ac wireless options were pretty widely available.
I am curious if someone like Asus (or someone else, I am NOT tied to them) will use Tremont in any small 2-in-1. Heck, an update to Surface with one might be nice. I do like the smaller form factor of a 10-11" size tablet. I almost always use my 2-in-1 as a laptop, so a hard keyboard dock is reasonably important to me (but a really nice type cover would be fine, I almost always use it on a table, not on my lap), but I do sometimes use it as an actual tablet for reading (movie/TV/YouTube would generally be fine as a laptop as I am rarely just holding my tablet in front of my face to do that. Usually on a table/desk, occasionally sitting on my knees/stomach but docked). I don't need a TON of performance with one. But at the same time, if I want to grab a movie off my server for an overnight trip or something, it is kind of painful to be downloading a 3GB file at 6MB/sec and having to wait the better part of 10 minutes to download the darn thing. It is usually worth my while to go rummage in my desk drawer, grab my USB3 GbE adapter, plug it in to my tablet and in to a spare LAN drop in one of my rooms and quick grab the file at ~50MB/sec or so a second of the micro SD card write speed and be done in maybe 2 minutes of doing all those steps and the download time. Let alone if I want to grab maybe 2 or 3 movie files at 6-10GB.
A nicer screen would of course be real swell too, but honestly 720p on a 10.1" screen isn't horrible. The wireless limitations are my biggest headache. A bit more CPU and GPU performance would also be nice. I wouldn't mind being able to handle slightly newer/more advanced games on it, but frankly it isn't my gaming machine nor do I need it to be. Portable is more important to me than powerful. But it needs to be better at some of the basic tasks, and it is feeling its age.
Wireless being at least 2x better, and it would be nicer to be more like 3-4x better (which 802.11ac 1:1, if you don't mess up the implementation IS at ~20-25MB/sec). If CPU performance was maybe 15-20% better (and Tremont sounds like it is probably more like 50-100% faster than Cherry trail), GPU maybe twice as fast (also sounds like it would be a lot faster than that), storage performance and peripheral storage is fine as it is on my T100ha, but yeah I sure as heck don't mind some improvements there also. Battery life being better would be nice, but I usually manage >10hrs if I am not doing anything super intensive. I could even live with the current screen, though better coverage of sRGB (I think mine is about 70% sRGB), contrast (actually mine is pretty good at I think around 800:1 or so, not great, but not bad) and higher resolution (900p would be nice, 1080p better).
Maybe someone can do all that in a package less than $400. Oh and 8GB of RAM and 128GB of storage. Max $500 price tag.
eek2121 - Monday, October 28, 2019
eMMC isn't typically known for speed.
Namisecond - Friday, November 1, 2019
Most eMMC isn't optimized for performance. They tend to be optimized for cost.
levizx - Friday, October 25, 2019
You are confusing iGPU with QSV, they are different IP blocks.
solidsnake1298 - Monday, October 28, 2019
I am not confusing QSV with the iGPU. While QSV is functionally different from the EUs that generate "graphics" and physically occupies a different section of die area from the EUs, QSV is LOGICALLY part of the "iGPU." I'm not sure this is an option in my particular BIOS, but humor me here. If I were to disable the iGPU in my J4205 and use an add-in Nvidia/AMD GPU wouldn't that also mean that QSV is no longer available? On the full power desktop side, if I bought a KF SKU Intel processor (the ones without an iGPU), doesn't that mean that QSV is not available?
Yes, I was referring to QSV specifically. But QSV is a feature of Intel's iGPUs. Just like NVENC is a feature of most of Nvidia's GPUs.
abufrejoval - Tuesday, November 5, 2019
If you disabled the iGPU, the VPU is gone, too. But you don't need to disable the iGPU when you add a dGPU: Just connect your monitor to the dGPU and leave the iGPU in idle.
Not sure it's worth it, though. I can't see that the Intel VPUs are any better than the ones from Nvidia or AMD, neither in speed nor in quality. And for encoding quality/density CPU still seems best, if most expensive in terms of energy.
solidsnake1298 - Tuesday, November 5, 2019
The point of my post was to point out that I was not "confusing" QSV with the iGPU when they are logically part of the same block on the die. You can't have QSV (Quick Sync Video) without the iGPU being active. So when, in the context of video decoding, I refer to "iGPU" I am obviously talking about the QSV block on the iGPU.
Namisecond - Friday, November 1, 2019
4K output was completely dependent upon the vendor to implement. I have a Gemini Lake laptop that used an HDMI 1.3 or 1.4 output chip. I love it for its all-day battery and don't miss the 4K output at all.
hyno111 - Thursday, October 24, 2019
Atom performance actually improved a lot every generation. I would prefer a Goldmont Plus based Pentium over the low power dual core Skylake++ without turbo.
Samus - Thursday, October 24, 2019
That's not true. Atom at various stages has actually taken a step BACKWARDS in performance.
Most obviously, Cedarview was around 20% slower per product SKU than Pineview, though performance per watt remained nearly identical. Still, the D525 remained the top performing Atom for years until Avoton launched in 2013.
Atom was also plagued with x64 compatibility issues until Avoton officially supported the x86 extension, along with virtualization, mostly because Avoton was designed specifically as a "Server" product, finding its way in everything from NAS to SMB microservers where it performed terribly compared to even rudimentary RISC CPU's.
It's an absolute marketing failure by Intel to continue pushing the cute name Atom with the reputation they have built for it. They were moving away for a while, branding many traditional Atom-architecture products Pentium J\Celeron J, then going back on that move to shift Pentium\Celeron back to the Core microarchitecture, and further mutilating the process by actually calling Core-based CPUs Atoms with the x3/x5/x7.
No wonder AMD has maintained consistent OEM support. At least their CPU product stack has made sense for the last 10 years...