A Wider Back End

Moving beyond the micro-op queue, Tremont has 8 execution ports, fed from 7 reservation stations.

The only two ports that share a reservation station are the address generation units (AGUs). This is in stark contrast to the Core design, which in Sunny Cove uses one unified reservation station for all integer and floating point calculations and three for the AGUs. Tremont uses a unified reservation station for its two AGUs, also backed by extra memory for queued micro-ops, so that it can supply both AGUs with either 2x 16-byte stores, 2x 16-byte loads, or one of each. Intel clearly expects the AGUs on Tremont to be fairly active compared to other execution ports.
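As a rough illustration of what the shared AGU reservation station has to feed, the sketch below enumerates the load/store mixes described above and the resulting peak bytes per cycle. The function names are hypothetical, and the clock speed passed to the bandwidth helper is an assumption, since no Tremont frequency is given here.

```python
from itertools import combinations_with_replacement

ACCESS_BYTES = 16  # from the text: each load or store is 16 bytes wide
AGU_COUNT = 2      # from the text: two AGUs share one reservation station


def agu_issue_mixes():
    """Return every load/store mix the two AGUs can handle in one cycle."""
    return list(combinations_with_replacement(("load", "store"), AGU_COUNT))


def peak_bytes_per_cycle():
    """Peak memory traffic per cycle if both AGUs are busy."""
    return ACCESS_BYTES * AGU_COUNT


def peak_bandwidth_gb_per_s(clock_ghz):
    """Peak bandwidth in GB/s for an ASSUMED clock speed in GHz."""
    return peak_bytes_per_cycle() * clock_ghz


if __name__ == "__main__":
    for mix in agu_issue_mixes():
        print(" + ".join(mix), "=", peak_bytes_per_cycle(), "bytes/cycle")
    # 3.0 GHz is a placeholder value, not an Intel specification.
    print(peak_bandwidth_gb_per_s(3.0), "GB/s at an assumed 3.0 GHz")
```

This is a ceiling only: it ignores cache misses and any limits elsewhere in the pipeline.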

On the integer side, aside from the two AGUs, Tremont has 3 ALUs, a jump port, and a store data port. Each ALU supports different functions, with one handling shifts and another handling multiplication and division. Compared to Core, these ALUs are extremely lightweight, and Intel hasn’t gone into specifics here.

On the floating point side, things are a little more varied: the three ports are split between two ALUs and a store port. One ALU focuses on fused additions (FADD), while the other focuses on fused multiplication and division (FMUL). Both ALUs support 128-bit SIMD and 128-bit AES instructions with a 4-cycle latency, as well as single-instruction SHA256 at 4 cycles. There is no 256-bit vector support here. To help with certain calculations, GFNI instruction support is included.

There is also a larger 1024-entry L2 TLB, supporting 1024x 4K entries, 32x 2M entries, or 8x 1G entries. This is an upgrade from the 512-entry L2 TLB in Goldmont.
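To put those entry counts in perspective, the sketch below works out the "reach" of the L2 TLB, meaning how much memory it can map without a miss, for each page size quoted above. The helper names are hypothetical, and the arithmetic assumes binary units (4K = 4 KiB, and so on).

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (entries, page size in bytes) for each configuration quoted in the text
L2_TLB_CONFIGS = {
    "4K": (1024, 4 * KIB),
    "2M": (32, 2 * MIB),
    "1G": (8, 1 * GIB),
}


def tlb_reach_bytes(entries, page_bytes):
    """Memory covered when every TLB entry maps one page of this size."""
    return entries * page_bytes


def format_size(num_bytes):
    """Render a byte count using the largest unit that divides it evenly."""
    for unit, name in ((GIB, "GiB"), (MIB, "MiB"), (KIB, "KiB")):
        if num_bytes >= unit and num_bytes % unit == 0:
            return f"{num_bytes // unit} {name}"
    return f"{num_bytes} B"


if __name__ == "__main__":
    for page, (entries, page_bytes) in L2_TLB_CONFIGS.items():
        reach = tlb_reach_bytes(entries, page_bytes)
        print(f"{entries} x {page} pages -> {format_size(reach)} of reach")
```

With 4K pages the full 1024 entries cover 4 MiB, while the 2M and 1G configurations cover 64 MiB and 8 GiB respectively.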

New Instructions

As with any generation, Intel adds new supported instructions to either accelerate common calculations that would traditionally require lots of instructions or to add new functionality. Tremont is no different.

AnandTech      Process (nm)   Release Year   New Instructions
Tremont        10+            2019           CLWB, GFNI, ENCLV, CLDEMOTE, MOVDIR*, TPAUSE, UMONITOR, UMWAIT
Goldmont Plus  14             2017           SGX1, UMIP, PTWRITE, RDPID
Goldmont       14             2016           RDSEED, SMAP, MPX, XSAVEC, XSAVES, CLFLUSHOPT, SHA
Airmont        14             2015           (none)
Silvermont     22             2013           SSE4.1, SSE4.2, MOVBE, CRC32, POPCNT, CLMUL, AES, RDRAND, PREFETCHW

(When asked what other new instructions are supported, Intel stated to look at the published documents about future instructions. When it was pointed out that those documents weren’t exactly clear and that in the past Intel hasn’t spoken about future designs, we were not afforded additional comments.)

When we get hold of a Tremont device, we’ll do a full instruction breakdown.
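In the meantime, anyone with hardware in hand could check for the Tremont additions themselves. The sketch below is one minimal way to do it on Linux, by reading the `flags` line of `/proc/cpuinfo`. The function names are hypothetical, and the mapping from instruction names to kernel flag names is an assumption based on Linux naming (for example, `waitpkg` covers TPAUSE, UMONITOR and UMWAIT). ENCLV is left out because it has no simple flag of its own.

```python
# Assumed mapping from the instructions listed for Tremont to the flag
# names the Linux kernel prints in /proc/cpuinfo.
TREMONT_FLAG_NAMES = {
    "CLWB": "clwb",
    "GFNI": "gfni",
    "CLDEMOTE": "cldemote",
    "MOVDIRI": "movdiri",
    "MOVDIR64B": "movdir64b",
    "TPAUSE/UMONITOR/UMWAIT": "waitpkg",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report_support(cpuinfo_text, wanted=TREMONT_FLAG_NAMES):
    """Map each instruction name to True/False based on the flags present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in wanted.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        print("/proc/cpuinfo not available (this sketch is Linux-only)")
    else:
        for name, supported in report_support(text).items():
            print(f"{name:24s} {'yes' if supported else 'no'}")
```

A flag only appears if both the CPU and the running kernel know about it, so a missing entry on an old kernel does not prove the instruction is absent.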


101 Comments


  • azazel1024 - Tuesday, October 29, 2019 - link

    Sure in some cases, but most not super cheap Atom implementations from even the Cherry Trail era weren't all on the USB2 bus, at least not the eMMC. Most typical performance I saw was >100MB/sec reads and 30 or so MB/sec writes on slower implementations. Some of the better eMMC implementations were hitting ~180MB/sec reads and 70MB/sec writes and 6-7k IOPS.

    Not SSD performance, but storage performance isn't the issue with HEVC playback. HEVC support is. My Cherry Trail doesn't support H265 decode. I can play back a 1080p HEVC file, but the processor is running between 70-90% utilized when doing it. For an H264 encoded 1080p file it typically runs about 15% utilization to do it.

    It can't handle 4k decode.

    My biggest issue has been networking performance on the one that I have. Some are better set up, but not all of them. My first generation Cherry Trail was an Asus T100. Max storage performance was 110MB/sec reads, 37MB/sec writes, 5k IOPS. The microSD card slot maxed out at 20MB/sec reads and writes. The wireless was 1:1 802.11n and maxed at about 10MB/sec down and 8MB/sec up (obviously not concurrently), and it was only 40MHz on 5GHz, not 2.4GHz (20MHz only on that).

    My current one is a T100ha after my T100 died. Some improvements, some backslides. The read/write speed is up to 170MB/sec and 48MB/sec with 7k max IOPS. The microSD card reader can hit about 80MB/sec reads and 30MB/sec writes (in a card reader in my desktop the same microSD card can hit 80MB/sec reads and 50MB/sec writes). The wireless though is WAY slower. It hits 6MB/sec down and 3MB/sec up max. Supposedly it can do the same 40MHz on 5GHz and 20MHz on 2.4GHz, but I don't see anything like real 1:1 40MHz performance on 5GHz (which should be in the ballpark of 10-12MB/sec, 80-100Mbps).

    Honestly, my biggest complaint is that the wireless on it is just horrendous. I often use an 802.11ac nano dongle in the keyboard dock USB3 port, as that easily pushes 20MB/sec up and down. Even simple website loading using it is significantly faster than the embedded wireless. I know it is a cheap tablet/2-in-1, but it is one of those cases where springing an extra $1-2 on BOM for a nicer, even 802.11n 1:1, solution would have gone a long way. Let alone that at the time it was released, 1:1 802.11ac wireless options were pretty widely available.

    I am curious if someone like Asus (or someone else, I am NOT tied to them) will use Tremont in any small 2-in-1. Heck, an update to Surface with one might be nice. I do like the smaller form factor of a 10-11" size tablet. I almost always use my 2-in-1 as a laptop, so a hard keyboard dock is reasonably important to me (but a really nice type cover would be fine, I almost always use it on a table, not on my lap), but I do sometimes use it as an actual tablet for reading (movie/TV/YouTube would generally be fine as a laptop, as I am rarely just holding my tablet in front of my face to do that. Usually on a table/desk, occasionally sitting on my knees/stomach but docked). I don't need a TON of performance with one. But at the same time, if I want to grab a movie off my server for an overnight trip or something, it is kind of painful to be downloading a 3GB file at 6MB/sec and having to wait the better part of 10 minutes to download the darn thing. It is usually worth my while to go rummage in my desk drawer, grab my USB3 GbE adapter, plug it in to my tablet and in to a spare LAN drop in one of my rooms, quickly grab the file at the ~50MB/sec or so of the microSD card write speed, and be done in maybe 2 minutes including all those steps and the download time. Let alone if I want to grab maybe 2 or 3 movie files at 6-10GB.

    A nicer screen would of course be real swell too, but honestly 720p on a 10.1" screen isn't horrible. The wireless limitations are my biggest headache. A bit more CPU and GPU performance would also be nice. I wouldn't mind being able to handle slightly newer/more advanced games on it, but frankly it isn't my gaming machine nor do I need it to be. Portable is more important to me than powerful. But it needs to be better at some of the basic tasks, where it is feeling its age.

    Wireless being at least 2x better, and it would be nicer to be more like 3-4x better (which 802.11ac 1:1, if you don't mess up the implementation IS at ~20-25MB/sec). If CPU performance was maybe 15-20% better (and Tremont sounds like it is probably more like 50-100% faster than Cherry trail), GPU maybe twice as fast (also sounds like it would be a lot faster than that), storage performance and peripheral storage is fine as it is on my T100ha, but yeah I sure as heck don't mind some improvements there also. Battery life being better would be nice, but I usually manage >10hrs if I am not doing anything super intensive. I could even live with the current screen, though better coverage of sRGB (I think mine is about 70% sRGB), contrast (actually mine is pretty good at I think around 800:1 or so, not great, but not bad) and higher resolution (900p would be nice, 1080p better).

    Maybe someone can do all that in a package less than $400. Oh and 8GB of RAM and 128GB of storage. Max $500 price tag.
  • eek2121 - Monday, October 28, 2019 - link

    eMMC isn't typically known for speed.
  • Namisecond - Friday, November 1, 2019 - link

    Most eMMC isn't optimized for performance. They tend to be optimized for cost.
  • levizx - Friday, October 25, 2019 - link

    You are confusing iGPU with QSV, they are different IP blocks.
  • solidsnake1298 - Monday, October 28, 2019 - link

    I am not confusing QSV with the iGPU. While QSV is functionally different from the EUs that generate "graphics" and physically occupies a different section of die area from the EUs, QSV is LOGICALLY part of the "iGPU." I'm not sure this is an option in my particular BIOS, but humor me here. If I were to disable the iGPU in my J4205 and use an add-in Nvidia/AMD GPU wouldn't that also mean that QSV is no longer available? On the full power desktop side, if I bought a KF SKU Intel processor (the ones without an iGPU), doesn't that mean that QSV is not available?

    Yes, I was referring to QSV specifically. But QSV is a feature of Intel's iGPUs. Just like NVENC is a feature of most of Nvidia's GPUs.
  • abufrejoval - Tuesday, November 5, 2019 - link

    If you disabled the iGPU, the VPU is gone, too. But you don't need to disable the iGPU when you add a dGPU: Just connect your monitor to the dGPU and leave the iGPU in idle.

    Not sure it's worth it, though. I can't see that the Intel VPUs are any better than the ones from Nvidia or AMD, neither in speed nor in quality. And for encoding quality/density CPU still seems best, if most expensive in terms of energy.
  • solidsnake1298 - Tuesday, November 5, 2019 - link

    The point of my post was to point out that I was not "confusing" QSV with the iGPU when they are logically part of the same block on the die. You can't have QSV (Quick Sync Video) without the iGPU being active. So when, in the context of video decoding, I refer to "iGPU" I am obviously talking about the QSV block on the iGPU.
  • Namisecond - Friday, November 1, 2019 - link

    4K output was completely dependent upon the vendor to implement. I have a Gemini Lake laptop that used an HDMI 1.3 or 1.4 output chip. I love it for its all-day battery life and don't miss the 4K output at all.
  • hyno111 - Thursday, October 24, 2019 - link

    Atom performance actually improved a lot every generation. I would prefer a Goldmont Plus based Pentium over the low power dual core Skylake++ without turbo.
  • Samus - Thursday, October 24, 2019 - link

    That's not true. Atom at various stages has actually taken a step BACKWARDS in performance.

    Most obviously, Cedarview was around 20% slower per product SKU than Pineview, though performance per watt remained nearly identical. Still, the D525 remained the top performing Atom for years until Avoton launched in 2013.

    Atom was also plagued with x64 compatibility issues until Avoton officially supported the x86-64 extension, along with virtualization, mostly because Avoton was designed specifically as a "Server" product, finding its way into everything from NAS to SMB microservers, where it performed terribly compared to even rudimentary RISC CPUs.

    It's an absolute marketing failure by Intel to continue pushing the cute name Atom with the reputation they have built for it. They were moving away for a while, branding many traditional Atom-architecture products Pentium J\Celeron J, then going back on that move to shift Pentium\Celeron back to the Core microarchitecture, and further mutilating the process by actually calling Core-based CPUs Atoms with the x3/x5/x7.

    No wonder AMD has maintained consistent OEM support. At least their CPU product stack has made sense for the last 10 years...
