Hot Chips 2021 Live Blog: CPUs (Alder Lake, Zen3, IBM Z, Sapphire Rapids)
by Dr. Ian Cutress on August 23, 2021 10:30 AM EST- Posted in
- CPUs
- AMD
- Intel
- Xeon
- Trade Shows
- SoCs
- IBM
- DDR5
- Zen 3
- IBM Z
- Sapphire Rapids
- Alder Lake
- Hot Chips 33
- V-Cache
11:40AM EDT - Welcome to Hot Chips! This is the annual conference all about the latest, greatest, and upcoming big silicon that gets us all excited. Stay tuned during Monday and Tuesday for our regular AnandTech Live Blogs. Today we start at 8:45am PT, so set your watches and notifications to return back here! The first set of talks is all about CPUs: Intel Alder Lake, AMD Zen 3, IBM Z, and Intel Sapphire Rapids.
11:45AM EDT - The stream should be starting momentarily
11:45AM EDT - It usually starts with 15 minutes of pre-show info to begin
11:47AM EDT - Here we go
11:48AM EDT - Apparently some attendees are having issues with too many from the same company on the same VPN
11:49AM EDT - There's a slack channel for all attendees
11:49AM EDT - Behind the scenes
11:51AM EDT - Lots of members on the committees
11:51AM EDT - Selecting the best talks
11:51AM EDT - These people identify keynote speakers, solicit papers for talks
11:52AM EDT - Tutorials were yesterday
11:54AM EDT - Three keynotes
11:55AM EDT - Synopsys is on AI in EDA
11:55AM EDT - Skydio on autonomous flight
11:55AM EDT - DoE on AI Chips and challenges
11:56AM EDT - 'Chips enabling Chips'
11:57AM EDT - For those attending
11:57AM EDT - Posters as part of the conference as well
11:58AM EDT - First session is CPUs, about to start
11:59AM EDT - 'State of the art CPUs'
11:59AM EDT - Efi Rotem for Intel on Alder Lake
12:00PM EDT - The why and how of Alder Lake
12:00PM EDT - Most apps are Single or lightly MT
12:01PM EDT - Increase in support of ML
12:01PM EDT - Working on smarter structures and new instructions for ML
12:01PM EDT - Duplicating multicore
12:02PM EDT - Moore's Law and Dennard Scaling
12:02PM EDT - Same arch, different uArch, different optimization point
12:03PM EDT - This is what we saw in the Alder Lake part of the Architecture Day
12:03PM EDT - P-Core and E-Core
12:04PM EDT - E-core has shared L2
12:04PM EDT - P-core is +50% ST performance over the E-core
12:05PM EDT - Scalable SoC architecture
12:05PM EDT - UP3/UP4 for mobile, Desktop
12:05PM EDT - 2+8, 6+8 and 8+8 for P-core + E-core
12:06PM EDT - modular design
12:06PM EDT - mix and match for future products
12:06PM EDT - 96 EUs on mobile, 32 EUs on desktop
12:07PM EDT - Only mobile will get native Thunderbolt
12:08PM EDT - Smartness is built into the hardware
12:08PM EDT - Thread Director is mostly for Windows 11
12:09PM EDT - Onboard microcontroller
12:10PM EDT - Thread Director will predict the class of workload and bucket it into classes for the OS scheduler on the order of 30 microseconds
12:10PM EDT - Core-to-Core IPC is the main metric
12:11PM EDT - Intel EHFI
12:12PM EDT - This is more detail about Thread Director
12:12PM EDT - So every processor gets a section in the table, and it has a value for Perf and Efficiency, and workload is compared
12:14PM EDT - Sometimes it makes sense to coalesce a software thread to fewer cores, or one type of core
12:14PM EDT - Thread Director Table updated less often than thread classification
12:14PM EDT - OS has idea of priority of thread
12:15PM EDT - OS scheduler is final arbiter
12:15PM EDT - Table is topology agnostic
12:17PM EDT - Here's a scheduling example
12:18PM EDT - Helps with asymmetry between the threads
12:19PM EDT - All AI workloads go to P-Cores over anything else
12:19PM EDT - AVX + VNNI / INT8 get highest priority over anything
12:21PM EDT - EPP - Energy Performance Preference also takes a role in input to the scheduler
12:21PM EDT - For power constrained systems
12:21PM EDT - higher priority gets higher voltage and frequency regardless of P-core and E-core
12:22PM EDT - optimal P/V point is a function of physical properties (thermal, binning)
12:22PM EDT - Q&A time
12:23PM EDT - Q: Security of side channel attacks with Thread Director A: No security effect, only performance
12:24PM EDT - Q: Die photo, PCIe - how many PCIe 5/4/3 lanes? A: As shown, slide 11, 16x PCIe 5, 4x lanes of PCIe 4, Desktop has PCH
12:25PM EDT - Q: TDT for Linux, when? A: First enabling was Windows 11, work with Linux for time - it is coming, which version and build will be published later
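A rough sketch of the Thread Director table as described in this talk: each core gets a row with a performance and an efficiency score per workload class, and the OS scheduler, as final arbiter, picks a core from it. The class names, the scores, and the `pick_core` helper are made up for illustration; this is not Intel's EHFI layout or the Windows 11 scheduler logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class CoreEntry:
    """One row of a hypothetical per-core feedback table."""
    core_id: int
    perf: Dict[str, int]  # workload class -> relative performance score
    eff: Dict[str, int]   # workload class -> relative efficiency score


def pick_core(table: Iterable[CoreEntry],
              workload_class: str,
              prefer_efficiency: bool = False,
              busy: Optional[Set[int]] = None) -> Optional[int]:
    """Return the idle core with the best score for this class, or None."""
    busy = busy or set()
    idle = [entry for entry in table if entry.core_id not in busy]
    if not idle:
        return None

    def score(entry: CoreEntry) -> int:
        scores = entry.eff if prefer_efficiency else entry.perf
        return scores[workload_class]

    return max(idle, key=score).core_id


# Invented numbers: core 0 stands in for a P-core, core 1 for an E-core.
TABLE = [
    CoreEntry(0, perf={"scalar": 150, "vnni": 300}, eff={"scalar": 60, "vnni": 80}),
    CoreEntry(1, perf={"scalar": 100, "vnni": 100}, eff={"scalar": 100, "vnni": 100}),
]

print(pick_core(TABLE, "vnni"))                            # 0
print(pick_core(TABLE, "scalar", prefer_efficiency=True))  # 1
print(pick_core(TABLE, "vnni", busy={0}))                  # 1
```

The lookup never asks what kind of core it is reading, which is one way to read the "topology agnostic" remark: only the scores matter.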
12:25PM EDT - Now AMD Zen 3 talk
12:26PM EDT - Mark Evers from AMD
12:26PM EDT - The Zen Journey from 2017
12:27PM EDT - New era in the market for AMD
12:27PM EDT - Zen3 says AMD 3D Cache support
12:27PM EDT - Exceeding Industry Trends
12:28PM EDT - Scale-out for servers and supercomputers
12:28PM EDT - Socket compatibility for past products
12:29PM EDT - 4k op cache
12:30PM EDT - +19% IPC gains, which we verified at launch
12:31PM EDT - Large chunk of performance gain from the front end fetch/decode
12:31PM EDT - reduced bubble cycle latency
12:32PM EDT - supporting wider execution
12:32PM EDT - lower latencies for some instructions
12:33PM EDT - 10 issue per cycle up from 7
12:33PM EDT - More execution bandwidth ILP extraction
12:33PM EDT - Disaggregated the ALUs rather than just add more
12:34PM EDT - Without any additional increase in register file ports
12:34PM EDT - larger 6-wide FP unit
12:34PM EDT - Faster 4-cycle FMAC
12:34PM EDT - Reduced FMA latency
12:34PM EDT - Doubled INT8 throughput
12:35PM EDT - larger load-store
12:35PM EDT - L2 DTLB has 6 page walkers
12:36PM EDT - Changes from Zen2
12:36PM EDT - Removed bubble cycle with branch prediction
12:37PM EDT - Back on track faster when mispredict
12:37PM EDT - Quicker switching with I-cache overflow
12:38PM EDT - How AMD calculated IPC uplift
12:39PM EDT - Enterprise security additions
12:39PM EDT - SEV, SEV-ES, SEV-SNP
12:39PM EDT - SNP is the new feature for Zen 3
12:40PM EDT - Eliminates page table attack vectors through VMs/hypervisors
12:40PM EDT - No application modification needed
12:40PM EDT - New instruction support
12:41PM EDT - Double L3 cache
12:42PM EDT - access from cores, better for gaming
12:42PM EDT - reduction in effective L3 memory latency
12:42PM EDT - 2x32B data channels in opposite directions
12:43PM EDT - L3 is a non-inclusive cache
12:43PM EDT - L2 tags in L3
12:43PM EDT - support 192 misses from L3 to memory
12:44PM EDT - Built in support for AMD V-Cache
12:44PM EDT - Already demoed +64 MB L3
12:44PM EDT - +15% faster on gaming
12:47PM EDT - Ryzen performance gains in the same TSMC 7nm
12:47PM EDT - All from uarch and physical design
12:47PM EDT - Gaming was a main target for Zen 3
12:48PM EDT - Performance that matters for the user
12:49PM EDT - Summing up
12:49PM EDT - Zen4 by end of 2022
12:49PM EDT - On track in TSMC N5
12:50PM EDT - Time for Q&A
12:51PM EDT - Q: Is V-Cache applicable to all the segments, or just desktop/server? A: Lot of different workloads benefit from V-Cache. Haven't announced specific products with V-Cache, but some workloads across segments benefit
12:52PM EDT - Q: Primary motivation for tripling table walkers A: some workloads with large DRAM access footprint with outstanding TLB misses. Lots of workloads won't need more than 2, but benefits a few pages, but a clever way to add more without excessive
12:54PM EDT - Q: Is the chiplet technology scalable? A: When it comes to the 3D V-Cache - latency is not large. For chiplets of having CCDs and IODs, it can give you more flexibility than monolithic. Build best products with chiplets
12:55PM EDT - Next presentation is from IBM
12:55PM EDT - IBM Telum Processor
12:55PM EDT - Optimized for AI
12:55PM EDT - Next Gen Z processor
12:55PM EDT - This would be IBM z16, but this one gets a name
12:56PM EDT - How often do you use a mainframe? Probably yes if you've used your credit card
12:57PM EDT - Chief Architect
12:57PM EDT - Starting with background on IBM z
12:57PM EDT - large part of IT infrastructure
12:58PM EDT - Even startups with NFTs
12:58PM EDT - Insights with AI models are needed
12:58PM EDT - Telum is based for this
12:59PM EDT - Enterprise workloads sensitive to ST performance and scalability
12:59PM EDT - Lots of workloads are heterogeneous
01:00PM EDT - Need embedded accelerators
01:00PM EDT - New AI accelerator
01:00PM EDT - New cache hierarchy and fabric design
01:00PM EDT - Encrypted memory and trusted execution environment
01:00PM EDT - Enclaves
01:01PM EDT - Reliability and Availability, 7x9 on z15
01:02PM EDT - 8 cores + 4 MB L2
01:02PM EDT - 5 GHz+ with SMT2
01:02PM EDT - New branch prediction
01:02PM EDT - 270000 branch target table entries
01:02PM EDT - Private 32 MB of L2 cache
01:03PM EDT - 19-cycle load-use latency
01:03PM EDT - Moved away from shared L3 and off-chip L4
01:04PM EDT - 4 pipelines on L2 allowing overlapped traffic
01:04PM EDT - L3 and L4 are now virtual
01:05PM EDT - 320 GB/s ring bandwidth
01:06PM EDT - 8 GB L4 cache
01:07PM EDT - 2-cycle transfer path between chips
01:07PM EDT - 2:1 sync clock grids
01:08PM EDT - 8-chip has flat topology - direct connect to all 8 chips
01:08PM EDT - 40% socket performance over z15
01:08PM EDT - Some of this comes from the AI workload
01:09PM EDT - AI algorithms make machines more efficient
01:09PM EDT - Using the AI to increase security
01:10PM EDT - very low inference latency - every core has access
01:11PM EDT - Kinda like the Centaur CNS core
01:12PM EDT - 6 TFLOPs per chip per AI accelerator
01:13PM EDT - Accelerator is extensible with firmware releases as AI evolves
01:13PM EDT - New instructions for AI accelerator
01:13PM EDT - simpler instructions
01:13PM EDT - But you have to use the libraries to use the instructions
01:14PM EDT - supports virtualization and memory translation
01:14PM EDT - Manages all the data with the new instructions
01:14PM EDT - 6 TF per chip, 200 TF in 32-chip system
01:15PM EDT - 8-way SIMD engine, 128 tiles, MAC array on MatMul and convolution. 32 tiles for activation
01:15PM EDT - focused on FP16
01:16PM EDT - 100 GB/s bandwidth to the AI accelerator
01:16PM EDT - 100 per core, 600 total
01:16PM EDT - Software is all through ONNX
01:17PM EDT - TensorFlow or through IBM Deep Learning Compiler through ONNX
01:18PM EDT - Client proxy model performance
01:19PM EDT - Samsung 7nm, 530 sq mm, 22.5B transistors
01:19PM EDT - 5 GHz+ base clock frequency
01:20PM EDT - Q&A time
01:21PM EDT - Q: Use ring for AI accelerator? A: yes
01:21PM EDT - Firmware does additional management through dedicated buses
01:22PM EDT - Q: Packaging technology for dual die A: Standard, no bridges, put them close together, less than 0.5mm, some interesting thermal and mechanical, but signalling is through the package. Cool innovation on signalling due to clock synchronization.
01:23PM EDT - Q: BW of Inter-socket and Intra-drawer links A: 320 GB/s between chips, drawer in each direction is 45 GB/s
01:25PM EDT - Q: Memory ordering preserved between cores and accelerators - A: magic! keep track of data, on a cache miss, broadcasts, memory state bits tracked to broadcast further out, even go across whole system, when data arrives, have to make sure it can be used, invalidate all other copies confirmed before working on data
01:29PM EDT - Q: How does Telum maintain linear scaling A: lots of scaling work on generic workloads, fabric design etc, optimize latency between chips and across drawers, that's standard. Investment in latency and bandwidth. AI chart shows almost perfect linear, because those are parallel tasks, and data is local
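A quick sanity check on the Telum numbers quoted above. It assumes the virtual L3 is the pooled private L2 of one chip and the virtual L4 is the pooled L2 of every chip in a 32-chip system; the talk notes only say these levels are "virtual", so treat that pooling model as an assumption. The function names are made up for this sketch.

```python
# Figures as quoted in the live blog entries for the Telum talk.
CORES_PER_CHIP = 8
L2_PER_CORE_MB = 32
TFLOPS_PER_CHIP = 6
CHIPS_PER_SYSTEM = 32


def virtual_l3_mb() -> int:
    """Pooled private L2 on one chip (assumed to back the virtual L3)."""
    return CORES_PER_CHIP * L2_PER_CORE_MB


def virtual_l4_gb(chips: int) -> float:
    """Pooled L2 across all chips (assumed to back the virtual L4)."""
    return chips * virtual_l3_mb() / 1024


def system_tflops(chips: int) -> int:
    """Aggregate AI accelerator throughput across chips."""
    return chips * TFLOPS_PER_CHIP


print(virtual_l3_mb())                  # 256
print(virtual_l4_gb(CHIPS_PER_SYSTEM))  # 8.0
print(system_tflops(CHIPS_PER_SYSTEM))  # 192
```

Under these assumptions the 8 GB L4 figure falls out of 32 chips x 256 MB, and the "200 TF" system figure reads as a rounding of 32 x 6 = 192.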
01:30PM EDT - Now for Sapphire Rapids from Intel
01:31PM EDT - Launching 1H 22
01:32PM EDT - Xeon is optimized for performance and CD Perf
01:33PM EDT - Still calling the cores P-cores even though there's no E-cores
01:33PM EDT - Modular SoC architecture
01:34PM EDT - CXL 1.1
01:34PM EDT - Virtualization and VM telemetry
01:34PM EDT - Low Jitter Architecture
01:34PM EDT - Next-Gen QoS
01:35PM EDT - THIS IS SAPPHIRE RAPIDS
01:35PM EDT - Here's the die shot
01:36PM EDT - We've been told there are two types of tile on SPR
01:36PM EDT - Every thread has full access to all resources on all tiles
01:37PM EDT - NUMA Clustering
01:37PM EDT - UPI 2.0
01:38PM EDT - One of the issues here is that each tile has two memory channels - we've been told each SPR CPU will have 8 channels, which means each SPR product will have to have 4 tiles
01:39PM EDT - New instructions
01:39PM EDT - AMX for Matrix
01:39PM EDT - AIA instructions for Accelerators
01:39PM EDT - HFNI for FP16 half precision
01:39PM EDT - CLDEMOTE
01:40PM EDT - Accelerator Engine improvements
01:41PM EDT - Avoid kernel mode overheads with AIA
01:41PM EDT - Providing base functions for deployment of acceleration engines
01:42PM EDT - DSA and QAT
01:42PM EDT - Doubled QAT
01:42PM EDT - Still requires a chipset
01:43PM EDT - ZLIB L9 98% offload
01:44PM EDT - Dynamic Load Balancer
01:44PM EDT - 400M load balancing decisions per second
01:44PM EDT - important for QoS
01:44PM EDT - ideal for packet processing and microservices
01:45PM EDT - 4 x24 UPI links at 16 GT/s (four PCIe 4.0 x16 links for multisocket)
01:46PM EDT - >100 MB LLC
01:46PM EDT - 8 memory channels
01:47PM EDT - Optane 300-series support
01:47PM EDT - SPR+HBM
01:47PM EDT - connected over EMIB
01:47PM EDT - Flat mode and caching mode with DRAM
01:47PM EDT - Can also support optane
01:48PM EDT - INT8 improved through a new accelerator
01:49PM EDT - Industry standard frameworks for CPU based training and inference
01:49PM EDT - Large focus on microservices from initial design
01:50PM EDT - AIA to help service startup time
01:51PM EDT - Scalability with a monolithic view
01:51PM EDT - 10 lots of EMIB
01:51PM EDT - Q&A time
01:52PM EDT - Q: HBM Cache mode have a map? Where do you keep the tags if the HBM is a cache? A: No details quite yet
01:53PM EDT - Q: How is the AI perf of AMX compared to A100? A: No comparison yet
01:54PM EDT - Q: Intel CPU support DDIO, if HBM is cache, where does Data go first? A: Go to L3.
01:55PM EDT - Q: CXL - IBM did CAPI. Can you compare CXL to CAPI? A: Intent of CXL is similar to CAPI. CXL has IO similar to PCIe, but also can consider Accelerators with their own caches
01:55PM EDT - Intel will support CXL.mem in future products, not SPR
01:56PM EDT - Q: Interdie crossing latency A: low single digit nanosecond, little different between vertical and horizontal due to rectangular design
01:57PM EDT - DSA and QAT look like PCIe devices, require drivers (not bare metal), but they are part of the AIA framework. Works with AIA instructions, work with virtualization, but they look like PCIe devices
39 Comments
arashi - Monday, August 23, 2021 - link
They can't even power it on/run workloads on it, every single vague chart/graph is simulated.
eastcoast_pete - Monday, August 23, 2021 - link
Thanks Ian! Question about IBM's Telum CPU for mainframe being fabbed at Samsung: Is Samsung considered a "Trusted foundry"? If not, quite a number of US government agencies cannot use (buy or lease) a mainframe with a Telum inside.
On a different subject: How many people from Apple attend this conference? Reason I ask is that Apple, at least in the past, basically behaved like a parasite, as they never present anything at this and similar meetings. They typically take a lot of notes and ask questions, but it's all take, and no giving of information. If I am mistaken about Apple presenting, please correct me; would be nice to know they actually show signs of good corporate citizenship.
name99 - Tuesday, August 24, 2021 - link
(a) The Samsung question is very interesting! I'd be curious as to how that plays out.
(b) At least when I was at Apple (before Apple got into the CPU design business), plenty of Apple people attended. Your outrage is more based on ignorance than reality.
- Apple explain plenty of how their designs work if you make the effort to spelunk through the patents and run some experiments.
- BUT their design is what you would get if you started with a clean slate in say 2005, with strong opinions (that have been validated) as to how frequency vs power vs density will play out over the next few generations of process. Their design will not help anyone who's unwilling to burn their existing design and start from scratch.
- There is very little in their design that I had not previously encountered somewhere in the academic literature. They benefited massively from ZERO NIH concerns. You may think examining the literature is obvious. It's not. So many good ideas were published 20 years ago (plenty of them sponsored by Intel) but Intel's management, in their wisdom, have not been interested in restructuring their designs to the extent necessary to exploit those ideas.
- Which gets us to the final point. You'd be stunned, when you look at the details, at how much Apple changes (ie is willing to change) every design. There have been three big generations, the first one being internal PA Semi stuff ending at the A6; then A7..A10; then A11..A14. My guess is A15 begins a new generation.
Each generation is a huge visible change (eg A7 added 64b; A11 added clustering and everything that flows from that, and removed 32b). But it's also a massive design change. Apart from that, the annual changes are frequently, and silently, much larger than the sorts of things we see in these HotChips talks.
You have to have a team [and management!] that are willing to make these massive annual changes, plus a set of tools to validate the changes are worth doing, plus a set of tools to help implement the changes.
I used to think somewhat like you. Not any more. Apple didn't get to their position by some sort of nefarious tricks whereby they "stole" ideas in some way that prevented their use by others, and they aren't keeping their tricks secret. They got to where they are by
- very deep knowledge of the literature
- an imagination to combine ideas from many many places
- a willingness to take risks in the sense of constant redesign.
One way in which the best parts of Apple work well is that design and UI is separate from implementation (as individuals, not as collaborators). This has a VERY important (and under-appreciated) effect: the designers design for what would be great UI, and what the HW is capable of, but they don't have to do the work!
This is SO important. When the engineer is the designer, you always consider an idea in terms of "oh god, that sounds like so much hard work, so many changes". The Apple split means you rarely suffer from that failure mode: rather than engineers dismissing their own ideas bcs a few minutes thought suggests it's a lot of work, they are constantly being forced to implement good ideas -- and often discovering those good ideas can be implemented without nearly as much work as they imagined, or as part of a grand redesign that's worth doing because of so much more it opens up.
My GUESS is that Apple's CPU design works in much the same way, that there are a few lofty theorists, extremely familiar with the academic literature, who are constantly revisiting previous ideas and simulations and asking "why don't we change the register allocator in this way? the current scheme for sharing registers is OK, but look at this new scheme I thought up; etc etc"
The next level down of engineers probably groan and push back against every one of these ideas, but the important point is that in Apple all the weight is on the side of the grand designers, none on the side of the poor engineers who have to do the implementation.
In a way this is just the latest version of a computing argument as old as time. When do you stick with the existing, tried and true code base/design; and when do you engage in huge changes? Since the mid-70s Apple has been defined by being willing to engage in the huge changes, and pay the price of constant low-level irritation every year (every year many things are fixed but a few other things break). Since the same time both halves of Wintel have been defined by not engaging in large changes, by engaging mainly in minimal changes. For a few years Intel engaged in aggressive internal design changes even as the ISA was not changed much (think of 386 to 486 to Pentium to PPro) but not much since about Nehalem.
Meanwhile the classic MS mentality has been expressed by Joel Spolsky in many ways, not least here: https://www.joelonsoftware.com/2001/10/14/in-defen... and here https://www.joelonsoftware.com/2000/04/06/things-y...
I'm not interested in arguing about the extent to which Spolsky (or MS or Intel) are justified in their behavior. My point is the poster's original claim, that Apple is not sharing; and my claim that the issue is not that Apple is keeping secrets, it's that all the other companies find it (every year) easier to just evolve the existing design a little more in a few directions than to tear it all down and start again. Apple publishing a hundred papers would not change that...
It's interesting to compare this with semiconductor processes. Aren't I a hypocrite for complaining that Intel are too timid in redesigning their micro-architecture while also complaining that they should follow TSMC in how they design their process?
I think the difference is Intel's process failures (IMHO) result not so much from big leaps as from marketing/finance driven decisions.
The difference between INTC and TSMC that matters is that TSMC STFU until it has something validated along every scary dimension. If something CANNOT be validated yet, it is postponed (cf, eg, GAA on N3). Intel, on the other hand (for reasons that make zero sense to me) insists on claiming, well before the scheme is validated at a manufacturing level, that it will deliver technology X on date Y. Then they find themselves locked into that promise even when it makes no sense.
Would TSMC's cautious half-nodes help Intel? Well, not if Intel insisted on still describing every half node step they plan for the next ten years. (Wait, isn't that what they already DID with Intel 7, Intel 5, Intel 4, ...?) The issue is not that TSMC is making cautious half steps while Intel is rebuilding the process from scratch each generation; it's that TSMC is using, for each process, a suite of technologies that have all been validated, separately and together in the lab; while Intel is using, for each process, a suite of technologies that have all been validated, separately and together, on a marketing slide five years ago.
eastcoast_pete - Tuesday, August 24, 2021 - link
While I actually agree with you on a number of your points, my criticism of Apple was not that they don't disclose at least some of their hardware designs somewhere (patents actually require that, after they have been granted). Rather, it was and is about them never (AFAIK) presenting at Hot Chips; they absolutely attended, at least in the past. One of the attractions of such meetings is that attendees can ask presenters questions; and, often enough, "I can't talk about that" is also an important answer.
name99 - Tuesday, August 24, 2021 - link
This is not a technical point, so take it as you wish, but I would urge you to look inward as to why you ACTUALLY care about how Apple behaves in this respect.
We've agreed that there is nothing going on that deserves the term parasite, no "unfair" withholding of information by Apple, no insights that couldn't be acted upon by others.
And if you don't have the energy to work through patents and run your own tests to know how these CPUs work, well in the past people like Agner or Henry Wong provided the real, serious info at a much deeper level than HotChips talks, and for M1 people like Andrei, Dougall, and I have been doing the same thing, with deep dives published in various places.
So look into yourself, look past the tribalism and mindlessness, and ask what you're REALLY upset about.
My guess is that you want to bring the future forward; you want to experience that thrill of knowing what's new in the A15 now, not when Apple has their iPhone event in three weeks. You want to know today, not some time next year, how Apple will solve the issue of scaling up M1-sized concepts to the requirements of a Mac Pro.
And that's perfectly human, we all want to know the future. But you have to realize that, in this particular sort of case, it's something like an addiction. You'll get a one-time thrill of knowing 2022's design in 2021. But then what? Now, in 2022 you will want 2023's design. After that one hit, you're still limited to only learning one year's worth of new design every year. Neither your epistemic situation, nor your level of joy, have actually improved. And if your chosen company, like Intel, submits to this addiction, things go south really fast, with Intel, every year, trying to provide more than one year's worth of future prophecy beyond what they did last year, till they're demoing utterly meaningless ten year projections. This all ends like any addiction ends.
If you want to get a constant thrill of what might be coming, don't demand it from a company that has to produce real products; that simply cannot end well either for you (with a drastically telescoped future) or the company (locked into a roadmap that may make ever less sense). Instead read the stuff that *might* happen, but doesn't lock down the future -- read the academic literature, read what IMEC is doing.
Oxford Guy - Tuesday, August 24, 2021 - link
Apple’s business model has been about speeding up planned obsolescence since the Apple III.
(Demoing the Mac using a superior not-for-sale prototype surreptitiously is just one symptom of that.)
TristanSDX - Monday, August 23, 2021 - link
Great disappointment. For ADL, at such a conference I expected great detail of core design, instead there was a replay of pretty shallow marketing info, and an explanation of ThreadDirector. Crap and waste of time.
abufrejoval - Wednesday, August 25, 2021 - link
Some things don't seem to change, ever, like the z/Arch chips: Tons of really good ideas, but useless, because they stay hell bent on selling to a very affluent niche.
They've stuck with their mainframe snake oil since water cooled ECL, even when they went CMOS under the cover and yet for most credit card companies I know, their addiction was never really about the hardware, but the software stack. That software stack could run on the very same Power chips that run the i-Series (or ARM for that matter) quite quickly and reliably enough for pretty much everyone.
You won't get these chips manufactured cheaply, but there is no technical hurdle to doing a lesser "E-Core" variant of z-Arch. And had they done so years ago, AMD64 might have never happened.
And on that front: I'd have never thought I'd see an AMD HotChips presentation *that* boring. I think there wasn't a single bit of news in all that and they got caught in a very awkward moment of their product roadmap. (And I don't forgive them, that they made all those VM encryption options "server only": That is a move so stupid, I want to fire someone)
It made all the trumpeting from Intel almost look impressive: Somebody sure thinks that there are major doubts on Intel's attractiveness in corporate/cloud decision makers' minds. They sure fire from all cannons, but it still sounds like stage thunder.
SystemsBuilder - Thursday, August 26, 2021 - link
My take on it, as someone who attended Hot Chips and this session live:
I was hugely disappointed by Intel's presentations. Completely marketing department controlled - pretty much a rerun of Intel's architecture days. I would even say that the way the Intel presenters were speaking (monotone unengaging tone and very controlled sentences) it was 100% scripted and they never even once went off script or expanded outside what has already been released at the architecture days. Not even in the Hot Chips Slack chat channel - 100% marketing messaging controlled. I feel bad for the Intel engineers being on a super tight leash from their masters at the Marketing department.
AMD did a much better job and it was a quite exciting presentation that actually released new exciting information.
My conclusion of this session together with the Packaging session was that TSMC is at least 2-4 years ahead of Intel in packaging technology and that means AMD will continue to be 2-4 years ahead for the foreseeable future in terms of core scaling and performance... I remember the Intel presenter said something defensive since he was presenting directly after the TSMC presenter, like: We are focusing on packaging technology "at scale", clearly feeling the need to differentiate with that towards TSMC since his presentation was 2-3 years behind TSMC in pure tech terms - in my view.