02:04PM EDT - Final talk of this session is IBM z15. This is big iron mainframe stuff. Prepare to be shocked about these chips, and wonder why you don't have them.

02:04PM EDT - Mainframes are still relevant - 220+ billion lines of COBOL still in deployment today. 70% of all business transactions still use COBOL

02:05PM EDT - Programs built in 1964 on IBM mainframes still work today

02:05PM EDT - 70k searches on Google per second, vs. 1.3 million transactions per second on mainframes

02:06PM EDT - Deep pipeline high frequency z-series

02:07PM EDT - z13 introduced SMT, z14 introduced pervasive encryption

02:07PM EDT - These CPUs are built from the ground up by IBM and not used by anyone else

02:07PM EDT - Two bits of silicon - Storage Controller with 960 MB L4 cache, four Compute Chips

02:08PM EDT - 5 drawers are fully connected through the SC chips

02:08PM EDT - Large 14nm SOI designs

02:08PM EDT - 700mm2+ each

02:08PM EDT - ~700mm2

02:08PM EDT - Each CP chip has 12 cores. 8 MB L2 cache

02:08PM EDT - Max config supports 240 cores. 190 cores are customer available, others are for management or recovery
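
As a quick sanity check on the numbers above, here is a small Python sketch that multiplies out the configuration figures quoted in these notes (5 drawers, 4 CP chips and 1 SC chip per drawer, 12 cores and 256 MB L3 per CP chip, 960 MB L4 per SC chip). The constant names are made up for illustration, and the totals are plain arithmetic on the quoted figures, not numbers stated by IBM in the talk.

```python
# Figures as quoted in the live-blog notes; names are illustrative only.
DRAWERS = 5
CP_CHIPS_PER_DRAWER = 4
CORES_PER_CP_CHIP = 12
L3_MB_PER_CP_CHIP = 256
L4_MB_PER_DRAWER = 960      # one SC chip per drawer
CUSTOMER_CORES = 190

total_cp_chips = DRAWERS * CP_CHIPS_PER_DRAWER
total_cores = total_cp_chips * CORES_PER_CP_CHIP
non_customer_cores = total_cores - CUSTOMER_CORES
total_l3_mb = total_cp_chips * L3_MB_PER_CP_CHIP
total_l4_mb = DRAWERS * L4_MB_PER_DRAWER

print(f"CP chips in a max config: {total_cp_chips}")
print(f"Physical cores:           {total_cores}")
print(f"Not customer available:   {non_customer_cores}")
print(f"Aggregate L3:             {total_l3_mb} MB")
print(f"Aggregate L4:             {total_l4_mb} MB")
```

The product 5 x 4 x 12 matches the 240-core maximum quoted above, which leaves 50 cores outside the customer-available pool.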

02:09PM EDT - 60 PCIe 4 x16 connections

02:09PM EDT - 40 TB RAIM memory supported

02:09PM EDT - Two CP chips create a single logical cluster

02:10PM EDT - CP chip

02:10PM EDT - 12 cores, 5.2 GHz

02:10PM EDT - 9.1B transistors

02:10PM EDT - 128 KB L1-I cache, 128 KB L1-D cache

02:10PM EDT - L2 4MB cache private

02:10PM EDT - 256 MB L3 cache shared

02:11PM EDT - Secure execution - 38 new instructions for vector performance

02:11PM EDT - speed up of common instructions in a smart way

02:11PM EDT - on-chip accelerators such as gzip, elliptic curve crypto, on-core sort/merge

02:11PM EDT - Here are the comparisons to z14

02:11PM EDT - 14% ST perf over z14

02:12PM EDT - Deep pipeline, CISC architecture

02:12PM EDT - branch is async

02:12PM EDT - two copies of almost everything shown

02:12PM EDT - recovering unit for when errors are detected - processor rolls back

02:13PM EDT - This allows for transient recovery from hardware errors

02:13PM EDT - Known good state can be transferred to a new core if non-transient error happens

02:13PM EDT - The goal of these cores is to be recoverable, even when blasted with high-energy proton beams
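
To make the recovery flow described above a little more concrete, here is a toy Python model of checkpoint, rollback, retry, and migration to a spare core. Everything in it (`ToyCore`, `run`, the single-accumulator "state", the one-retry policy) is a hypothetical illustration of the general idea, not a description of how the z15 recovery unit is actually built.

```python
from dataclasses import dataclass


@dataclass
class ToyCore:
    name: str
    broken: bool = False   # True models a permanent (non-transient) fault
    acc: int = 0           # toy architected state: one accumulator
    pc: int = 0            # index of the next instruction

    def step(self, value, glitch=False):
        if self.broken or glitch:
            self.acc = -1  # state is corrupted before the error is detected
            raise RuntimeError(f"error detected on {self.name}")
        self.acc += value
        self.pc += 1


def run(program, core, spare, glitches=frozenset(), max_retries=1):
    """Add up `program`, rolling back to the last checkpoint on any error."""
    log = []
    pending = set(glitches)            # instruction indexes that fail once
    while core.pc < len(program):
        checkpoint = (core.acc, core.pc)   # known-good state
        retries = 0
        while True:
            try:
                core.step(program[core.pc], glitch=core.pc in pending)
                break
            except RuntimeError:
                pending.discard(core.pc)         # a transient fault does not recur
                core.acc, core.pc = checkpoint   # roll back
                if retries < max_retries:
                    retries += 1
                    log.append(f"retry pc={core.pc} on {core.name}")
                elif core is not spare:
                    # Persistent fault: hand the known-good state to the spare.
                    spare.acc, spare.pc = checkpoint
                    log.append(f"migrate pc={spare.pc} to {spare.name}")
                    core, retries = spare, 0
                else:
                    raise
    return core, log


finished_on, events = run([1, 2, 3], ToyCore("A"), ToyCore("B"), glitches={1})
print(finished_on.name, finished_on.acc, events)
```

With a single transient glitch the program finishes on core A after one retry; constructing core A with `broken=True` instead makes the same call retry once and then continue on core B from the checkpoint.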

02:15PM EDT - NXU is synchronous and runs in real time

02:16PM EDT - Two main ways for compression - IBM uses both depending on the size to get the best results

02:17PM EDT - Elliptic curve cryptography acceleration unit in each core, along with enhanced modulo unit which it relies on

02:17PM EDT - 'MA unit' has its own instruction set and 'core'

02:17PM EDT - sign and verify is implemented as firmware and hardware

02:17PM EDT - Acts as a blueprint for future accelerators

02:17PM EDT - Attached to back-end of the pipeline

02:18PM EDT - All execution is in order and non-speculative

02:18PM EDT - No pipeline pain points

02:18PM EDT - Results are passed to the core

02:18PM EDT - physically these accelerators could be placed far away from core logic as needed

02:18PM EDT - dozens to hundreds of modulo ops with a couple of doubleword instructions

02:19PM EDT - Core is called millicode ?

02:19PM EDT - Here is the internal instruction set

02:19PM EDT - speed ups vs external attached PCIe accelerator on z14

02:19PM EDT - Secure Execution for z15 with vertical isolation

02:20PM EDT - specialized mode in the CPU, IO, and memory subsystem

02:20PM EDT - ultravisor sits between the hypervisor and OS

02:21PM EDT - integrity hash and input/output counts to stop malicious guests

02:22PM EDT - controlled code environment

02:23PM EDT - 5.2 GHz water cooled

02:23PM EDT - 4 CP and 1 SC chip per drawer

02:23PM EDT - 14% single-thread and 25% capacity gain vs z14

02:24PM EDT - Q&A time

02:25PM EDT - Q: Pipeline depth? A: It's long! Long front end and back end and extends with the recovery

02:25PM EDT - Around 30

02:25PM EDT - Q: L1 / L2 load to use latency? A: L1 4-cycle loop, 8-cycle for L1 miss/L2 hit
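
For a sense of scale, the cycle counts in the Q&A can be converted to wall-clock time at the quoted 5.2 GHz. This is a back-of-the-envelope Python sketch; the helper name `cycles_to_ns` is made up here, and treating "around 30" as exactly 30 pipeline stages is an assumption.

```python
CLOCK_GHZ = 5.2  # clock speed quoted in the talk


def cycles_to_ns(cycles, clock_ghz=CLOCK_GHZ):
    """Convert a cycle count to nanoseconds at the given clock speed."""
    return cycles / clock_ghz


l1_hit_ns = cycles_to_ns(4)      # L1 load-to-use loop
l2_hit_ns = cycles_to_ns(8)      # L1 miss / L2 hit
pipeline_ns = cycles_to_ns(30)   # assumes "around 30" stages means 30

print(f"one cycle:       {cycles_to_ns(1):.3f} ns")
print(f"L1 hit:          {l1_hit_ns:.3f} ns")
print(f"L2 hit:          {l2_hit_ns:.3f} ns")
print(f"pipeline length: {pipeline_ns:.2f} ns")
```

At this clock a single cycle is a little under 0.2 ns, so the 8-cycle L2 hit lands at roughly 1.5 ns.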

02:26PM EDT - Q: For secure execution, what is the isolation from firmware? A: Validated trusted firmware / ultravisor. That's an integral part of our secure guest security

02:27PM EDT - Q: Async branch prediction? What happens if it's behind i-fetch? A: It's lossless, and i-fetch is in-order. After a pipeline restart, if i-fetch is ahead, the pipeline will react and throw away as required. There are syncs - there's a hard sync at dispatch, so no predictions are dropped

02:27PM EDT - Q: Core IPC vs Power10? A: ask power!

02:28PM EDT - AES-256 for page encryption, integrity hash is SHA-512
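
The integrity-hash idea can be illustrated in a few lines of Python using the standard library's SHA-512. This is only a sketch of hashing a page and checking it later: the 4 KB page size and the function names are assumptions made here, and it says nothing about how the ultravisor stores or protects its hashes, nor does it cover the AES-256 page encryption.

```python
import hashlib
import hmac

PAGE_SIZE = 4096  # assumed page size for this sketch


def page_digest(page: bytes) -> bytes:
    """Return the SHA-512 digest of one page of guest memory."""
    return hashlib.sha512(page).digest()


def page_is_intact(page: bytes, expected: bytes) -> bool:
    """Constant-time comparison of a page's digest against a stored one."""
    return hmac.compare_digest(page_digest(page), expected)


page = bytes(PAGE_SIZE)              # an all-zero page as sample data
stored = page_digest(page)           # digest recorded when the page is exported
tampered = b"\x01" + page[1:]        # the same page with one byte changed

print(len(stored), page_is_intact(page, stored), page_is_intact(tampered, stored))
```

Changing a single byte of the page makes the check fail, which is the property an integrity hash is there to provide.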

02:29PM EDT - Q: 5.2 GHz? How? A: Deep pipelining and focus on gate design. A lot of work. Deep pipelining is table stakes, but a lot of other things are needed

02:30PM EDT - Q: Is power/eff an important focus? A: It does consume less power than a similarly configured z14. From a chip perspective, the focus wasn't to reduce overall power - the focus was on performance and throughput. We put two more cores in and doubled the caches - we burned the power budget to add more performance. That's the sort of product this is. We stuff in more hardware and acceleration.

02:30PM EDT - That's a wrap for the first session. Come back in 30 minutes for the next session, where we start on AMD's 4000-series Renoir

02:31PM EDT - .

20 Comments

  • Raqia - Monday, August 17, 2020

    Amazing that this instruction set lives on, because: COBOL.
  • Dolda2000 - Monday, August 17, 2020

    I'll just say that I read up on the System/360 base instruction set a while ago, and it was actually a lot more elegant than I expected it to be. There are even aspects of it that are surprisingly RISC-like for being so early.
  • abufrejoval - Thursday, August 20, 2020

    No, you can run Cobol on anything, just need a compiler. The instruction set lives on because of binary compatibility to 360 and beyond.
  • frbeckenbauer - Monday, August 17, 2020

    Who fabs these chips? IBM is fabless these days, right?
  • tipoo - Monday, August 17, 2020

    Samsung
  • jeremyshaw - Monday, August 17, 2020

    Well, this one is probably GlobalFoundries 14nm (basically Samsung again) or 14 FD-SOI at GloFo.
  • melgross - Tuesday, August 18, 2020

    That’s basically saying that you don’t know, because those are pretty much the only ones who can do it, unless IBM is still running a small fab just for this.
  • eastcoast_pete - Monday, August 17, 2020

    Last year, someone asked me which programming language they should study, and I said: COBOL. And I wasn't joking. However, little did I know that we would need COBOL experts so desperately a few months later; in many states here in the US, unemployment benefit applications were delayed by weeks after Covid hit, because the programs processing them were all in COBOL, and it's really hard to scale out if most of your COBOL programmers are long retired or dead. These new mainframes also exist because COBOL lives on.
  • RSAUser - Tuesday, August 18, 2020

    Over the next few years I think a lot of COBOL systems are going to start being rewritten.
    While there were so many COBOL programmers left, it definitely was not worth rewriting such systems, and COBOL is still amazing for batch processing, but as banking will start moving more and more to "immediate" processing, COBOL's strengths will start decreasing.

    I am expecting COBOL to evolve into something like a Golang COBOL hybrid.
  • IanWorthington - Monday, August 17, 2020

    > 02:19PM EDT - Core is called millicode ?

    No. Millicode implements complex subroutine instructions.

    >5.2GHz:
    iirc, it slows down to 5GHz under certain circumstances.

    And don't forget, i/o is handled in separate "channel" processors. Linuxy stuff can be spun off to separate ZIIP processors.
