Some of you may remember AMD announcing its "Torrenza" technology 10 years ago. The idea was to offer a fast and coherent interface between the CPU and various types of "accelerators" (via HyperTransport). It was one of the first initiatives to enable "heterogeneous computing".

We now have technology that could be labeled "heterogeneous computing", the most popular form being GPU computing. There have also been encryption, compression and network accelerators, but the advantages of those accelerators were never really clear, as shifting data back and forth to the CPU was in many cases less efficient than letting the CPU process it with optimized instructions. In the professional world, heterogeneous computing was mostly limited to HPC; in the consumer world it was a "nice to have".

But times are changing. The sensors of the Internet of Things, the semantic web and the good old WWW are creating a massive and exponentially growing flood of data that cannot be stored and analyzed by traditional means. Machine learning offers a way of classifying all that data and finding patterns "automatically". As a result, we have witnessed a "machine learning renaissance", with quite a few breakthroughs. Google had to deal with this flood years before most other companies, and has released some of the Google Brain team's AI breakthroughs to the open source world, one example being TensorFlow. And when Google releases important technology as open source, we know we have to pay attention: when Google published the Google File System and BigTable papers in the mid-2000s, for example, the big data revolution of Hadoop, HDFS and NoSQL databases erupted a little later.

Big Data thus needs big brains: we need more processing power than ever. As Moore's Law is dead (the end of CMOS scaling), we cannot expect much from process technology advances. The extra processing power has to come from ASICs (see Google's TPU), FPGAs (see Microsoft's Project Catapult) and GPUs.

Those accelerators need a new "Torrenza technology": a fast, coherent interconnect to the CPU. NVIDIA was first with NVLink, but an open standard would be even better. IBM, on the other hand, was willing to share its CAPI interface.

To that end, Google, AMD, Xilinx, Micron and Mellanox have joined forces with IBM to create a "coherent high performance bus interface" based on a new bus standard called the Open Coherent Accelerator Processor Interface (OpenCAPI). Capable of a 25 Gbit/s data rate per lane, OpenCAPI outperforms the current PCIe specification, which offers a maximum of 8 Gbit/s per PCIe 3.0 lane. We expect the total bandwidth of quite a few OpenCAPI devices to be a lot higher still, as OpenCAPI lanes can be bundled together.
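
To put those per-lane figures in perspective, here is a minimal back-of-the-envelope sketch. The lane counts are our own illustrative assumptions (neither specification mandates a link width), the PCIe 3.0 figure accounts for its 128b/130b encoding, and OpenCAPI protocol overhead is ignored:

    # Back-of-the-envelope, one-direction link bandwidth comparison.
    # Lane counts are illustrative assumptions; OpenCAPI protocol
    # overhead is left out of the calculation.
    def link_bandwidth_gb_s(lane_rate_gbit_s, lanes, encoding_efficiency=1.0):
        """Aggregate one-direction bandwidth of a multi-lane link in GB/s."""
        return lane_rate_gbit_s * lanes * encoding_efficiency / 8

    # PCIe 3.0: 8 GT/s per lane with 128b/130b encoding
    pcie3_x16 = link_bandwidth_gb_s(8, 16, 128 / 130)   # ~15.8 GB/s

    # OpenCAPI: 25 Gbit/s per lane, hypothetical x8 bundle
    opencapi_x8 = link_bandwidth_gb_s(25, 8)            # 25.0 GB/s

    print(f"PCIe 3.0 x16: {pcie3_x16:.1f} GB/s")
    print(f"OpenCAPI x8:  {opencapi_x8:.1f} GB/s")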

It is a win-win for everybody besides Intel. It is now clear that IBM's OpenPOWER initiative is gaining a lot of traction and that IBM is deadly serious about offering an alternative to the Intel-dominated datacenter. IBM will implement the OpenCAPI interface in its POWER9 servers in 2017. Those POWER9s will have a very fast interface not only to NVIDIA GPUs (via NVLink), but also to Google's ASICs and Xilinx FPGA accelerators.

Meanwhile, this benefits AMD as well, as the company gets access to an NVLink alternative for linking its Radeon GPUs to the upcoming Zen-based server processors. Micron can link memory that is faster (and more profitable) than DRAM to the CPU, and Mellanox can do the same for networking. OpenCAPI is even more important for Xilinx's FPGAs, as a coherent interface can make FPGAs attractive for a much wider range of applications than they serve today.

And guess what: Dell EMC joined this new alliance just a few days ago. Intel has to come up with an answer...

Update, courtesy of commenter Yojimbo: "NVIDIA is a member of the OpenCAPI consortium, at the 'contributor level', which is the same level Xilinx has. The same is true for HPE (Hewlett Packard Enterprise)."

This is even bigger than we thought: probably the biggest announcement in the server market this year.

 

Source: OpenCAPI

Comments

  • eachus - Friday, October 14, 2016 - link

    There are several differences between these proposals and PCI Express. The first, and most important, is cache coherency. Keeping caches coherent does cost bandwidth, but it makes for much simpler code on each end, and net-net much faster communications.

    Protocols are also optimized around the size of the information blocks transferred. Cache-coherency traffic will require some small, quick transfers to verify consistency, but these protocols envision large block transfers once the handshaking is complete.

    The other major difference is addressing. All of these new protocols need huge virtual address spaces, where part of the (physical) address tells which device manages the corresponding virtual addresses. Every device needs to be able to do virtual-to-physical mapping, potentially for the entire virtual address space. These maps, of course, will be multilevel, and occasionally packets will have to be sent back and forth between the device decoding the virtual address and the device which manages that part of the map. Not all devices need to manage virtual memory tables, but all will have TLBs (translation lookaside buffers) to do virtual-to-physical mapping quickly in the common case (a.k.a. locality of reference: most code and data accesses touch a few small areas of the address space most of the time).
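
    A toy sketch of that fast path (the 4 KiB page size, the flat dict standing in for the multilevel map, and all names are illustrative assumptions only): a lookup hits the small TLB in the common case and only falls back to walking the map on a miss.

        # Toy model of device-side address translation: check the TLB first,
        # walk the (possibly remote, multilevel) map only on a miss.
        PAGE_SHIFT = 12  # assumed 4 KiB pages

        class Device:
            def __init__(self, page_table):
                self.page_table = page_table  # stands in for the multilevel map
                self.tlb = {}                 # small cache of recent vpn -> pfn entries

            def translate(self, vaddr):
                vpn = vaddr >> PAGE_SHIFT
                offset = vaddr & ((1 << PAGE_SHIFT) - 1)
                if vpn not in self.tlb:       # TLB miss: walk the map, cache the result
                    self.tlb[vpn] = self.page_table[vpn]
                return (self.tlb[vpn] << PAGE_SHIFT) | offset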
  • zangheiv - Friday, October 14, 2016 - link

    This is of particular importance to AMD, because it is ideal to have large amounts of storage and memory located on the GPU board, close to the compute source, coherently connected to the CPU, which is in turn coherently connected to the other servers in the cluster. Dual-socket Zen servers will have up to 64 cores connected to several Radeon GPUs, so whether the architecture is for HPC, hardware virtualization or micro-segmentation, this open architecture makes very good sense.
  • johnpombrio - Friday, October 14, 2016 - link

    Does this remind anyone else of Micro Channel by IBM?
  • JohanAnandtech - Saturday, October 15, 2016 - link

    Micro Channel was IBM's attempt to make part of PC technology proprietary again. Exactly the opposite of OpenCAPI.
  • gobears99 - Saturday, October 15, 2016 - link

    I've noticed that AnandTech has yet to cover Intel's Xeon+FPGA systems. They play in the same space as CAPI (are there shipping CAPI solutions other than FPGAs?). Intel rolled out the first generation of its Xeon+FPGA academic program over a year ago, and also presented details at a workshop held during ISCA 2016. Slides are available at https://cpufpga.files.wordpress.com/2016/04/harp_i...
  • zodiacfml - Saturday, October 15, 2016 - link

    I feel this is just to ensure that an interconnect will be available should the simultaneous use of various types of accelerators prove popular.
    Intel seems to be in a wait-and-see mode these days, except for its non-volatile XPoint memory. That is quite understandable, considering they can just throw money at a problem.
  • zodiacfml - Saturday, October 15, 2016 - link

    Speaking of throwing money at a problem, they acquired Altera recently.
