Every once in a while, a startup comes along with something out of left field. In this generation of AI hardware, Cerebras holds that title with its Wafer Scale Engine. The second-generation product, built on TSMC 7nm, is a full wafer packed to the brim with cores, memory, and performance. By using patented manufacturing and packaging techniques, a Cerebras CS-2 features a single chip, bigger than your head, with 2.6 trillion transistors. The cost of a CS-2, with appropriate cooling, power, and connectivity, is ‘a few million’, we are told, and Cerebras has customers in research, oil and gas, pharmaceuticals, and defense – all after the unique proposition that a wafer scale AI engine provides. Today’s news is that Cerebras is still in full startup mode, having just closed a Series F funding round.

The new Series F funding round nets the company another $250 million in capital, bringing the total raised through venture capital to $720 million. Speaking to Cerebras ahead of this announcement, we were told that the $250 million was for effectively 6% of the company, putting Cerebras’ valuation at around $4 billion. Compared to Cerebras’ last funding round, a Series E in 2019 that valued the company at $2.4 billion, that works out to roughly $800 million of added valuation per year. This round was led by Alpha Wave Ventures, a partnership between Falcon Edge and Chimera, which joins Cerebras’ existing investors such as Altimeter, Benchmark, Coatue, Eclipse, Moore, and VY.
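For anyone wanting to check those numbers, here is a minimal sketch of the implied-valuation arithmetic in Python, using only the figures quoted above (the 6% stake is approximate, per Cerebras):

```python
# Back-of-the-envelope check on the funding figures quoted above (all inputs approximate)
series_f_raise = 250e6        # Series F: $250 million
stake_sold = 0.06             # 'effectively 6%' of the company

implied_valuation = series_f_raise / stake_sold   # ~$4.2 billion, in line with the stated ~$4 billion

stated_valuation = 4e9        # the ~$4 billion figure Cerebras quotes
series_e_valuation = 2.4e9    # 2019 Series E valuation
years_between_rounds = 2      # 2019 -> 2021
added_value_per_year = (stated_valuation - series_e_valuation) / years_between_rounds

print(f"Implied valuation: ${implied_valuation / 1e9:.1f}B")
print(f"Added value: ${added_value_per_year / 1e6:.0f}M per year")
```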

Cerebras explained to us that it’s best to get a funding round out of the way before you actually need it: the company already had the next 2-3 years funded and planned, and this additional round provides more on top of that, allowing it to grow as required. That encompasses not only the next generations of wafer scale (apparently a 5nm tape-out is around $20 million), but also the new memory scale-out systems Cerebras announced earlier this year. Cerebras currently has around 400 employees across four sites (Sunnyvale, Toronto, Tokyo, San Diego), and is looking to expand to 600 by the end of 2022, with a focus on engineering and full-stack development.

Cerebras Wafer Scale

AnandTech          | Wafer Scale Engine Gen1 | Wafer Scale Engine Gen2 | Increase
AI Cores           | 400,000                 | 850,000                 | 2.13x
Manufacturing      | TSMC 16nm               | TSMC 7nm                | -
Launch Date        | August 2019             | Q3 2021                 | -
Die Size           | 46225 mm²               | 46225 mm²               | -
Transistors        | 1.2 trillion            | 2.6 trillion            | 2.17x
(Density)          | 25.96 MTr/mm²           | 56.25 MTr/mm²           | 2.17x
On-board SRAM      | 18 GB                   | 40 GB                   | 2.22x
Memory Bandwidth   | 9 PB/s                  | 20 PB/s                 | 2.22x
Fabric Bandwidth   | 100 Pb/s                | 220 Pb/s                | 2.2x
Cost               | $2 million+             | arm+leg                 | -
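The density and scaling figures in that table fall straight out of the transistor counts and die area; as a quick sketch (Python, using only the numbers above):

```python
# Derive the density and generational scaling figures from the table above
die_area_mm2 = 46225              # same physical die size for both generations

gen1_transistors = 1.2e12         # WSE-1, TSMC 16nm
gen2_transistors = 2.6e12         # WSE-2, TSMC 7nm

gen1_density = gen1_transistors / die_area_mm2 / 1e6    # ~25.96 MTr/mm^2
gen2_density = gen2_transistors / die_area_mm2 / 1e6    # ~56.25 MTr/mm^2

print(f"WSE-1 density: {gen1_density:.2f} MTr/mm^2")
print(f"WSE-2 density: {gen2_density:.2f} MTr/mm^2")
print(f"Transistor (and density) increase: {gen2_transistors / gen1_transistors:.2f}x")  # ~2.17x
print(f"SRAM increase: {40 / 18:.2f}x")                                                  # ~2.22x
```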

To date, Cerebras’ customers have come, in the company’s own words, from markets that have traditionally understood HPC and are exploring the boundary between HPC and AI. This means traditional supercomputer sites, such as Argonne, Lawrence Livermore, and PSC, but also commercial enterprises that have long relied on heavy compute, such as pharmaceuticals (AstraZeneca, GSK), medical, and oil and gas. Part of Cerebras’ roadmap is to expand beyond those ‘traditional’ HPC customers and introduce the technology in other areas, such as the cloud – Cirrascale recently announced a cloud offering based on the latest CS-2.

Coming up soon is the annual Supercomputing conference, where more customers and deployments are likely to be announced.

Comments

  • teshy.com - Thursday, November 11, 2021

    I wonder how this compares to Tesla's D1 chip for the Dojo supercomputer?
  • mode_13h - Friday, November 12, 2021

    Well, I'm reading on wikipedia that each Tesla D1 has 362 TFLOPS (presumably of BFloat16). They combine 25 of those into a "tile" (9.05 PFLOPS), and the machine includes 120 tiles = 1.09 EFLOPS.

    I just checked the linked article on the WSE2, and they pointedly don't mention any performance numbers. However, if you figure 850k cores per wafer, each delivering something like 32 FLOPs per cycle @ 1 GHz, that would work out to 27.2 PFLOPS per wafer. So, about 3x Tesla's "tile".

    Of course, I could be off by some multiple. For instance, if WSE2 cores are each 1024 bits wide and use BFloat16, then that would be 4x the above figure (i.e. 108.8 PFLOPS). In any case, it's probably safe to say WSE2 is in the range of tens to hundreds of PFLOPS.
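
    (Expressed as a quick sketch, with the clock and per-core throughput as assumptions rather than published figures:)

    ```python
    # Rough WSE-2 throughput estimate; the per-core figures are guesses, not Cerebras specs
    cores = 850_000                  # WSE-2 AI cores
    clock_hz = 1e9                   # assumed ~1 GHz
    flops_per_core_per_cycle = 32    # assumed

    low_estimate = cores * clock_hz * flops_per_core_per_cycle   # 2.72e16 = 27.2 PFLOPS
    high_estimate = 4 * low_estimate                             # if cores are 4x wider: 108.8 PFLOPS

    dojo_tile = 25 * 362e12          # Tesla Dojo tile: 25 x D1 @ 362 TFLOPS = 9.05 PFLOPS
    print(f"Low:  {low_estimate / 1e15:.1f} PFLOPS  (~{low_estimate / dojo_tile:.1f}x a Dojo tile)")
    print(f"High: {high_estimate / 1e15:.1f} PFLOPS")
    ```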
  • Farfolomew - Wednesday, November 24, 2021

    Any word on them going public with an IPO?
