Software Issues

So although the S824 is IBM's benchmark flagship for the scale-out range, the S812L and S822L are the servers that have the best chance of converting the kinds of users who currently opt for x86 Xeons:

  • Support for Little Endian data
  • Best Linux support (SUSE, Red Hat & Ubuntu)
  • (Somewhat) lower power
  • 2U form factor which offers decent performance per U
  • and probably the most important reason of all: Affordable! ($10k-25k instead of $30k-60k)
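Why Little Endian support matters for porting can be shown in a few lines of Python. This is a minimal sketch, assuming an x86 application that dumped a 32-bit integer to disk in its native (little-endian) byte order; the record and the helper names are hypothetical. A ppc64le host reads those bytes back unchanged, whereas a big-endian POWER host that reads them in its native byte order sees a byte-swapped value.

```python
import struct
import sys

# Hypothetical record: a 32-bit counter written by an x86 program with a raw
# fwrite(&value, 4, 1, f), i.e. in little-endian byte order.
VALUE = 0x11223344
X86_BYTES = struct.pack("<I", VALUE)  # bytes 44 33 22 11 on disk


def read_native(raw: bytes) -> int:
    """Read 4 bytes the way a naive fread() would: in the host's byte order."""
    return struct.unpack("=I", raw)[0]


def read_as_big_endian_host(raw: bytes) -> int:
    """Read 4 bytes the way a big-endian POWER host's naive fread() would."""
    return struct.unpack(">I", raw)[0]


def read_portable(raw: bytes) -> int:
    """Portable fix: state the on-disk byte order explicitly."""
    return struct.unpack("<I", raw)[0]


if __name__ == "__main__":
    # sys.byteorder is 'little' on x86-64 and ppc64le, 'big' on big-endian ppc64.
    print("host byte order:", sys.byteorder)
    print("native read:    ", hex(read_native(X86_BYTES)))
    print("BE host read:   ", hex(read_as_big_endian_host(X86_BYTES)))
    print("portable read:  ", hex(read_portable(X86_BYTES)))
```

On an x86-64 or ppc64le host the native read and the portable read agree; only the big-endian interpretation differs, which is the class of byte-order bug that LE support sidesteps.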

So yes, the S822L looks like the first worthy alternative to dual Xeon servers since 2010. But the S822L did not inherit all the strong points of the typical "Big Blue" servers. The clock speeds are a bit lower to keep power consumption in check, and, more importantly, LE Linux support is still very young. Sure, POWERLinux has been around for ages, but its software ecosystem mostly supported a few Big Endian applications such as heavy-duty Java servers and SAP.

Let's make the issue at hand a bit more tangible. IBM offers a migration advisor that helps developers port their applications. That is definitely a good thing, but it also clearly illustrates that building a software ecosystem is a lot more cumbersome than the POWERPoint slides would have you believe. In the case of IBM's LE Linux, porting the rich x86 Linux software ecosystem to OpenPOWER is not that straightforward:

  • Some code contains inline x86 assembly, for example thread resource locking code.
  • Some code uses x86-specific APIs.
  • The makefiles lack POWER support, which makes recompiling less than straightforward.
  • LE Linux on POWER is 64-bit only.
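The makefile problem often comes down to hard-coded x86 compiler flags. GCC's x86 back end tunes with -march, while its POWER back end uses -mcpu and -mtune instead, so a makefile that always passes -march=native breaks on ppc64le. Below is a minimal sketch of choosing flags per architecture, keyed on the machine name that `platform.machine()` (like `uname -m`) reports; the table, the helper name and the flag choices are assumptions for illustration, not IBM's recommended settings.

```python
from __future__ import annotations

import platform

# Hypothetical per-architecture tuning table. GCC on x86 tunes via -march,
# GCC on POWER via -mcpu/-mtune (it rejects -march).
TUNING_FLAGS: dict[str, list[str]] = {
    "x86_64": ["-march=native"],
    "amd64": ["-march=native"],
    # POWER8 little-endian; LE Linux on POWER is 64-bit only, so no -m32 variant.
    "ppc64le": ["-mcpu=power8", "-mtune=power8"],
}


def gcc_tuning_flags(machine: str) -> list[str]:
    """Return CPU tuning flags for a `uname -m` style machine name.

    Unknown architectures get no tuning flags, so the build falls back to
    the compiler's generic defaults instead of failing outright.
    """
    return list(TUNING_FLAGS.get(machine.lower(), []))


if __name__ == "__main__":
    flags = gcc_tuning_flags(platform.machine())
    print("CFLAGS +=", " ".join(flags) or "(generic build)")
```

Returning a copy keeps callers from mutating the shared table; an Autotools or CMake build would do the same dispatch in its own syntax.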

We experienced firsthand that this was more than just theory.

Case in point: on x86-64 we simply installed well-tuned, ready-to-run, precompiled binaries. Benchmarking is pretty easy there, requiring only a minor scripting effort.

The story was very different on the IBM S822L. We installed Ubuntu 15.04 (kernel 3.19.0-15, ppc64le). To satisfy our curiosity, we did a quick benchmark run with Linux-Bench, an automated benchmarking tool that Ian also likes to use. Almost nothing in the benchmark ran on our POWER system, even though most of the software had some form of support for POWER-based systems.

The same was true for most software out there: we had to port most of it ourselves by delving deep into all kinds of config files, READMEs, and makefiles. In many cases, we had to search for alternative libraries that did support OpenPOWER.

Although a lot of software had an entry for "IBM POWER" in its makefiles, we still ran into a lot of trouble. Neither the server nor IBM is to blame: it is simply a fact that most developers, especially those writing HPC software, have put a lot more effort into optimizing and validating the Intel x86 versions of their software than the versions for more "exotic" platforms.

Linux Ecosystem Not at Full Throttle... Yet

It is clear to us that the OpenPOWER Linux ecosystem is still young and, as a result, does not yet offer the same performance as the older PowerVM and AIX platforms. There is still quite a bit of performance headroom.

A good example is crypto acceleration. The IBM POWER8 has a dedicated cryptographic unit supporting new POWER ISA instructions to accelerate AES (encryption), SHA (hashing), and CRC (Cyclic Redundancy Check) codes. A similar encryption unit was already available in the POWER7+. We found that an nx-crypto driver was available as part of the Linux 3.5 kernel. However, even though Ubuntu 15.04 LE for OpenPOWER is based on Linux kernel 3.19, the nx-crypto driver was nowhere to be found. You could argue that the same is true for Intel whenever it introduces new instructions, but as far as we could see, no encryption acceleration whatsoever was possible, not even via the older POWER7+ unit.

A few days after we had finished testing, we found out that the vmx-crypto driver will be available in distributions using kernel 4.1 and later, and that it will be enabled in OpenSSL 1.0.2 (the standard repositories currently offer 1.0.1f). The slide below, from a presentation given this month, shows how fast the ecosystem is expanding, but also that it is still in flux.
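Whether a given box actually has in-kernel POWER8 crypto acceleration, and whether OpenSSL is new enough to use it, can be checked from user space. A rough Python sketch: /proc/crypto lists every registered algorithm as blank-line-separated blocks of `key : value` lines, and `ssl.OPENSSL_VERSION_INFO` exposes the version of the OpenSSL that Python is linked against. The module name `vmx_crypto` (the kernel's underscore spelling of vmx-crypto) is an assumption, as are the helper names.

```python
from __future__ import annotations

import ssl
from pathlib import Path


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Split /proc/crypto text into one dict per registered algorithm."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def drivers_from_module(entries: list[dict[str, str]], module: str) -> list[str]:
    """Sorted names of the drivers that a given kernel module registered."""
    return sorted({e["driver"] for e in entries
                   if e.get("module") == module and "driver" in e})


def openssl_at_least(version_info: tuple, minimum: tuple = (1, 0, 2)) -> bool:
    """Compare (major, minor, fix) of ssl.OPENSSL_VERSION_INFO to a minimum."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    try:
        text = Path("/proc/crypto").read_text()
    except OSError:  # not Linux, or /proc not mounted
        text = ""
    # Assumption: the POWER8 in-core crypto driver shows up as module "vmx_crypto".
    drivers = drivers_from_module(parse_proc_crypto(text), "vmx_crypto")
    print("vmx_crypto drivers:", ", ".join(drivers) or "none")
    status = "new enough" if openssl_at_least(ssl.OPENSSL_VERSION_INFO) else "older than 1.0.2"
    print(ssl.OPENSSL_VERSION, "->", status)
```

An empty driver list only means that no algorithms from that module are registered right now; the module may simply not be loaded, or not be shipped with the distribution's kernel at all.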

OpenPOWER gained traction only in 2014, the POWER8 is the first POWER chip with LE support, and the number of Linux servers running on OpenPOWER systems is still very small compared to x86. It is pretty simple: the community is much smaller than the x86 Linux server community. According to "the platform", IBM claims that "scale-out POWER8 machines have seen double digit revenue growth in the first half of 2015", but those growth numbers are "against a very small base". That tells us a lot: it is indeed a very small community, but a rapidly growing one.

146 Comments

  • jesperfrimann - Monday, November 9, 2015

    Well, I think you should kick Franz Bourlet for not hooking you up with an IBM technical advocate who actually knew the technology. Such a person could have shown you the ropes and helped you understand the kit better. Again, Franz is a sales guy.

    IMHO selecting Ubuntu as the Linux distro did not help you. It's new to the POWER platform and does not have the same robustness as, for example, SLES, which has been around for 10+ years on POWER.

    The fact that you are getting better results using gcc-generated code rather than xLC shows me that something is not right.
    And the fact that the IBM JDK isn't working is also an indicator that something is not right.
    IMHO selecting Ubuntu did not make things easier for you guys.

    And for really optimized code you need to install and use the high-performance math libraries for POWER (MASS), an add-on math library.

    And AFAIR, having 8 memory modules only enables half the memory bandwidth of the system.

    So IMHO IBM didn't help you make their system look good.

    But again that is what you get when you get rid of all the clever people :)

    // Jesper
  • nils_ - Wednesday, November 11, 2015

    You can always rent a box at OVH; they offer a huge chunk of an OpenPOWER system, albeit virtualized through Runlabs.
  • stefstef - Sunday, November 8, 2015

    Compared to the Pentium 4, the MIPS R16K with loads of L3 cache was a bzip2 beast, outperforming the Pentium 4 even though the latter ran at twice the clock speed or more. Besides, running zip programs is what these server processors are built for.
  • mapesdhs - Tuesday, November 10, 2015

    Just curious, do you know of any comparative results anywhere for bzip2 on old MIPS vs. other CPUs? It's not something I've seen mentioned before, at least not with respect to SGIs, but perhaps I can run some tests the next time I obtain a quad-R16K/1GHz (16MB L2) Tezro. The best I have at the moment is only an R16K/900MHz (8MB L2) single-CPU Fuel and various configs of Tezro and Onyx350 from 4 to 16x 700MHz with 8MB L2. Just a pity SGI never got to employ multi-core MIPS (it was planned, but alas never happened).

    Oddly, back when current, MIPS' real strength was fp. Over time it fell behind badly for general int, though for SGI's core markets that didn't really matter ("It's the bandwidth, stupid!" - famous quote from Mashey IIRC). MIPS could have caught up with MDMX and MIPS V ISA, especially with the initially intended merged Cray vector stuff, but again that all fell away once the design talent moved to Intel in 1996/7.

    Ian.
  • Freen the merciless - Sunday, November 8, 2015

    Heh! SPARC T5 eats Xeon and POWER for breakfast.
  • kgardas - Monday, November 9, 2015

    I guess you mean T7 with the SPARC M7 inside and not T5. If so, then yes, the M7 looks quite capable, but unfortunately it offers a horrible price/performance ratio. A POWER8 box starts at ~$6.5k, while the T7-1 starts at ~$40k. So on the SPARC front we'll need to see whether Oracle is going to change that with the Sonoma chip.
  • Michael Bay - Monday, November 9, 2015

    In parallel only.
  • aryonoco - Tuesday, November 10, 2015

    Thank you Johan for this amazingly well written and well researched article.

    I have to agree with a few people here that question your choice of using LE Ubuntu to test. Traditionally people who use Linux on POWER use SUSE, and some use RHEL, but Ubuntu? Nothing against them, and I love apt, but it's just not a mature platform.

    Try something more representative such as BE SLES and you will find a vastly different level of ecosystem maturity.

    But thanks again, and also thanks to AT for caring about such subjects and publishing these tests.
  • JohanAnandtech - Wednesday, November 11, 2015

    Thank you for taking the time to write up some constructive feedback. I have years of experience with Ubuntu and Linux, and I wanted to play it safe. Running benchmarks on "new" hardware with a new ISA (from my perspective) is pretty complex. C-ray and 7-zip are the only exceptions, but most real server apps (NAMD, Elasticsearch, Spark) depend on many layers of software.

    In theory, the OS/distro matters more than the ISA for getting applications working. In practice, it might have been better to bet on the distro with the most maturity and adapt our scripts and installation procedures to SUSE.

    But as soon as I get the chance, I'll try out BE SUSE or Red Hat on a POWER system.
  • mapesdhs - Tuesday, November 10, 2015

    Johan,

    A minor point, please note my home page for C-ray is here:

    http://www.sgidepot.co.uk/c-ray.html

    Blinkenlights is just a mirror, and not the primary mirror either (that would be the vintagecomputers site).

    Btw, it's a pity you didn't use the same image sizes & settings as on the main c-ray site, because then I could have included the results on my page (i.e. 'sphfract' at 800x600, 1024x768 with 8X oversampling, and 7500x3500). Or did you just use the same settings that Phoronix employs?

    Also, John Tsiombikas, the guy who wrote C-ray, told me some interesting things about the test and how it works (info included on the page), most especially that it is highly vulnerable to compiler optimisations which can produce results that are even less realistic than real-life workloads. I'm glad though that you did at least use the sphfract test, since at a sensible resolution or with oversampling it easily pushes the test out of just L1 (the 'scene' test is much smaller). But yeah, overall, c-ray was never intended to be used as a benchmark; it's just taken off somehow, perhaps because the scanline method of threading makes it scale very well.

    Hmm, I really must sort out the page formatting one of these days, and move the most complex test tables to the top. Never seem to find the time...

    Thanks!!

    Ian.

    PS. I always obtained the best results by having more threads than the no. of cores/CPUs, or is this something which doesn't work with non-MIPS systems?
