Our Broadwell coverage on the desktop has included reviews of the two consumer processors and a breakdown of IPC gains from generation to generation. One issue surrounding Broadwell on consumer platforms was that the top quad-core model was rated at roughly one-third less power than previous Intel quad-core processors: 65W for Broadwell against 84-95W in past generations. This puts Broadwell's out-of-the-box peak performance at a TDP (and frequency) disadvantage. However, in a somewhat under-the-radar launch, Intel also released a series of Broadwell Xeons under the E3-12xx v4 line. We sourced three socketed models, the E3-1285 v4 at 95W, the E3-1285L v4 at 65W and the E3-1265L v4 at 35W, to get a better sense of Broadwell's scaling across different power requirements.

Broadwell Xeon Overview

In almost every sense of the word, the launch of Broadwell in a socketed format has been fairly muted. For low-power mobile platforms at 4W and 15W, Broadwell was promoted heavily and the architecture has had many design wins, but for the desktop only two socketed consumer parts were launched. To that end, Intel performed only post-launch sampling for review websites, meaning many users learned about the performance well after the official launch (AnandTech had you covered on day one!). Even now, several weeks later, the i7-5775C and i5-5675C are both hard to source in several regions. Complicating matters, Intel's follow-up platform, Skylake, launched soon after in early August with a bigger focus on gaming and end-user experiences, alongside the announcement of the Skylake Xeon family making its way into dedicated mobile processors. This has the effect of consigning Broadwell on the desktop to obscurity, whether intentionally or not (cue the conspiracy theorists).

The crumb of comfort in Broadwell is its use of 128MB of eDRAM. This acts as a fully associative last-level victim cache (or L4) for the processor, speeding up workloads that are memory dependent and subject to L3 cache misses when data has previously been evicted. The eDRAM is connected via a narrow, double-pumped serial interface capable of delivering 50GB/s bi-directional bandwidth (100GB/s aggregate). Access latency after a miss in the L3 cache is 30-32ns, nicely in between an L3 and a main memory access. The major benefit in our testing was to the integrated graphics, giving Intel the best integrated graphics in a socketed platform where cost is no object. Some reviews also found that the eDRAM helped in discrete graphics gaming, although the effect was small and highly game dependent (which raises other questions about higher performance on lower power/frequency processors with more on-package memory). The main downside of the eDRAM is that it is CPU resident rather than visible from the system agent to the DRAM, and is thus only accessible to CPU/GPU workloads rather than accelerating data over the system IO.
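To see why a 30-32ns L4 sitting between the L3 and main memory matters, a back-of-the-envelope average memory access time (AMAT) calculation helps. The sketch below uses the eDRAM latency from the text; the L3 and DRAM latencies, miss rates and L4 hit rate are illustrative assumptions, not measured values.

```python
# AMAT sketch: the ~32ns eDRAM latency is from the article; the
# L3 hit latency, DRAM latency, and the example rates are assumed
# ballpark figures for illustration only.

L3_LATENCY_NS = 10      # assumed L3 hit latency
EDRAM_LATENCY_NS = 32   # ~30-32 ns per the article
DRAM_LATENCY_NS = 80    # assumed main-memory latency

def amat_without_l4(l3_miss_rate):
    """Average access time when every L3 miss goes straight to DRAM."""
    return L3_LATENCY_NS + l3_miss_rate * DRAM_LATENCY_NS

def amat_with_l4(l3_miss_rate, l4_hit_rate):
    """Average access time when an eDRAM L4 catches some L3 misses."""
    miss_penalty = (l4_hit_rate * EDRAM_LATENCY_NS
                    + (1 - l4_hit_rate) * DRAM_LATENCY_NS)
    return L3_LATENCY_NS + l3_miss_rate * miss_penalty

# Example: 10% of accesses miss L3, and the 128MB L4 catches 75% of those.
print(amat_without_l4(0.10))     # 18.0 ns
print(amat_with_l4(0.10, 0.75))  # 14.4 ns
```

Even with these rough numbers, intercepting three quarters of L3 misses in eDRAM trims the average access time noticeably, which is exactly the effect that shows up in memory-bound workloads.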

What went further under the radar was the launch of the Intel Broadwell Xeons for the business and server market. We reported on the launch, but there was seemingly nothing front-facing about the marketing of these processors, suggesting Intel might be keeping them as a pure business-to-business product. All bar one of these processors support not only the eDRAM but also several Xeon-specific features. Back with Haswell, Intel launched a single soldered (BGA) Xeon with eDRAM. Having three socketed (LGA) variants for Broadwell at 'launch' satisfies business customers that want to upgrade from Haswell E3 v3 Xeons, and also provides business and server environments with the use of that eDRAM. One of the cited uses is in-memory databases, producing fewer cache misses by placing a larger chunk of faster memory closer to the processing cores.

All the Broadwell Xeons are quad core with hyperthreading, and all bar one have Iris Pro P6300 graphics (the professional version of Iris Pro 6200) with 48 EUs (GT3e); the exception, a soldered part, has the eDRAM disabled. (Note that the E3-1284L v3 is listed by CPU-World but not currently listed at ark.intel.com.) Aside from this, the models differ solely in processor frequency, graphics frequency and thermal design limits. It is interesting to note the differences between the E3-1285 v4 and the E3-1285L v4. Sitting at 95W and 65W respectively, that 30W difference in TDP is represented by only a 100 MHz difference in base frequency. This is relatively odd, and suggests that the 65W part, the E3-1285L v4, is a better off-the-wafer part with preferred frequency/voltage characteristics, one which also costs almost $100 (~20%) less. This plays a significant part in our testing.

The eDRAM stands out compared to previous Xeons, although it comes at the expense of 2MB of L3 cache relative to previous high-end quad-core models (or their i7 equivalents). Some microprocessor analysts have said that losing 2MB of L3 is not that important when backed up by 128MB of a fast L4-type cache, on the basis that this L4 offers 50GB/s and up before you hit main memory.

Why eDRAM?

In a recent external podcast, David Kanter mentioned that for a multiple increase in cache size (e.g. 3x), cache misses decrease on average by the square root of that multiple (e.g. √3, or 1.73x). So the move from 8MB of last-level cache to 6MB + 128MB, despite the minor increase in latency going out to eDRAM, is an effective 16x increase, reducing cache misses by a factor of four. This means, to quote, 'if you have eight cache misses per thousand, you are now down to around two'. I take this to mean a regular user workload; in a higher-throughput environment, it could mean the difference between 2% and 0.5% cache misses out to main memory. Because the move out to main memory is such a latency and bandwidth penalty compared to an on-package transfer between the CPU and L3/L4, even a small decrease in cache misses has performance potential when used in the right context. Anand quoted Intel's Tom Piazza in our Haswell eDRAM review about the size of the eDRAM: it was stated that 32MB should be enough, but it was doubled and then doubled again just to make sure, as well as 'go big or go home'. This has knock-on performance effects.
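The square-root rule of thumb quoted above is easy to check with a couple of lines. This is just the rule as stated, not a model of any particular workload; the 8-misses-per-thousand starting point is the figure from the quote.

```python
import math

def miss_rate_after_growth(base_miss_rate, size_multiple):
    """Rule of thumb: miss rate scales with 1/sqrt(cache size multiple)."""
    return base_miss_rate / math.sqrt(size_multiple)

# 8 MB LLC -> 6 MB L3 + 128 MB eDRAM is roughly a 16x effective increase,
# so sqrt(16) = 4x fewer misses: 8 per thousand drops to 2 per thousand.
print(miss_rate_after_growth(8.0, 16))  # 2.0
```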

Users upgrading to Broadwell Xeons from Haswell (or those purchasing new systems outright) will get this eDRAM benefit at a lower cost than previous Xeons: the E3-1285L v3 from the Haswell generation launched at $774, compared with $445 for the E3-1285L v4. For the difference, the Broadwell processor comes with eDRAM and substantially better integrated graphics, all within the same thermal design. At 95W, the gap is from $662 to $556, a much smaller difference. This suggests that on Haswell the lower-power model was harder to produce, whereas with Broadwell that burden shifts to the higher-frequency part.

Graphics Virtualization and Upgrades

One of the benefits of the Broadwell Xeons with eDRAM lies in Intel's graphics virtualization technology (GVT). This affords three modes of operation:

GVT-d: the integrated GPU is passed through directly to a single virtual machine.
GVT-s: a shared virtual graphics adapter, with API calls from multiple VMs forwarded to a native graphics driver in the host.
GVT-g: full GPU virtualization, with multiple VMs time-sharing the physical GPU through mediated pass-through.

The benefit of these virtualization techniques is that they allow data centers to apply a GPU accelerant to each VM depending on how much each workload benefits from the GPU. With the graphics directly included in the CPU, no additional hardware is needed. Obviously this makes the most sense when each virtual machine requires only infrequent access to the integrated graphics; for everything else, Intel is set to launch its Valley Vista platform, which places three of these CPUs onto an add-in PCIe card.

Valley Vista

At IDF San Francisco this year, an announcement passed almost everyone by. Intel described an add-in card coming in Q4 2015 that features three Broadwell-H E3 Xeon processors on a single PCB, each with Iris Pro graphics.

Valley Vista is designed for high-density, workload-specific tasks, in particular AVC transcoding. Aside from the slide above, there have been no real details on how the card will work: whether there is a PCIe switch for communication, whether it runs in a virtualized layer, how the card is powered, or whether each of the processors on the card will have a fixed amount of DRAM associated with it. So far Supermicro has announced in a press release that one of its Xeon Phi platforms will be suitable for the cards when they launch later this year. What we do know is that Broadwell does not have full HEVC acceleration, so Valley Vista's utility is most likely to be in AVC encode/decode.

Chipsets

As with previous socketed drop-ins on the professional line, Intel promotes the use of its C226 chipset; for our testing, we used an equivalent Z97 platform, which worked just as well.

This provides an as-is scenario: sixteen lanes of PCIe 3.0 from the CPU, two channels of DDR3/L-1600 memory with ECC support, a DMI 2.0 x4 link equivalent to 4 GB/s, and up to six native USB 3.0 and SATA 6 Gbps ports, depending on how the chipset's high-speed IO configuration is used in conjunction with its eight PCIe lanes.

This Review

There isn't much else to say here; we have covered Broadwell on the desktop, and the differences are spelled out for end users despite the current lack of direct availability in certain markets. These are Xeon processors, so there is no overclocking here, but the main parallel we should draw is between the 95W E3-1285 v4 and the 84W E3-1276 v3. The Haswell part has some extra frequency (peaking at 4 GHz) and extra L3 cache, but the Broadwell Xeon has the eDRAM.

Compared to Johan's in-depth server reviews, the testing in this piece focuses primarily on workstation environments. Because we did not get a 95W 'consumer' Broadwell for comparison, gaming tests were also performed. Unfortunately the Linux-based server tests we typically use could not be run, due to a spectacular failure of our Ubuntu LiveCD with these processors even though it worked with their non-Xeon counterparts. We're still trying to figure this one out, but we suspect a driver-related issue. While in no way similar, in its stead we have SPECviewperf 12 on Windows with a discrete GPU (its typical use case) as an additional angle of comparison.

A side note to those who have recently asked: we are in the process of looking into appropriate repeatable compilation benchmarks and VM environment comparisons. Ideally we are aiming to finalize a series of tests that can be one-click batched and processed within a reasonable testing timeframe. These will not be ready until mid-September at the earliest due to other commitments, but when they are available we will try to run a number of past systems to acquire appropriate comparative data. To add comments, suggestions or preferences on the tests, please email ian@anandtech.com.

Test Setup

Processor: Intel Xeon E3-1285 v4 (95W, 4C/8T, 3.5 GHz base / 3.8 GHz turbo, Broadwell)
           Intel Xeon E3-1285L v4 (65W, 4C/8T, 3.4 GHz / 3.8 GHz, Broadwell)
           Intel Xeon E3-1265L v4 (35W, 4C/8T, 2.3 GHz / 3.3 GHz, Broadwell)
Motherboard: MSI Z97A Gaming 6
Cooling: Cooler Master Nepton 140XL
Power Supply: OCZ 1250W Gold ZX Series
Memory: G.Skill RipjawsZ 4x4 GB DDR3-1866 9-11-11 Kit
Video Cards: ASUS GTX 980 Strix 4GB
             MSI GTX 770 Lightning 2GB (1150/1202 Boost)
             ASUS R7 240 2GB
Hard Drive: Crucial MX200 1TB
Optical Drive: LG GH22NS50
Case: Open Test Bed
Operating System: Windows 7 64-bit SP1

The dynamics of CPU turbo modes, on both Intel and AMD, can cause concern in environments with variable-threaded workloads. There is also the added issue of motherboard consistency, depending on how the manufacturer wants to layer its own boosting technologies over the ones Intel would prefer it used. In order to remain consistent, we enable the same OS-level high-performance mode on all the CPUs we test, which should override any motherboard manufacturer performance mode.

All of our benchmark results can also be found in our benchmark engine, Bench.

Many thanks to...

We must thank the following companies for kindly providing hardware for our test bed:

Thank you to AMD for providing us with the R9 290X 4GB GPUs.
Thank you to ASUS for providing us with GTX 980 Strix GPUs and the R7 240 DDR3 GPU.
Thank you to ASRock and ASUS for providing us with some IO testing kit.
Thank you to Cooler Master for providing us with Nepton 140XL CLCs.
Thank you to Corsair for providing us with an AX1200i PSU.
Thank you to Crucial for providing us with MX200 SSDs.
Thank you to G.Skill and Corsair for providing us with memory.
Thank you to MSI for providing us with the GTX 770 Lightning GPUs.
Thank you to OCZ for providing us with PSUs.
Thank you to Rosewill for providing us with PSUs and RK-9100 keyboards.

Load Delta Power Consumption

Power consumption was tested on the system in a single GTX 770 configuration, with a wall meter connected to the OCZ 1250W power supply. This power supply is Gold rated, and as I am in the UK on a 230-240 V supply, it delivers ~75% efficiency above 50W and 90%+ efficiency at 250W, making it suitable for both idle and multi-GPU loading. This method of power reading allows us to compare how the UEFI and board supply components with power under load, as well as the power delta from idle to CPU loading; all results include typical PSU losses due to efficiency.
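As a sketch of how efficiency figures interact with wall readings, the snippet below converts two hypothetical wall measurements into a DC-side load delta. The efficiency values are the approximate ones quoted above; the wall readings are invented for illustration, and our published numbers deliberately leave PSU losses in rather than stripping them out like this.

```python
# Hypothetical example of removing PSU losses from wall readings.
# Efficiency figures (~75% above 50W, ~90% near 250W) are the rough
# values quoted in the text; the wall readings are assumed.

def dc_power(wall_watts, efficiency):
    """Power actually delivered to the system for a given wall reading."""
    return wall_watts * efficiency

idle_wall, idle_eff = 55.0, 0.75    # long-idle wall reading (assumed)
load_wall, load_eff = 250.0, 0.90   # OCCT load wall reading (assumed)

delta = dc_power(load_wall, load_eff) - dc_power(idle_wall, idle_eff)
print(round(delta, 2))  # 183.75 W attributable to the load
```

The point of the exercise is that the idle and load points sit on different parts of the efficiency curve, so a raw wall delta overstates the DC-side delta by a load-dependent amount.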

Power Delta (Long Idle to OCCT)

Power numbers are typically difficult to gauge, as they depend on the stock voltage of the processor and how aggressively the motherboard drives it to ensure stability. Thinking from the motherboard manufacturer's point of view, they are more likely to overvolt a Xeon processor to guarantee stability than to deal with unstable platforms. As a result we get an odd scenario: the 35W processor almost hits double its rated power consumption at load, and the 65W part is also above its mark, but the 95W part is below. To put an angle on this, the 110W we saw on the i7-6700K was in one motherboard; in others we have seen 76W as well as 84W. Without access to the BIOS DVFS tables for each processor, it is difficult to explain mismatched data such as this.

Comments

  • Ian Cutress - Thursday, August 27, 2015 - link

    So to clear up your misconceptions: we (or more specifically, I) have not retested any AM3 product yet on our 2015 benchmark suite due to time restrictions and general lack of reader interest in AM3. I have 3 test beds, and our CPU/GPU tests are only partially automated, requiring 35+ working hours of active monitoring for results. (Yes, we can leave some tests running overnight, but not that many.) Reserving one test bed for a month a year for AM3+ limits the ability to do other things, such as motherboard tests/DRAM reviews/DX12 testing and so on.

    You'll notice our FX-9590 review occurred many, many months after it was officially 'released', due to consumer availability. And that was just over 12 months ago - I have not been in a position to retest AM3 since then. However, had AMD launched a new CPU for it, then I would have specifically made time to circle back around - for example I currently have the A8-7670K in to test, so chances are I'll rerun the FM2+ socket as much as possible in September.

    That being said, we recently discussed DirectX 12 testing with AMD. Specifically, when more (full/non-beta) titles are launched to the public, we will update our game tests (on CPU reviews) for 2016. You will most likely see the FX range of CPUs being updated in our database at that time. Between now and then, we have some overlap between the FX processors and these E3 processors in our benchmarking database. This is free for anyone to access at any time as and when we test these products. Note that there is a large price difference and a large TDP difference, but there are some minor result comparisons for you. Here's a link for the lazy:

    http://anandtech.com/bench/product/1289?vs=1538

    The FX-9590 beats the 35W v4 Xeon in CineBench, POV-Ray and Hybrid, while being 1/3 the price but 6x the power consumption.
  • Oxford Guy - Thursday, August 27, 2015 - link

    The 9590 is a specialty product, hardly what I was focusing on which is FX overclocked to a reasonable level of power consumption. The 9590 does not fall into that category.

    You can get an 8320E for around $100 at Microcenter and pair it with a discount 970 motherboard like I did ($25 with the bundle pricing a few months ago for the UD3P 2.0) and get a decent clockspeed out of it for not much money. I got my Zalman cooler for $20 via slickdeals and then got two 140mm fans for it. The system runs comfortably at 4.5 GHz (4.4 - 4.5 are considered the standard for FX -- the point where performance per watt is still reasonable). Those pairing it with an EVO cooler might want 4.3 GHz or so.

    The 9590 requires an expensive motherboard, expensive (or loud) case cooling, and an expensive heatsink. Running an FX at a clockspeed that is below the threshold at which the chip begins to become a power hog is generally much more advisable. And, review sites that aren't careful will run into throttling from VRMs or heat around the chip which will give a false picture of the performance. People in one forum said adamantly that the 9590 chips tend to be leaky so their power consumption is even higher than a low-leakage chip like 8370E.

    One of your reviews (Broadwell I think) had like 8 APUs in it and not a single FX. That gives people the impression that APUs are the strongest competition AMD has. Since that's not true, it gives people the impression that this site is trying to manipulate readers into thinking Intel is further ahead than it actually is in terms of price-performance.

    There is no doubt that FX is old and was not ideal for typical desktop workloads when it came out. Even today it only has about 1.2 billion transistors and still has 32nm power consumption. But, since games are finally beginning to use more than two cores or so, and because programs like Blender (which you probably should use in your results) can leverage those cores without exaggerating the importance of FPU (as Cinebench is said to do) it seems to still be clinging to relevance. As for lack of reader interest in FX, it's hard to gauge that when your articles don't include results from even one FX chip.

    Regardless of reader interest if you're going to include AMD at all, which you should, you should use their best-performing chip (although not the power-nuts 9590) design — not APUs — unless you're specifically targeting small form factors or integrated graphics comparisons.
  • Oxford Guy - Thursday, August 27, 2015 - link

    You also ran an article about the 8320E. Why not use that 8320E, overclocked to a reasonable level like 4.5 GHz, as the basis for benchmarks you can include in reviews?
  • SuperVeloce - Thursday, August 27, 2015 - link

    Clocks are not identical (you know the meaning of that word, right?). And the 4790K was released a year after the first Haswells. Usually you compare models from the launch day of the said architecture.
  • MrSpadge - Thursday, August 27, 2015 - link

    It doesn't matter what launched on launch day of the older competition. It matters what one can buy at the current launch date instead of the new product.
  • mapesdhs - Thursday, August 27, 2015 - link

    Hear hear! Reminds me of the way reference GPUs keep being used in gfx articles, even when anyone with half a clue would buy an oc'd card either because they're cheaper, or seller sites don't sell reference cards anymore anyway.
  • Oxford Guy - Wednesday, August 26, 2015 - link

    "cue the realists"

    Corporations are a conspiracy to make profit for shareholders, CEOs, etc. The assumption of conspiracy should be a given, not a "theory". Any business that isn't constantly conspiring to deliver the least product for the most return is going to either die or stagnate.
  • boxof - Wednesday, August 26, 2015 - link

    "In a recent external podcast, David Kanter"

    Couldn't bring yourselves to mention your competition huh? Stay classy.
  • Dr.Neale - Wednesday, August 26, 2015 - link

    Your comparison of the Xeon e3-1276 v3 to the e3-1285 v4, e3-1285L v4, and e3-1265L v4 is systematically slightly biased in favor of the e3-1276 v3, because for all tests you use (non-ECC) DDR3 1866 memory, whereas with ECC memory (and a C226 chipset that supports it, as in an ASUS P9D WS motherboard), the v3 Xeon is limited to DDR3 1600, while the v4 Xeons can use DDR3 1866 memory.

    Therefore using DDR3 1866 memory with the v3 Xeon gives it a slight systematic performance boost over what it would achieve with only DDR3 1600 memory, which is the maximum speed it can use in an ECC / C226 workstation.

    With this in mind, I believe the performance of a e3-1276 v3 Xeon with DDR3 1600 memory would more closely match that of the e3-1285 v4 and e3-1285L Xeons with DDR3 1866 memory, than is indicated in the graphs here, where the v3 and v4 Xeons are all tested with the same DDR3 1866 memory only.
  • ruthan - Thursday, August 27, 2015 - link

    This power consumption mystery has to be solved; it's like the GeForce 970 4 GB thing. Maybe Intel is cheating with those numbers, because there are customers like me who prefer lower power and silence and are ready to pay for that.

    The most typical workstation use case where I'm still missing tons of horsepower on the CPU side is virtualization, especially for gaming; VMware Workstation 12 was released yesterday with DX10 support. Especially in a Linux environment, gaming in a virtual machine makes sense (I know, I know, there is no DX10 support even through a wrapper).
