Snapdragon 800 (MSM8974) Performance Preview: Qualcomm Mobile Development Tablet Tested
by Brian Klug on June 18, 2013 8:00 PM EST
We’ve written about Snapdragon 800 (MSM8974) before. For those unfamiliar, this is Qualcomm’s new flagship SoC with four Krait 400 CPUs at up to 2.3 GHz, Adreno 330 graphics, and the latest modem IP block with Category 4 LTE. Qualcomm is finally ready to show off MSM8974 performance on final silicon and board support software, and invited us and a few other publications out to San Francisco for a day of benchmarking and poking around. We looked at MSM8974 on both the familiar MSM8974 MDP/T, a development tablet used by both Qualcomm and third parties to develop drivers and platform support, and the MSM8974 MDP phone, both of which have been publicly announced for some time now.
The tablet MDP is what you’d expect, an engineering platform designed for Qualcomm and other third parties to use while developing software support for features. Subjectively it’s thinner and more svelte than the APQ8064 MDP/T we saw last year, but as always OEMs will have the final control over industrial design and what features they choose to expose. Display is 1080p on the tablet and 720p on the phone, a bit low considering the resolutions handset and tablet makers are going for (at least 1080p on phones and WQXGA on tablets), so keep that in mind when looking at on-screen results from benchmarks.
Qualcomm Snapdragon 800 Mobile Development Platform Tablet | |
MSM8974 MDP/T | |
SoC | MSM8974 Snapdragon 800 |
CPU | 4x Krait 400 at 2.3 GHz |
GPU | Adreno 330 at 450 MHz |
RAM | 2GB 2x32 LPDDR3 800 MHz |
NAND | 32 GB eMMC 4.5 |
Cameras | 12 MP with flash (rear), 2 MP (front) |
Display | 11.6-inch 1080p |
I/O | USB 3.0, microHDMI, microSD, 3.5mm headset |
OS | Android 4.2 |
Snapdragon 800, née MSM8974, is built on TSMC’s 28nm HPM (High Performance for Mobile) HK-MG process, as opposed to 28nm LP (low power) polysilicon. The result is higher CPU clocks, from 1.5–1.7 GHz on Krait 200–300, which were built on 28nm LP, to 2.2–2.3 GHz on Krait 400 on 28nm HPM. The jump between Krait 200 and Krait 300 brought higher clocks and also a jump in IPC. This time around, Krait 400 is essentially a Krait 300 implemented on 28nm HPM, which means some relayout. There’s also a faster L2 cache on Krait 400.
These are final clocks on MSM8974 – Krait 400 runs its four cores at up to 2.3 GHz, though some lots will come at 2.2 GHz. GPU on MSM8974 is Adreno 330 which runs at 450 MHz and brings some architectural improvements over Adreno 320.
On the video side, MSM8974 is capable of encoding UHD 4K (3840 x 2160) 30 FPS video at up to 120 Mbps H.264 High Profile, and is capable of playing back the same file. Qualcomm had a demo going showing this mirrored over microHDMI on the latest Sony 4K UHD TV as well. I recorded a video sample and uploaded a copy to YouTube for your perusal. True to their word, the video I grabbed is 120 Mbps and 3840 x 2160. Framerate was just over 25 FPS, but I'm not sure if the demo was set up for 30 FPS capture. MSM8974 has a hardware encoder for H.264 but not for HEVC (H.265), which is implemented in software.
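To put the 120 Mbps figure in perspective, here is a rough back-of-the-envelope sketch of what that bitrate means for storage and for the average bit budget per pixel. It assumes 1 Mbps means 10^6 bits per second and that the stream holds a constant bitrate. The function names are illustrative and not taken from any Qualcomm tool.

```python
# Back-of-the-envelope numbers for a 120 Mbps UHD stream.
# Assumptions: 1 Mbps = 1,000,000 bits/s, constant bitrate, 1 MB = 1,000,000 bytes.

def megabytes_per_minute(bitrate_mbps):
    """Storage consumed by one minute of video at the given bitrate."""
    bits_per_minute = bitrate_mbps * 1_000_000 * 60
    return bits_per_minute / 8 / 1_000_000

def bits_per_pixel(bitrate_mbps, width, height, fps):
    """Average encoded bits available per pixel per frame."""
    return bitrate_mbps * 1_000_000 / (width * height * fps)

print(f"{megabytes_per_minute(120):.0f} MB per minute")
print(f"{bits_per_pixel(120, 3840, 2160, 30):.2f} bits/pixel at 30 FPS")
print(f"{bits_per_pixel(120, 3840, 2160, 25):.2f} bits/pixel at 25 FPS")
```

At the same bitrate, a lower framerate such as the roughly 25 FPS seen in the sample leaves more bits for each frame.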
Snapdragon 800 should begin popping up in phones and tablets in fall 2013. Anyhow, let's take a look at MSM8974 performance.
115 Comments
shodanshok - Thursday, June 20, 2013 - link
I forgot to specify the benchmark used. It is Coremark: http://www.coremark.org/
It is an industry standard benchmark with freely available sources.
Wilco1 - Friday, June 21, 2013 - link
Really? Looking at the published results it shows Exynos 4 does 5560 Coremarks/core at 1.4GHz.
The fastest per-core Atom result is 2.3 CM/MHz for 1 thread, and 3.3 with Hyperthreading.
Cortex-A9 does 4.0 for 1 thread - so it is 74% faster single threaded, and 21% faster core for core.
So the A9 destroys Atom on CoreMark as well. I am surprised several of you are trying to argue that in-order cores beat out-of-order cores despite the facts.
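The per-clock arithmetic in the comment above can be reproduced with a short sketch. It uses only the figures quoted there (5560 CoreMarks per core at 1.4 GHz for Exynos 4, and 2.3 and 3.3 CM/MHz for Atom). The helper names are made up for illustration, and the A9 figure is rounded to 4.0 as the comment does.

```python
# Per-clock CoreMark efficiency, using only the figures quoted in the comment.

def coremark_per_mhz(score_per_core, clock_mhz):
    """CoreMark score of one core divided by its clock in MHz."""
    return score_per_core / clock_mhz

def percent_faster(a, b):
    """How much faster a is than b, in percent."""
    return (a / b - 1.0) * 100.0

# Exynos 4: 5560 CoreMarks per core at 1.4 GHz.
a9_cm_per_mhz = coremark_per_mhz(5560, 1400)

# Atom per-core figures as quoted (CM/MHz).
atom_one_thread = 2.3
atom_hyperthreaded = 3.3

# The comment rounds the A9 figure to one decimal before comparing.
a9_rounded = round(a9_cm_per_mhz, 1)

print(f"A9: {a9_cm_per_mhz:.2f} CM/MHz")
print(f"vs Atom, 1 thread: {percent_faster(a9_rounded, atom_one_thread):.0f}% faster")
print(f"vs Atom, HT:       {percent_faster(a9_rounded, atom_hyperthreaded):.0f}% faster")
```

Using the unrounded A9 value instead of 4.0 shifts both percentages down by roughly one point.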
shodanshok - Friday, June 21, 2013 - link
No, it is incredible how you pretend to extrapolate _precise_ performance numbers from vague arch details.
Return to the Coremark site, because you misunderstand the benchmark results. The CM/MHz score represents the score of the entire SoC, so it doesn't rule out core count differences. Look at the CM/core score instead and you will find that Atom is in the same range as the A9 scores, sometimes much better.
Some examples: Atom z520 vs Tegra2 and Atom n2800 vs exynos4 quad.
Please also note that:
- Coremark does not stress L2/memory in any way. This is the only reason why the A9's slow memory interface does not interfere here;
- the compiler has enormous importance in its score.
The real Atom problem was the terrible GPU and companion chipset.
Regards.
Wilco1 - Friday, June 21, 2013 - link
I listed the per core results, as I said A9 is 74% faster single threaded and 21% faster with Hyperthreading enabled. These are results from the EEMBC website, no complex extrapolation involved.
Coremark runs mostly in L1, however it does stress the branch predictor seriously. All benchmarks have a major compiler component. Coremark is horrible like pretty much any EEMBC stuff so I don't think it will become popular.
shodanshok - Saturday, June 22, 2013 - link
I can not agree. From the CoreMark site:
### Comparison 1:
Tegra2 @ 1.00 GHz (2 A9 cores):
Coremark: 5866.39
Coremark/Core: 2933.20
Atom Z520 @ 1.33 GHz (1 Atom Core):
Coremark: 3192.17
Coremark/Core: 3192.17
Atom advantage: 9%
### Comparison 2:
Exynos4 Quad @ 1.4 GHz (4x A9 cores)
Coremark: 22243.00
Coremark/core: 5560.75
Atom N2800 @ 1.86 GHz (2 Atom cores)
Coremark: 12286.90
Coremark/Core: 6143.45
Atom advantage: 10%
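The per-core figures in the two comparisons can be checked with a small sketch. The scores and core counts are the ones listed above. The dictionary layout and function names are illustrative only.

```python
# (total CoreMark score, physical core count) as listed in the two comparisons.
results = {
    "Tegra2": (5866.39, 2),
    "Atom Z520": (3192.17, 1),
    "Exynos4 Quad": (22243.00, 4),
    "Atom N2800": (12286.90, 2),
}

def per_core(name):
    """Total score divided by the number of physical cores."""
    total, cores = results[name]
    return total / cores

def advantage_percent(a, b):
    """Per-core lead of chip a over chip b, in percent."""
    return (per_core(a) / per_core(b) - 1.0) * 100.0

for atom, a9 in [("Atom Z520", "Tegra2"), ("Atom N2800", "Exynos4 Quad")]:
    print(f"{atom} vs {a9}: {advantage_percent(atom, a9):.1f}% per-core advantage")
```

This view divides by core count only. It does not normalize for the different clock speeds of the chips.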
### Note:
Why are the two A9 scores and the two Atom scores so different (see Tegra2 vs Exynos and Atom Z520 vs N2800)? The reason lies in the compiler: recent GCC versions have greatly improved their efficiency with in-order uarchs. Moreover, please also note that the high A9 score (Exynos) was obtained with their specific ARM compiler. I am sure that, if benchmarked using the Intel C Compiler, the Atom score would be higher.
### Summary:
The Atom core is more than capable of competing against A9. You can argue that Atom has a higher clock, but in a phone/tablet environment clocks don't mean anything. What is important is performance/watt.
This brings us to Atom's two real problems:
1) a very low efficiency chipset and low integration. Moorestown (Intel's first attempt at mobile with Atom) was doomed from the start because it required 4/5 chips to enable a full-featured phone;
2) a very slow GPU (with very bad performance/watt).
Moreover, it is widely understood that the A9 OoO engine is only a mild implementation. A15 is much stronger in this regard, sometimes (not too often, anyway) even approaching AMD Bobcat single-thread performance.
Regards.
Wilco1 - Saturday, June 22, 2013 - link
No - the performance comparisons that are useful are:
1. Max score for a SoC - despite running at a far lower clock, in both comparisons A9-based SoCs win by more than 80% in overall performance.
2. Efficiency of a core at the same frequency (IPC) - Without Hyperthreading A9 is 74% faster, with Hyperthreading A9 wins by more than 20%.
Note that your comparison doesn't work. You can't come to a conclusion about A9 vs Atom performance when you compare with wildly different frequencies. Also it means giving Atom the advantage of having 2 threads vs 1 on A9. So to make the comparison fair you need to compare with an equal number of threads or at the same clock.
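Both normalizations named above can be sketched from the scores posted earlier in the thread: the whole-SoC total, and the per-core score divided by clock. The scores, core counts, and clocks come from those posts. The function names are illustrative. The Atom entries are used as published, on the assumption that they reflect Hyperthreading being enabled.

```python
# (total CoreMark score, physical cores, clock in MHz) from the posted comparisons.
socs = {
    "Tegra2": (5866.39, 2, 1000),
    "Atom Z520": (3192.17, 1, 1330),
    "Exynos4 Quad": (22243.00, 4, 1400),
    "Atom N2800": (12286.90, 2, 1860),
}

def soc_total(name):
    """Whole-SoC score, with frequency and core count included as shipped."""
    return socs[name][0]

def per_core_per_mhz(name):
    """Score normalized on both core count and clock."""
    total, cores, mhz = socs[name]
    return total / cores / mhz

def lead_percent(a, b):
    """Lead of value a over value b, in percent."""
    return (a / b - 1.0) * 100.0

for a9, atom in [("Tegra2", "Atom Z520"), ("Exynos4 Quad", "Atom N2800")]:
    total_lead = lead_percent(soc_total(a9), soc_total(atom))
    clock_lead = lead_percent(per_core_per_mhz(a9), per_core_per_mhz(atom))
    print(f"{a9} vs {atom}: SoC total {total_lead:+.1f}%, per core per MHz {clock_lead:+.1f}%")
```

With these inputs the A9-based SoCs lead on both measures, while dividing by core count alone favors Atom. The disagreement in the thread is over which normalization is the fair one.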
Yes GCC has improved a lot in recent years, on ARM it has become a reasonable compiler and competitive with ARM's armcc compiler. I don't know how much better ICC would be on Atom, but I suspect the gap is far smaller as well.
A9 is not hugely OoO indeed, just like Silvermont. A15 is aggressive OoO and beats Jaguar.
shodanshok - Saturday, June 22, 2013 - link
No, I don't agree again.
You explicitly talked about CortexA9 and Atom uarch, _not_ their SoC implementation.
You can not use the total SoC score as a uarch benchmark, simply because it doesn't rule out differences in core count. To measure uarch performance you need to do a core-by-core comparison. Let me give an example: using total SoC score, a 4xA9 SoC is faster than a 2xA15 one. However, the latter uarch is considerably more advanced.
A very similar argument can be made for frequency: Atom was _from the start_ designed to hit a relatively high-clock, yet low-power target. This was deliberately done to exploit Intel's 45/32nm HKMG process, which doesn't scale power down much for lower frequency targets. It is simply a question of design targets: for low power chips, you can get (relatively) high frequency _or_ (relatively) high IPC, not both (actually).
So, you must decide: are you comparing the uarch or the final SoC implementation? Because, from a uarch point of view, Atom wins. On a performance/watt metric, their bare cores tend to be on par. As a final product, A9 is way better because there are many highly integrated, low power, low cost SoCs from a multitude of vendors. In contrast, Atom-based SoCs are offered only by Intel and with a much lower integration factor (and higher cost), until now, where their latest platform begins to be very competitive against older A9 SoCs.
The "little problem" is that ARM is shipping with 2x and 4x A15 cores, and against them Atom is at a disadvantage.
Regards.
Wilco1 - Saturday, June 22, 2013 - link
While Atom was indeed designed for high frequency, A9 reaches higher frequencies: Atom maxes out at 2GHz on 32nm, while A9 does 1.7GHz on 40nm and 2.3GHz on 28nm. So you can't claim a "microarchitecture" win for Atom when you compare against a low clocked A9.
Secondly, since you argue that frequency is an important aspect of the microarchitecture, I would argue that core count matters equally. A9 was designed to be simple and small, so it is typically used as a quad-core. On the other hand Atom is a large and complex core which uses Hyperthreading rather than multiple cores. So if you want to do a fair comparison with Hyperthreading enabled then you have to use 2 A9 cores for every Atom core. That's how they have been designed to be used.
What is the difference between a module, a HT enabled core and a dual core? These are just different ways of improving multithreaded performance with different hardware tradeoffs - but to software they all appear identical.
In conclusion: you cannot just pick whatever comparison you want. Either you compare the whole SoC, including its frequency as well as core count, or you compare microarchitectures normalized on core count and frequency. You can't include one but not the other as frequency, core count and TDP are related.
shodanshok - Sunday, June 23, 2013 - link
So, you started about in-order vs OoO and now you are speaking of die size and perf/mm2?
1) While CortexA9 was rated for 2 GHz operation, a single A9 core would dissipate more than 2 Watt at this frequency. Atom is not so much different in this regard. Moreover, can you point me to a phone that uses a 2 GHz A9 implementation? I bet not.
2) Atom is also MP from the start: it has the same bus unit and MP capability as the Netburst uarch. By which metrics are these inferior to the ARM MP implementation?
3) By die size comparison, A9 is clearly better than Atom. However, its performance is lower.
4) HT is simply a smart sharing of some key structures in order to interleave two threads on the same core. You can not count HT as another core. For example, barrel microprocessors can interleave many threads on a single core: Sun T1 can interleave 4x threads per core, T2 8x. Do you count T1 as having 32 cores? If so, you are wrong.
Both I and other users pointed you to many reviews and benchmarks where Atom is clearly identified as faster than A9. However, you continue to change metrics.
The only benchmark that paints a different picture is Geekbench, which shows A9 in the same league as Sandy Bridge. Do you _really_ think this is true? In SPEC benchmarks, SB is quite close to the big, power hungry but powerful POWER7. Do you really think that A9 is remotely comparable to this core? Really?
I already stated this: if you compare SoCs, well, A9 wins, because there are many well done SoCs based around it. However, from uarch/performance side, Atom wins.
The funny thing is that this is now totally irrelevant: A9 is superseded by A15, and Atom is very near its EOL. Moreover, Jaguar seems to be a very competent tablet chip.
Regards.
MrPhilo - Sunday, June 23, 2013 - link
Unfair to compare the A9s to Atom. The Tegra 2 was an old revision of A9 and lacked NEON etc. The newer A9s are fairer to compare. Also, a single A9 at 2 GHz won't draw 2 watts at all; the 2.3 GHz Tegra 4i would be worse than the A15 if it did. Remember the process is 28nm, not the old 40nm.