The Intel Xeon E7-8800 v3 Review: The POWER8 Killer?
by Johan De Gelas on May 8, 2015 8:00 AM EST
The story behind the high-end Xeon E7 has been an uninterrupted triumphal march for the past 5 years: Intel's most expensive Xeon beats Oracle servers - which cost an order of magnitude more - silly, and offers much better performance per watt/dollar than the massive IBM POWER servers. Each time a new generation of quad/octal socket Xeons is born, Intel increases the core count, RAS features, and performance per core while charging more for the top SKUs. Each time that price increase is justified, as the total cost of a similar RISC server is a multiple of that of a Xeon E7 server. From the Intel side, this new generation based upon the Haswell core is no different: more cores (18 vs 15), better RAS, slightly more performance per core and ... higher prices.
However, before you close this tab of your browser, know that even this high-end market is getting (more) exciting. Yes, Intel is correct in that the market momentum is still very much in its favor, and thus in favor of x86.
No less than 98% of the server shipments have been "Intel inside". No less than 92-94% of the four socket and higher servers contain Intel Xeons. From the revenue side, the RISC based systems are still good for slightly less than 20% of the $49 Billion (per year) server market*. Oracle still commands about 4% (+/- $2 Billion), but has been in a steady decline. IBM's POWER based servers are good for about 12-15% (including mainframes) or $6-7 Billion depending on who you ask (*).
It is however not game over (yet?) for IBM. The big news of the past months is that IBM has sold its x86 server division to Lenovo. As a result, Big Blue finally throws its enormous weight behind the homegrown POWER chips. Instead of a confusing and half-hearted "we will sell you x86 and Itanium too" message, we now get the "time to switch over to OpenPOWER" message. IBM spent $1 billion to encourage ISVs to port x86-linux applications to the Power Linux platform. IBM also opened up its hardware: since late 2013, the OpenPOWER Foundation has been growing quickly with Wistron (ODM), Tyan and Google building hardware on top of the Power chips. The OpenPOWER Foundation now has 113 members, and lots of OpenPOWER servers are being designed and built. Timothy Green of the Motley Fool believes OpenPOWER will threaten Intel's server hegemony in the largest server market, China.
But enough of that. This is AnandTech, and here we quantify claims instead of just rambling about changing markets. What has Intel cooked up and how does it stack up to the competition? Let's find out.
(*) Source: IDC Worldwide Quarterly Server Tracker, 2014Q1, May 2014, Vendor Revenue Share
146 Comments
Brutalizer - Tuesday, May 26, 2015
@FUDer KevinGME: And how do you know that x86 requires fewer sockets than SPARC?
YOU: "...Even with diminishing returns, there are still returns. In other benchmarks regarding 16 socket x86 systems, performance isn’t double that of an 8 socket system but it still a very respectable gain. Going to 32 sockets with the recent Xeon E7v3 chips should be able to capture the top spot...."
I must again remind myself that "there are no stupid people, only uninformed people". Your ignorance makes it hard to have a discussion with you, because there is so much stuff you have no clue of, you lack basic math knowledge, your logic is totally wrong, your comp sci knowledge is abysmal, and still you make up a lot of stuff without backing things up. How do you explain to a fourth grader that his understanding of complexity theory is wrong, when he lacks basic knowledge? You explain again and again, but he does not get it. How could he get it???
Look, let me teach you. Benchmarks between a small number of sockets are not conclusive when you go to a higher number of sockets. It is easy to get good scaling from 1 to 2 sockets, but to go from 16 to 32 is another thing. Everything is different, locking is much worse because race conditions are more frequent, etc etc. Heck, even you write that scaling is difficult as you go to a high number of sockets, and still you claim that x86 would scale much better than SPARC, you claim x86 scales close to linear? On what grounds?
Your grounds are that you have seen OTHER benchmarks. And what other benchmarks do you refer to? Did you look at scale-out benchmarks? Has it occurred to you that scale-out benchmarks can not be compared to scale-up benchmarks? So, what other benchmarks do you refer to, show us the links where 16-socket x86 systems get good scaling from 8-sockets. I would not be surprised if it were Java SPECjbb2005, LINPACK or SPECint2006 or some other clustered benchmark you refer to. That would be: not clever if you looked at clustered benchmarks and drew conclusions about scale-up benchmarks. I hope even you understand what is wrong with your argument? Scale-out clustered benchmarks always scale well because the workload is distributed, so you can get good scaling. But scale-up SAP scaling is another thing.
Show us the business enterprise scale-up benchmark, where they go from 8-socket x86 server up to 16-socket and get good scaling. This is going to be fun; I am going to slaughter all your scale-out 16-socket x86 benchmarks (you will only find scale-out clustered benchmarks which makes you laughable as you compare with scale-up :). Your "analysis" is a bit... non rigorous. :-)
.
"...How about a change of pace and you start backing up your claims?..."
What claims should I backup? This whole thread started by me, claiming that it is impossible to get high business enterprise SAP scores on x86, because x86 scale-up servers stop at 8-sockets and scale-out servers such as SGI UV2000 can not handle scale-up workloads. You claim this is wrong, you claim that SGI UV2000 can handle business enterprise workloads such as SAP. To this I have asked you to post only one single SAP link where a x86 server can compete with the top Unix servers. You have never posted such a link. At the top are only Unix servers, at the very bottom are x86, far away. The performance difference is huge.
Instead you claim that x86 320.000 saps can compete with SPARC 850.000 saps - well the x86 gets 37% of the SPARC score. Why do you claim it? Because the x86 score is in the top 10 list!!! That is so wrong logically. There are no other good performing servers than SPARC, it is alone far ahead at the top. The competition is left far behind, because they scale so badly. To this you say; well, x86 is among the rest in the bottom, so therefore it is competitive with SPARC.
There are no other good business SAP servers today than SPARC and POWER8. Itanium is dead. Your only choice to get extremely high SAP scores is to go to SPARC. x86 can not do that. But you claim that UV2000 can do that. Well, in that case you should show us links where UV2000 does that, back up your claims and stop FUDing. Otherwise you make up things, and call them facts, when they are made up. Made up false claims that SGI UV2000 can achieve extreme SAP scores is called "negative or dubious information designed to spread confusion" - in other words: FUD.
You are doing the very definition of FUD. You are claiming that UV2000 can do things, it can not. And you can not prove it. In other words, everything is made up. And that, is FUD. So, show us links, or admit you are FUDing.
How about this scenario: "Did you know that SPARC M6 with 32-sockets can outclass SGI UV2000 with 256-sockets on HPC computations? Yes it can, SPARC is several times faster than x86! I am not going to show you benchmarks on this, you have to trust me when I say that SPARC M6 is much faster than UV2000." - is this FUD or what???
Or this scenario: "Did you know that SGI UV2000 is quite unstable and crashes all the time? And no, I am not going to post links proving this, you have to trust me when I say this" - is this FUD or what???
How about this familiar KevinG scenario: "Did you know that SGI UV2000 is faster at SAP than high end Unix servers? No, I am not going to show you benchmarks, you have to trust me on this" - is this FUD or what???
Hey FUDer, can you back up your claims? Show us a SAP ranked benchmark with a x86 server. I dont know how many times I need to ask you to do this? Is this... the tenth time? Or 15th? Google a bit more, and hope you will find someone that uses SGI or ScaleMP for SAP benchmarks - but you will not find any because it is impossible. :)
.
"....Unix systems do offer proprietary libraries and features that Linux does not offer. If a developer’s code sticks to POSIX and ANSI C, then it is very portable but the more you dig into the unique features of a particular flavor of Unix, the harder it is to switch to another. Certainly there is overlap between Unix and Linux but there is plenty unique to each OS...."
Sigh. So much... ignorance. Large enterprise systems such as SAP, Oracle database, etc - are written to be portable between different architectures and OSes; Linux, Solaris, IBM AIX, HP-UX, etc. So you are wrong again: the reason companies continue to shell out $millions for Unix servers is not because of vendor lockin. And it is not because of RAS. Your explanations are all flawed.
So, answer me again: why has not the high end Unix market died in an instant if x86 scale-out servers such as SGI UV2000 can replace Unix servers at SAP, Databases, etc? Why are you ducking this question?
.
ME: “...I also showed you that the same 3.7GHz SPARC cpu on the same server, achieves higher saps with 32-sockets, and achieves lower saps with 40-sockets....”
YOU: "...No, you didn’t even notice the clock speed differences and thought that they were the same as per your flawed analysis on the matter. It is a pretty big factor for performance and you seemingly have overlooked it. Thus your entire analysis on how those SPARC systems scaled with additional sockets is inherently flawed...."
Wrong again. I quote myself: "I also showed you that the same 3.7GHz SPARC cpu on the same server, achieves higher saps with 32-sockets, and achieves lower saps with 40-sockets."
And here is a quote from an earlier post where I do exactly this: compare the same 3.7GHz cpu: "....Ok, I understand what you are doing. For instance, the 16-socket M10-4S has 28.000 saps per socket, and the 32-socket M10-4S has 26.000 saps per socket - with exactly the same 3.7GHz cpu. Sure, the 32 socket used a newer and faster version of the database - and still gained less for every socket as it had more sockets...."
And also, earlier, I noticed the clock speed differences, but it was exactly the same cpu model. I thought you would accept an extrapolation. Which you did not. So I compare same cpu model and same 3.7GHz clock speed, and I show that 40-socket gains less saps than 32-socket do - and this is an "inherently flawed" comparison? Why?
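For readers who want to check the per-socket arithmetic quoted above, here is a minimal sketch. It uses only the two figures cited in this comment (roughly 28.000 saps per socket at 16 sockets and 26.000 at 32 sockets on the M10-4S); the function name and the returned keys are made up for illustration. As noted above, the 32-socket run used a newer database version, so this is not a controlled comparison.

```python
def scaling_summary(small_sockets, small_per_socket, large_sockets, large_per_socket):
    """Compare two benchmark results in terms of total score and scaling efficiency.

    Efficiency is the achieved speedup divided by the ideal (linear) speedup,
    so 1.0 would mean perfectly linear scaling.
    """
    small_total = small_sockets * small_per_socket
    large_total = large_sockets * large_per_socket
    ideal_speedup = large_sockets / small_sockets
    actual_speedup = large_total / small_total
    return {
        "small_total": small_total,
        "large_total": large_total,
        "actual_speedup": actual_speedup,
        "efficiency": actual_speedup / ideal_speedup,
    }


# Figures as quoted in the comment (approximate, rounded per-socket scores).
summary = scaling_summary(16, 28_000, 32, 26_000)
print(f"total saps: {summary['small_total']} -> {summary['large_total']}")
print(f"speedup for 2x sockets: {summary['actual_speedup']:.2f}")
print(f"scaling efficiency: {summary['efficiency']:.1%}")
```

With rounded inputs like these the result is only indicative; the official SAP result sheets would be needed for exact numbers.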
Do you think I should accept your "analysis" where you look at scale-out benchmarks to conclude x86 scale-up scalability? Do you see the big glaring error in your "analysis"?
.
ME:“So, why do believe that 32-socket x86 would easily be faster than Unix servers?”
YOU: "That is not a citation. Another failed claim on your part."
Explain again what makes you believe a 32-socket x86 server would scale better than Unix servers. Is it because you looked at scale-out clustered x86 benchmarks and concluded about x86 scalability for SAP? And therefore you believe x86 would scale much better?
.
ME:“Oh, so the UV300 is the same as a smaller UV2000? Do you have links confirming this or did you just make it up?”
YOU: "No, in fact I pointed out that the UV 300 uses NUMALink7 interconnect where as the UV 2000 uses NUMALink6..."
Can you quote the links where SGI says that UV300 is just a 16-socket UV2000 server?
.
ME:"SGI does not say UV2000 is a scale up server.”
YOU: "You apparently did not watch this where they explain it is a giant SMP machine that goes up to 256 sockets:"
But still UV2000 is exclusively viewed as a scale-out HPC server by SGI (see my quote from your own link below where SGI talks about getting into the enterprise market with HPC servers), and never used for scale-up workloads. So what does it matter what SGI marketing labels the server? Can you show us one single customer that uses UV2000 as a scale-up server? Nope. Why?
.
ME: “It is totally irrelevant if it runs a single instance of Linux. As I have explained earlier, ScaleMP also runs a single instance of Linux.”
YOU:"...Except that ScaleMP is not used by the UV2000 so discussing it here is irrelevant.."
It is relevant. You claim that because UV2000 runs a single image kernel with shared memory, UV2000 is a scale-up server. That criterion is wrong, as I explained. I can explain again: ScaleMP is a true clustered scale-out server, which they themselves explain, and ScaleMP runs a single image kernel and shared memory. Hence, your argument is wrong when you claim that UV2000 is a scale-up server because it runs a single image kernel. ScaleMP, which is a true scale-out server, does the same. So your argument is invalid, by this explanation which is very relevant.
.
YOU:"...So what specifically in the design of the UV 2000 makes it scale out as you claim? Just because you say so doesn’t make it true. Give me something about its actual design that counters SGI own documentation about it being one large SMP device. Actually back up this claim...."
All customers are using SGI UV2000 for scale-out HPC computations. No one has ever used it for scale-up workloads. Not a single customer. There are no examples, no records, no links, no scale up benchmarks, no nothing. No one uses UV2000 for scale up workloads such as big databases - why??
http://www.zdnet.com/article/scale-up-and-scale-ou...
"...Databases are a good example of the kind of application that people run on big SMP boxes because of cache coherency and other issues..."
Sure, let SGI marketing call UV2000 a SMP server, they can say it is a carrot if they like - but still no one is going to eat it, nor use it to run scale-up business workloads. There are no records, nowhere. Show us one single link where customers use UV2000 for enterprise workloads. OTOH, can you show us links where customers use UV2000 for scale-out HPC computations? Yes you can - the internet is swarming with such links! Where are the scale-up links? Nowhere. Why?
See links immediately below here:
YOU:"...Also you haven’t posted a link quoting SGI that the UV 2000 is only good for HPC. In fact, the link you continually post about this is a decade old and in the context of older cluster offers SGI promoted, long before SGI introduced the UV 2000...."
Que? I have posted several links on this! For instance, here is another, where I quote from your own link:
http://www.enterprisetech.com/2014/03/12/sgi-revea...
"...What I am trying to understand is how you, SGI, is going to be deploying technologies that it has developed for supercomputing in the business environment. I know about the deal you have done with SAP on a future HANA system, but this question goes beyond in-memory databases. I just want to get my brain wrapped around the shape of the high-end enterprise market you are chasing..."
"...Obviously, Linux can span an entire UV 2000 system because it does so for HPC workloads, but I am not sure how far a commercial database made for Linux can span...."
"...So in a [SGI UV2000] system that we, SGI, build, we can build it for high-performance computing or data-intensive computing. They are basically the same structure at a baseline..."
"...IBM and Oracle have become solution stack players and SAP doesn’t have a high-end vendor to compete with those two. That’s where we, SGI, see ourselves getting traction with HPC servers into this enterprise space...."
"...The goal with NUMAlink 7 is...reducing latency for remote memory. Even with coherent shared memory, it is still NUMA and it still takes a bit more time to access the remote memory..."
In all other links I have posted, SGI says the same thing as here "SGI is not getting into the enterprise market segment yet", etc. In this link SGI says UV2000 systems are for computing, not for business enterprise. SGI says they are getting traction with HPC servers, into enterprise. SGI talks about difficulties to get into the enterprise space with HPC servers. They do not mention any scale-up servers ready to get into enterprise.
So here you have it again. SGI exclusively talks about getting HPC computation servers into enterprise. Read your link again and you will see that SGI only talks about HPC servers.
.
"...SAP doesn’t actually say that 32 sockets is a hard limit. Rather it is the most number of sockets for a system that will be validated (and it is SAP here that is expecting the 32 socket UV 300 to be validated). Please in the link below quote where SAP say that HANA strictly cannot go past 32 sockets..."
In the link they only talk about 32-sockets, they explicitly mention 32-socket SGI UV3000H and dont mention UV2000 with 256 sockets. They say that bigger scale-up servers than 32-sockets will come later. I quote from the link:
"....The answer for the SAP Business Suite is simple right now: you have to scale-up. This advice might change in future, but even an 8-socket 6TB system will fit 95% of SAP customers, and the biggest Business Suite installations in the world can fit in a SGI 32-socket with 24TB..."
"...HP ConvergedSystem 900... is available with up to 16 sockets and 4TB for analytics, or 12TB for Business Suite. The HP CS900 uses their Superdome 2 architecture
SGI have their SGI UV300H appliance... 32 sockets and 8TB for analytics, or 24TB for Business Suite.
Bear in mind that bigger scale-up systems will come, as newer generations of Intel CPUs come around. The refresh cycle is roughly every 3-4 years, with the last refresh happening in 2013...."
.
ME:“No, this is wrong again. I wanted you to post x86 sap benchmarks with high scores that could compete at the very top.”
YOU: "And I did, a score in the top 10. I did not misquote you and I provided exactly what you initially asked for. Now you continue to attempt to shift the goal posts."
No, not a score in top 10. Why do you believe I mean top 10? I meant at the very top. I want you to show links where x86 beat the Unix servers in SAP benchmarks. Go ahead. This whole thread started by me posting that x86 can never challenge high end Unix servers in SAP, that you need to go to Unix if you want the best performance. Scale-up x86 wont do, and scale-out x86 wont do. I quote myself:
"...So, if we talk about serious business workloads, x86 will not do, because they stop at 8 sockets. Just check the SAP benchmark top - they are all more than 16 sockets, i.e. Unix servers. X86 are for low end and can never compete with Unix such as SPARC, POWER etc. Scalability is the big problem and x86 has never got past 8 sockets. Check SAP benchmark list yourselves, all x86 are 8 sockets, there are no larger...."
So, now I ask you again: show us a x86 sap benchmark that can compete at the very top. Not at the very bottom with 37% of the top performance - that is laughable.
QUESTION_H) Can x86 reach close to a million saps at all? Is it even possible with any x86 server? Answer this question with links to benchmarks. And no "yes they can, trust me on this, I am not going to prove this" - doesnt count as it is pure FUD. So answer this question.
.
ME:"It says, just like SAP says, that UV2000 is not suitable for SAP workloads. How can you keep on claiming that UV2000 is suitable for SAP workloads, when SGI and SAP says the opposite?”
YOU:"That quote indicates that it does scale further. The difficulty being mentioned is a point that been brought up before: the additional latencies due to more interconnections as the system scales up."
The question is not if you can scale further, the question is if it scales well enough for actual use. And your link says that UV2000 has scaling issues for enterprise usage, and it is a HPC server. Here you have it again, how many links on this do you want?
"...This UV 300 does not have some of the issues that SGI’s larger-scale NUMA UV 2000 machines have, which make them difficult to program even if they do scale further....This is not the first time that SGI, in several of its incarnations, has tried to move from HPC into the enterprise space...."
.
"...Except you are not answering the question I asked. Simply put, what actually would make the 96 Bixby socket version a cluster where as the 32 socket version is not?..."
I answered and said that both of them are NUMA servers. I think I need to explain to you more, as you dont know so much about these things. Larger servers are all NUMA (i.e. tightly coupled cluster), meaning they have bad latency to far away nodes. Latency differs from close nodes and far away nodes - i.e NUMA. True SMP servers have the same latency no matter which cpu you reach. If you can keep the NUMA server small and with good engineering, you can still get a decent latency making it suitable for scale-up enterprise business workloads where the code branches heavily. As SGI explained in an earlier link, enterprise workloads branch heavily in the source code making that type of code less suitable for scale-out servers - this is common knowledge. I know you have read this link earlier.
Obviously 32-socket SPARC Bixby has low enough latency to be good for enterprise business usage, as Oracle dabbles in that market segment. But as we have not seen 96-socket bixby servers yet, I suspect that latency to far away cpus differ too much, making performance less than optimal. Otherwise Oracle would have sold 96-socket servers, if performance would be good enough. But they dont.
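To make the NUMA point above concrete, here is a small sketch of how the average memory latency grows as a larger share of accesses goes to far away nodes. The latency numbers below are purely hypothetical placeholders, not measurements of any SPARC, POWER or SGI system, and the function name is invented for this illustration.

```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    """Weighted average memory latency for a given share of remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical latencies: 100 ns to local memory, 500 ns to a far away node.
LOCAL_NS = 100.0
REMOTE_NS = 500.0

for remote_fraction in (0.0, 0.1, 0.5):
    avg = average_latency_ns(LOCAL_NS, REMOTE_NS, remote_fraction)
    print(f"{remote_fraction:.0%} remote accesses -> {avg:.0f} ns on average")
```

The sketch only shows the weighted average; it does not model caches, interconnect contention or how a real workload distributes its accesses.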
.
ME: “As I explained, no one puts larger servers than 8-sockets in production, because they scale too bad (where are all benchmarks???). I also talked about business enterprise workloads.”
YOU: "And this does not explain the contradiction in your statements. Rather this is more goal post shifting."
It is not goal shifting. All the time we have talked about large scale-up servers for enterprise usage, and if I forget to explicitly use all those words in every sentence, you call it "goal shifting". It is you that pretend to not understand. If I say "x86 can not cope with large server workloads" you will immediately talk about HPC computations, ignoring the fact that this whole thread is about enterprise workloads.
You on the other hand, are deliberately spreading a lot of negative or false disinformation - i.e. FUD. You know that no one has ever used UV2000 for sap usage, there are no records on the whole internet - but still you write it. That is actually false information.
.
ME: "And where does it say that Cerner used this 16-socket x86 server to replace a high end Unix server? Is it your own conclusion, or do you have links?”
YOU: "Yes as I’ve stated, it wouldn’t make sense to use anything less than the 16 socket version. They could have purchased an 8 socket server from various other vendors like IBM/Lenovo or HP’s own DL980. Regardless of socket count, it is replacing a Unix system as HPUX is being phased out. If you doubt it, I’d just email the gentlemen that HP quoted and ask."
So, basically you are claiming that your own "conclusions" are facts? And if I doubt your conclusions, I should find it out myself by trying to get NDA closed information from some guy at a large company like HP? Are you serious?
Wow. Now we see this again "trust me on this, I will not prove it to you. If you want to find out, you can find it out yourself. I dont know how you should find this guy, but it is your task to prove my made up claim". Wow, you are really good at FUDing. So, I guess this "fact" is also something you are not going to back up? Just store it among the rest of the "facts" that never gets proven? Lot of FUD here...
.
"....Congratulations on actually reading my links! This must be the first time. However, it matters not as the actual point has been lost entirely again. There indeed has to be a mechanism in place for concurrency but I’ll repeat that my point is that it does not have to be a lock as there are alternatives. Even after reading my links on the subject, you simply just don’t get it that there are alternatives to locking...."
Of course I read it because I know comp sci very well, and I _know_ your claim is impossible, it was just a matter of finding the text to quote. I knew the text would be there, so I just had to find it.
Can you explain again why there are alternatives to locking? You linked to three "non-locking" methods - but I showed that they all do have some kind of lock deep down, they must have some mechanism to synch other threads so they don't simultaneously overwrite data. If you want to guarantee data integrity you MUST have some kind of way of stopping others from writing data. So if you claim it is possible to not do this, it is revolutionary and I think you should inform the whole comp sci community. Which you obviously are not a part of.
.
"...Apparently it does work and it is used in production enterprise databases as I’ve given examples of it with appropriate links. If you claim other wise, how about a formal proof as to why it cannot as you claim? How about backing one of your claims up for once?..."
I backed up my claims by quoting text from your own links, that they all have some kind of locking mechanism deep down. I've also explained why it can not be done. If several threads write the same data, you need some way of synching the writing, to stop others from overwriting. It can not be done in another way. This is common sense, and in parallel computations you talk about race conditions, mutex, etc etc. It would be a great breakthrough if you could write the same data simultaneously and at the same time guarantee data integrity. But that can not be done. This is common sense.
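As a small illustration of the terms used above (race condition, mutex), here is a sketch in Python. The first function replays by hand the interleaving in which two writers read the same old value and one update is lost; the second uses a standard `threading.Lock` so that every increment survives. The function names are invented for this example, and the sketch says nothing either way about the non-locking schemes discussed in the linked articles.

```python
import threading


def lost_update_demo():
    """Replay, step by step, an interleaving where one of two increments is lost."""
    shared = 0
    read_by_a = shared       # writer A reads 0
    read_by_b = shared       # writer B reads 0 before A has written
    shared = read_by_a + 1   # A writes 1
    shared = read_by_b + 1   # B also writes 1, overwriting A's update
    return shared


def locked_counter(threads=4, increments=10_000):
    """Increment a shared counter from several threads under a mutex."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:       # only one thread at a time may read-modify-write
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter


print("two unsynchronized increments ended at:", lost_update_demo())
print("locked increments ended at:", locked_counter())
```

The hand-written interleaving is used because a real race between threads is timing dependent and may not show up on every run.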
.
"....Actually I’ve been trying to get you to realize that the UV 2000 is a scale up SMP system...."
It is very easy to make me realize UV2000 is a scale up system - just prove it. I am a mathematician and if I see a proof, I immediately change my mind. Why would I not? If I believe something false, I must change my mind. So, show us some links proving that UV2000 is used for enterprise business workloads such as SAP or databases, etc. If you can show such links, I will immediately say I was wrong and that UV2000 is a good all round server suitable for enterprise usage as well. But, the thing is - there are no such links! No one does it! What does that tell you?
Unless you can prove it, you can not change my mind. It would be impossible.
Look: "SPARC M6 32-sockets are much faster than UV2000 with 256-sockets on HPC computations. I am not going to prove it, but Oracle marketing says SPARC is the fastest cpu in the world, so it must be true. I just want to make you realize that SPARC M6 is much faster than UV2000. But I will not prove it by links nor benchmarks".
Would you change your mind on this claim? Would you believe SPARC M6 is much faster than UV2000? Just because I say so? Nope you would not. But, if I could show you benchmarks where SPARC M6 was in fact, much faster than UV2000? Would you change your mind then? Yes you would!
In effect you are FUDing. And unless you post links and prove your claims, I am not going to change my mind. I hope you realize that. The only way to convince a mathematician is to prove it. Show us links and benchmarks. Credible I must add. One blog where some random guy writes something does not count. Show us official and validated benchmarks.
.
"...Just like you’ve been missing my main point, you have overlook where the institute was using it was a scale up system due the need for a large shared memory space for their workloads. It was replaced by a UV 2000 due to its large memory capacity as a scale up server. Again, this answer fit your requirements of “Show us one single customer that has replaced a single Unix high end server with a SGI scale-out server, such as UV2000 or Altix or ScaleMP or whatever.” and again you are shifting the goal posts...."
Jesus. It is YOU that constantly shifts the goal posts. You KNOW that this whole discussion is about x86 and scale up business enterprise workloads. Nothing else. And if I don't specify "business enterprise workloads" in every sentence, you immediately jump on that and shift to talking about HPC calculations or whatever I did not specify. You KNOW we talk only about scale-up workloads. Math institutes doing computations is NOT business enterprise, it is all about HPC. You know that. And because I thought you were clever enough to know we both talked about enterprise business workloads, I did not specify that in every sentence - and immediately you shifted goal posts at once, taking the chance to talk about math institutes doing HPC calculations. And at the same time accuse ME of shifting goal posts??? Wtf??? Impertinent indeed.
So, obviously you shift goal posts and you FUD. A lot. What will you try next? How about the truth? Show us links where one single customer that replaced Unix high end servers with a large scale-out server on BUSINESS ENTERPRISE WORKLOADS SUCH AS SAP OR DATABASES??? (I did not forget to specify this time)
Kevin G - Wednesday, May 27, 2015
@Brutalizer “Look, let me teach you. Benchmarks between a small number of sockets are not conclusive when you go to a higher number of sockets. It is easy to get good scaling from 1 to 2 sockets, but to go from 16 to 32 is another thing. “
Please quote me where I explicitly claim otherwise. I have stated that scaling is non-linear as socket count increases. We’re actually in agreement on this point, yet you continue to insist otherwise. Also if you feel the need to actually demonstrate this idea again, be more careful as your last attempt had some serious issues.
“Your grounds are that you have seen OTHER benchmarks. And what other benchmarks do you refer to? Did you look at scale-out benchmarks? Has it occurred to you that scale-out benchmarks can not be compared to scale-up benchmarks? So, what other benchmarks do you refer to, show us the links where 16-socket x86 systems get good scaling from 8-sockets. I would not be surprised if it were Java SPECjbb2005, LINPACK or SPECint2006 or some other clustered benchmark you refer to. ”
Those are perfectly valid benchmarks as well to determine scaling. Remember, a scale up system can still run scale out software as a single node just fine. A basic principle still has to be maintained to isolate scaling: similar system specifications with just varying socket count to scale up. For example, SPECint2006 can be run on a SPARC M10 with 4 sockets as well as 8 or 16 sockets etc. It’d just be a generic test of integer performance as additional sockets are added which can be used to determine how well the system scales with that workload. Also due to the overhead of adding another socket, performance scaling will be less than linear.
While you can say that SPECint2006 is not a business workload, which is correct, you cannot deny its utility to determine system scaling. The result of SPECint2006 scaling, as the optimistic case you claim it to be, would then serve as an upper bound for other benchmarks (i.e. system scaling cannot go beyond this factor). It can also be used to indicate where diminishing returns, if any, can be found as socket count goes up. If diminishing returns are severe with an optimistic scaling benchmark, then they should appear sooner with a more rigorous test. This would put an upper limit to how many additional sockets would be worthwhile to include in a system.
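The reasoning above can be sketched in a few lines of Python. The scores below are entirely hypothetical numbers chosen for illustration (they are not SPECint2006 results for any real system), and the function names and the 85% cut-off are assumptions of this sketch, not an established rule.

```python
def marginal_efficiency(scores):
    """For each step between measured socket counts, return (from, to, efficiency).

    Efficiency is the achieved speedup divided by the ideal linear speedup
    for that step, so 1.0 means the extra sockets scaled perfectly.
    """
    counts = sorted(scores)
    steps = []
    for prev, cur in zip(counts, counts[1:]):
        speedup = scores[cur] / scores[prev]
        ideal = cur / prev
        steps.append((prev, cur, speedup / ideal))
    return steps


def worthwhile_socket_limit(scores, min_efficiency):
    """Largest socket count reached before a step falls below min_efficiency."""
    limit = min(scores)
    for _prev, cur, efficiency in marginal_efficiency(scores):
        if efficiency < min_efficiency:
            break
        limit = cur
    return limit


# Hypothetical optimistic-benchmark scores by socket count (illustration only).
hypothetical_scores = {4: 100.0, 8: 190.0, 16: 340.0, 32: 520.0}

for prev, cur, efficiency in marginal_efficiency(hypothetical_scores):
    print(f"{prev} -> {cur} sockets: {efficiency:.0%} of linear scaling")
print("worthwhile up to:", worthwhile_socket_limit(hypothetical_scores, 0.85), "sockets")
```

If the optimistic benchmark already drops below the chosen threshold at some socket count, the argument is that a more demanding workload would be expected to drop below it no later than that.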
“What claims should I backup?”
How about that the UV 2000 is a cluster. You have yet to demonstrate that point while I’ve been able to provide evidence that it is a scale up server.
“This whole thread started by me, claiming that it is impossible to get high business enterprise SAP scores on x86”
Incorrect. A top 10 score using only eight sockets on an x86 system for SAP has been validated. Apparently the impossible has been done.
“Sigh. So much... ignorance. Large enterprise systems such as SAP, Oracle database, etc “
The context was with regards to custom applications that companies themselves would write. The portability of the code was the businesses to determine, not a 3rd party vendor. Legacy support and unique features to Unix are just some of the reasons why people will continue to use those systems even in the face of faster hardware. Hence another point you don’t understand.
Even with your context of 3rd party vendors, businesses fail or they’re bought out by another company where products only move to legacy and are no longer updated. Not all 3rd party software gets ported between all the various flavors of Unix and Linux. Case in point: HPUX is tied to Itanium and thus a dead platform. Thus any HPUX exclusive 3rd party software is effectively dead as well.
“Wrong again. I quote myself: "I also showed you that the same 3.7GHz SPARC cpu on the same server, achieves higher saps with 32-sockets, and achieves lower saps with 40-sockets."
And now you are even missing the points that you yourself were trying to make which how scaling from 32 sockets to 40 sockets was poor when in fact that comparison was invalid due to the differences in clock speed.
“And also, earlier, I noticed the clock speed differences, but it was exactly the same cpu model. I thought you would accept an extrapolation.”
I rejected it appropriately as you never pointed out the clock speed differences in your analysis, hence your conclusions were flawed. Also I think it is fair to reject extrapolation as you’ve also rejected my extrapolations elsewhere even as indicate as such. Fair game.
“Explain again what makes you believe a 32-socket x86 server would scale better than Unix servers. Is it because you looked at scale-out clustered x86 benchmarks and concluded about x86 scalability for SAP?”
Again, this is not a citation.
“Can you quote the links and the where SGI say that UV300 is just a 16-socket UV2000 server?”
Or you know you could have just read what I stated and realize that I’m saying that the UV300 is not a scaled down UV2000. (The UV 300 is a scaled down version of the UV 3000 that is coming later this year to replace the UV 2000.) Rather a 16 socket UV 2000 would have same attribute of having a uniform latency as all the sockets would be the same distance from each other in terms of latency. Again, this is yet another point you’ve missed.
“But still UV2000 is exclusively viewed as a scale-out HPC server by SGI (see my quote from your own link below where SGI talks about getting into the enterprise market with HPC servers), and never used for scale-up workloads. So what does it matter what SGI marketing label the server? “
No, SGI states that it is a scale up server plus provides the technical documentation to backup that claim. The idea that they’re trying to get into the enterprise market should be further confirmation that it is a scale-up servers that can run business workloads. Do you actually read what you’re quoting?
“Can you show us one single customer that use UV2000 as a scale-up server?”
I have before: the US Post Office.
“All customers are using SGI UV2000 for scale-out HPC computations. No one has ever used it for scale-up workloads. Not a single customer. There are no examples, no records, no links, no scale up benchmarks, no nothing. No one use UV2000 for scale up workloads such as big databases - why??”
I’ve given you a big example before: the US Post Office. Sticking your head in the sand is not a technical reason for the UV 2000 being a cluster as you claim. Seriously, back up your claim that the UV 2000 is a cluster.
“Sure, let SGI marketing call UV2000 a SMP server, they can say it is a carrot if they like - but still no one is going to eat it, nor use it to run scale-up business workloads.”
Or you could read the technical data on the UV 2000 and realize that it has a shared memory architecture with cache coherency, two attributes that define a modern scale up SMP system. And again, I’ll reiterate that the US Post Office is indeed using these systems for scale-up business workloads.
“Que? I have posted several links on this!”
Really? The only one I’ve seen from you on this matter is a decade old and not in the context of SGI’s modern line up. The rest are just links I’ve presented that you quote out of context or you just do not understand what is being discussed.
“For instance, here is another, where I quote from your own link:”
Excellent! You’re finally able to accept that these systems can be used for databases and business workloads as the quote indicates that is what SGI is doing. Otherwise I find it rather strange that you’d quote things that run counter to your claims.
ARTICLE: "...Obviously, Linux can span an entire UV 2000 system because it does so for HPC workloads, but I am not sure how far a commercial database made for Linux can span....”
Ah! This actually interesting as it is in the context of the maximum number of threads a database can actually use. For example, MS SQL Server prior to 2014 could only scale to a maximum of 80 concurrent threads per database. Thus for previous versions of MS SQL Server, any core count past 80 would simply go to waste due to software limitations. As such, there may be similar limitations in other commercial databases that would be exposed on the UV 2000 that wouldn’t apply else where. Thus the scaling limitation being discussed here is with the database software, not the hardware so you missed the point of this discussion.
ARTICLE:"...So in a [SGI UV2000] system that we, SGI, build, we can build it for high-performance computing or data-intensive computing. They are basically the same structure at a baseline..."
And you missed the part of the quote indicating ‘or data-intensive computing’ which is a key part of the point being quoted that you’ve missed. Please actually read what you are posting please.
ARTICLE: "...IBM and Oracle have become solution stack players and SAP doesn’t have a high-end vendor to compete with those two. That’s where we, SGI, see ourselves getting traction with HPC servers into this enterprise space...."
This would indicate that the UV 2000 and UV 300 are suitable for enterprise workloads which runs counter to your various claims here.
ARTICLE: "...The goal with NUMAlink 7 is...reducing latency for remote memory. Even with coherent shared memory, it is still NUMA and it still takes a bit more time to access the remote memory..."
Coherency and shared memory are two trademarks of a large scale-up server. Quoting this actually hurts your arguments about the UV 2000 being a cluster as I presume you’ve also read the parts leading up to this quote and the segment you cut. The idea that accessing remote memory adds additional latency is a point I’ve made else where in our discussion and it is one of the reasons why scaling up is nonlinear. Thus I can only conclude that your quoting of this is to support my argument. Thank you!
“So here you have it again. SGI exclusively talks about getting HPC computation servers into enterprise. Read your link again and you will see that SGI only talks about HPC servers.”
And yet you missed the part where they were talking about those systems being used for enterprise workloads. Again, thank you for agreeing with me!
“In the link they only talk about 32-sockets, they explicitly mention 32-socket SGI UV3000H and dont mention UV2000 with 256 sockets. They say that bigger scale-up servers than 32-sockets will come later.”
Which would ultimately mean that 32 sockets is not a hard limit for SAP HANA as you’ve claimed. I’m glad you’ve changed your mind on this point and agree with me on it.
“No, not a score in top 10. Why do you believe I mean top 10? I meant at the very top. I want you to show links where x86 beat the Unix servers in SAP benchmarks. Go ahead. This whole thread started by me posting that x86 can never challenge high end Unix servers in SAP, that you need to go to Unix if you want the best performance. Scale-up x86 wont do, and scale-out x86 wont do.”
Except a score in the top 10 does mean that they are competitive, which is what you originally asked for. The top 10 score was an 8 socket offering, counter to your claims that all the top scores were 16 sockets or more. (And it isn’t the only 8 socket system in the top 10 either, IBM has a POWER8 system ranked 7th.)
Also if you really looked, there are 16 socket x86 scores from several years ago. At the time of their submission they were rather good but newer systems have displaced them over time. The main reason the x86 market went back to 8 sockets was that Intel reined in chipset support with the Nehalem generation (the 16 socket x86 systems used 3rd party chipsets to achieve that number). This was pure market segmentation as Intel still had hopes for the Itanium line at the time. Thankfully the last two generations of Itanium chips have used QPI so that the glue logic developed for them can be repurposed for today’s Xeons. This is why we’re seeing x86 systems with more than 8 sockets reappear today.
http://download.sap.com/download.epd?context=40E2D...
http://download.sap.com/download.epd?context=9B280...
“I answered and said that both of them are NUMA servers. I think I need to explain to you more, as you dont know so much about these things. Larger servers are all NUMA (i.e. tightly coupled cluster), meaning they have bad latency to far away nodes. Latency differs from close nodes and far away nodes - i.e NUMA. True SMP servers have the same latency no matter which cpu you reach.”
So by your definition above, all the large 32 socket systems are then clusters because they don’t offer uniform latency. For example, the Fujitsu SPARC M10-4S needs additional interconnect chips to scale past 16 sockets and thus latency on opposite sides of this interconnect is not uniform. IBM’s P795 uses a two tier topology with distinct MCM and remote regions for latency. IBM’s older P595 used two different ring buses for an interconnect where latency even on a single ring was not uniform. A majority of 16 socket systems are also clusters by your definition as there are distinct local and remote latency regions. By your definition, only select 8 socket systems and most 4 and 2 socket systems are SMP devices, as processors at this scale can provide a single direct link between all other sockets.
Or it could be that your definition of what a true SMP server is is incorrect, as systems like the SPARC M10-4S, IBM P795, IBM P595, SGI UV 300 and SGI UV 2000 are all large SMP systems. Rather, the defining traits are straightforward: a single logical system with shared memory and cache coherency between all cores and sockets. Having equal latency between sockets, while ideal for performance, is not a necessary component of the definition.
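The uniform versus non-uniform latency distinction can be shown with a small Python sketch. The latency figures (in nanoseconds) and the two four-socket layouts below are hypothetical, chosen only to show the difference; they do not describe any of the systems named above. Under the definition given here, both layouts would still count as a single shared-memory system; the check only tells you whether remote latency is uniform.

```python
def is_uniform(latency):
    """Return True if every socket reaches every other socket with the same latency.

    `latency[i][j]` is the memory access latency from socket i to memory on
    socket j; the diagonal (local access) is ignored.
    """
    n = len(latency)
    remote = {latency[i][j] for i in range(n) for j in range(n) if i != j}
    return len(remote) <= 1


# Hypothetical: four sockets, each directly linked to every other socket.
fully_connected = [
    [80, 120, 120, 120],
    [120, 80, 120, 120],
    [120, 120, 80, 120],
    [120, 120, 120, 80],
]

# Hypothetical: two pairs of sockets joined through an extra interconnect hop.
two_tier = [
    [80, 120, 200, 200],
    [120, 80, 200, 200],
    [200, 200, 80, 120],
    [200, 200, 120, 80],
]

print(is_uniform(fully_connected))
print(is_uniform(two_tier))
```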
“If you can keep the NUMA server small and with good engineering, you can still get a decent latency making it suitable for scale-up enterprise business workloads where the code branches heavily. As SGI explained in an earlier link, enterprise workloads branch heavily in the source code making that type of code less suitable for scale-out servers - this is common knowledge. I know you have read this link earlier.”
Again, define branch heavy in this context. I’ve asked for this before without answer. I believe you mean something else entirely.
“It is not goal shifting. All the time we have talked about large scale-up servers for enterprise usage, and if I forget to explicitly use all those words in every sentence, you call it "goal shifting".”
Since that is pretty much the definition of goal shifting, thank you for admitting to it. In other news, you still have not explained the contradiction in your previous statements.
“So, basically you are claiming that your own "conclusions" are facts? And if I doubt your conclusions, I should find it out myself by trying to get NDA closed information from some guy at large large HP? Are you serious?”
Apparently it isn’t much of an NDA if it is part of a press release. Go ask the actual customer HP quoted as they’re already indicating that they are using a Superdome X system.
“Can you explain again why there are alternatives to locking?”
There are alternative methods for maintaining concurrency that do not use locking. Locking is just one of several techniques for maintaining concurrency. There is no inherent reason to believe that there should only be one solution to provide concurrency.
“ You linked to three "non-locking" methods - but I showed that they all do have some kind of lock deep down, they must have some mechanism to synch other threads so they dont simultaneously overwrite data. If you want to guarantee data integrity you MUST have some kind of way of stopping others to write data.”
You didn’t actually demonstrate that locking was used for OCC or MVCC. Rather you’ve argued that since concurrency is maintained, it has to have locking, even though you didn’t demonstrate where the locking is used in these techniques. Of course since they functionally replace locking for concurrency control, you won’t find it. Also, skip the personal attacks; I’ve shown where these techniques are used in enterprise production databases.
“I backed up my claims by quoting text from your own links, that they all have some kind of locking mechanism deep down. Ive also explained why it can not be done. If several threads write the same data, you need some way of synching the writing, to stop others to overwrite. It can not be done in another way. This is common sense, and in parallel computations you talk about race conditions, mutex, etc etc. It would be a great break through if you could write same data simultaneously and at the same time guarantee data integrity. But that can not be done. This is common sense.”
This is the problem here: I’m not arguing about the end goal of concurrency. Rather, it is in how concurrency is obtained that you’re missing the point entirely. There are other ways of doing it than a lock. It can be done and I’ve shown that they’re used in production grade software.
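As a rough illustration of the optimistic approach (OCC) mentioned above, here is a minimal single-threaded Python sketch of the read, validate, commit cycle. The class names `VersionedRecord` and `Transaction` are invented for this example and are not taken from any database. It only simulates two interleaved transactions; a real multi-threaded engine still has to make the validate-and-write step atomic (for example with a hardware compare-and-swap), which this sketch does not model.

```python
class VersionedRecord:
    """A value plus a version counter that increases on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Optimistic transaction: read freely, validate only at commit time."""

    def __init__(self, record):
        self.record = record
        self.read_version = record.version  # version seen when we started
        self.pending = record.value         # private working copy

    def write(self, new_value):
        self.pending = new_value            # buffered, not yet visible to others

    def commit(self):
        # Validation: did anyone else commit since we read the record?
        if self.record.version != self.read_version:
            return False                    # conflict, caller should retry
        self.record.value = self.pending
        self.record.version += 1
        return True


balance = VersionedRecord(100)

# Two transactions start from the same snapshot.
t1 = Transaction(balance)
t2 = Transaction(balance)
t1.write(t1.pending + 10)
t2.write(t2.pending - 30)

first = t1.commit()    # succeeds, version goes 0 -> 1
second = t2.commit()   # fails validation, nothing is overwritten

retried = None
if not second:
    # Retry t2 against the fresh state instead of blocking up front.
    t2 = Transaction(balance)
    t2.write(t2.pending - 30)
    retried = t2.commit()

print(first, second, retried, balance.value)
```

The design choice here is that nobody waits before doing their work: a conflict is detected at commit time and the losing transaction is rerun, so no update is silently lost.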
“It is very easy to make me realize UV2000 is a scale up system - just prove it. I am a mathematician and if I see a proof, I immediately change my mind. Why would I not? If I believe something false, I must change my mind. So, show us some links proving that UV2000 are used for enterprise business workloads such as SAP or databases, etc. If you can show such links, I will immediately say I was wrong and that UV2000 is a good all round server suitable for enterprise usage as well. But, the thing is - there are no such links! No one does it! What does it say you?”
Oh I have before, the US Post Office has a UV 2000 for database work. Of course you then move the goal posts to where SAP HANA was no longer a real database.
“Jesus. It is YOU that constantly shifts the goal posts. You KNOW that this whole discussion is about x86 and scale up business enterprise workloads. Nothing else. And if I dont specify "business enterprise workloads" in every sentence, you immediately jumps on that and shift to talking about HPC calculations or whatever I did not specify.”
Again, that is pretty much the definition of shifting the goal posts and I thank you again for admitting to it.
“ You KNOW we talk only about scale-up workloads. Math institutes doing computations is NOT business enterprise, it is all about HPC. You know that.”
Actually, what I pointed out was a key attribute of those large scale up machines: a single large memory space. That is why the institute purchased the M9000 as well as the UV 2000. If they just wanted an HPC system, they’d get a cluster, which they did separately alongside each of these units. In other words, they bought *both* a scale up and a scale out system at the same time. In 2009 the scale up server selected was an M9000 and in 2013 their scale up server was a UV 2000. It fits your initial request for a UV 2000 replacing a large scale up Unix machine.
Brutalizer - Sunday, May 31, 2015 - link
@FUDer KevinG
Ive caught a flu, but now I feel better.
.
ME:"It is easy to get good scaling from 1 to 2 sockets, but to go from 16 to 32 is another thing. “
YOU: "Please quote me where I explicitly claim otherwise."
Well, you say that because x86 benchmarks scales well going from 8-sockets to 16 sockets, you expect x86 to scale well for 32-sockets too, on SAP. Does this not mean you expect x86 scales close to linear?
.
ME: "I would not be surprised if it were Java SPECjbb2005, LINPACK or SPECint2006 or some other clustered benchmark you refer to.”
YOU: "Those are perfectly valid benchmarks as well to determine scaling."
Hmmm.... actually, this is really uneducated. Are you trolling or do you really not know the difference? All these clustered benchmarks are designed for clustered scale-out servers. For instance, LINPACK is typically run on supercomputers, big scale-out servers with 100,000s of cpus. There is no way these cluster benchmarks can assess the scalability on SAP and other business workloads on 16- or 32-socket scale-up servers. Another example is SETI@home which can run on millions of cpus, but that does not mean SAP nor databases could also run on millions of cpus. I hope you realize you can not use scale-out benchmarks to draw conclusions for scale-up servers? Basic comp sci knowledge says there is a big difference between scale-up and scale-out. Did you not know, are you just pretending to not know? Trolling? Seriously?
http://en.wikipedia.org/wiki/Embarrassingly_parall...
"...In parallel computing, an embarrassingly parallel workload... is one for which little or no effort is required to separate the problem into a number of parallel tasks..."
BTW, have you heard about P-complete problems? Or NC-complete problems? Do you know something about parallel computations? You are not going to answer this question as well, right?
Where are the benchmarks on x86 servers going from 8-sockets up to 16-sockets, you have used to conclude about x86 scalability? I have asked you about these benchmarks. Can you post them and backup your claims and prove you speak true or is this also more of your lies, i.e. FUD?
http://en.wikipedia.org/wiki/Fear,_uncertainty_and...
"...FUD is generally a strategic attempt to influence perception by disseminating...false information..."
.
ME: “What claims should I backup?”
YOU: "How about that the UV 2000 is a cluster. You have yet to demonstrate that point while I’ve been able to provide evidence that it is a scale up server."
I showed you several links from SGI, where they talk about trying to go into the scale-up enterprise market, coming from the HPC market. Nowhere does SGI say they have a scale-up server. SGI always talk about their HPC servers, trying to break into the enterprise market. You have seen several such links, you have even posted such links yourself. If SGI had good scale-up servers that easily bested Unix high end servers, SGI would not talk about their HPC servers. Instead SGI talk about their UV300H 16-socket server trying to get a piece of the enterprise market. Why does SGI not use their UV2000 server if UV2000 is a scale-up server?
And where are the UV2000 enterprise benchmarks? Where are the SAP benchmarks?
.
ME: “This whole thread started by me, claiming that it is impossible to get high business enterprise SAP scores on x86”
YOU: "Incorrect. A top 10 score using only eight sockets on an x86 system for SAP has been validated. Apparently the impossible has been done."
Que? That is not what I asked! Are you trying to shift goal posts again? I quote myself again in my first post, nowhere do I ask about top 10 results:
"So, if we talk about serious business workloads, x86 will not do, because they stop at 8 sockets. Just check the SAP benchmark top - they are all more than 16 sockets, Ie Unix servers. X86 are for low end and can never compete with Unix such as SPARC, POWER etc. scalability is the big problem and x86 has never got passed 8 sockets. Check SAP benchmark list yourselves, all x86 are 8 sockets, there are no larger."
So, again, post a SAP benchmark competing with the largest Unix servers, with close to a million saps. Go ahead, we are all waiting. Or is it impossible for x86 to achieve close to a million saps? There is no way, no matter how hard you try? You must go to Unix? Or, can you post a x86 benchmark doing that? Well, x86 is worthless for SAP as it dont scale beyond 8-sockets on SAP and therefore can not handle extreme workloads.
.
ME:“Sigh. So much... ignorance. Large enterprise systems such as SAP, Oracle database, etc “
YOU: "The context was with regards to custom applications that companies themselves would write. The portability of the code was the businesses to determine, not a 3rd party vendor. "
What ramblings. I asked you about why high end Unix market has not died in an instant, if x86 can replace them (which they can not, it is impossible to reach close to a million saps with any x86 server, SGI or not) to which you replied something like "it is because of vendor lockin companies continue to buy expensive Unix servers instead of cheap x86 servers". And then I explained you are wrong because Unix code is portable which makes it easy to recompile among Linux, FreeBSD, Solaris, AIX,..., - just look at SAP, Oracle, etc they are all available under multiple Unixes, including Linux. To this you replied some incomprehensible ramblings? And you claim you have studied logic? I ask one question, you duck it (where are all links) or answer to another question which I did not ask. Again, can you explain why Unix high end market has not been replaced by x86 servers? It is not about RAS, and it is not about vendor lockin. So, why do companies pay $millions for one paltry 32-socket Unix server, when they can get a cheap 256-socket SGI server?
.
ME: “Wrong again. I quote myself: "I also showed you that the same 3.7GHz SPARC cpu on the same server, achieves higher saps with 32-sockets, and achieves lower saps with 40-sockets."
YOU: "And now you are even missing the points that you yourself were trying to make which how scaling from 32 sockets to 40 sockets was poor when in fact that comparison was invalid due to the differences in clock speed."
Que? I accepted your rejection of my initial analysis where I compared 3GHz cpu vs 3.7GHz of the same cpu model on the same server. And I made another analysis where I compared 3.7GHz vs 3.7GHz on the same server and showed that performance dropped with 40-sockets compared to 32-sockets, on a cpu per cpu basis. Explain how I was "missing the points that you yourself were trying to make"?
.
"...Also I think it is fair to reject extrapolation as you’ve also rejected my extrapolations elsewhere even as indicate as such..."
But your extrapolations are just.. quite stupid. For instance, comparing scale-out LINPACK benchmarks to assess scalability on SAP benchmarks? You are comparing apples to oranges. I compared same cpu model, with same GHz, on the same server - which you rejected.
.
ME:“Explain again what makes you believe a 32-socket x86 server would scale better than Unix servers. Is it because you looked at scale-out clustered x86 benchmarks and concluded about x86 scalability for SAP?”
YOU: "Again, this is not a citation."
No, but it is sheer stupidity. Can you explain again what makes you believe that? Or are you going to duck that question again? Or shift goal posts?
.
ME:“Can you quote the links and the where SGI say that UV300 is just a 16-socket UV2000 server?”
YOU: "Or you know you could have just read what I stated and realize that I’m saying that the UV300 is not a scaled down UV2000. (The UV 300 is a scaled down version of the UV 3000 that is coming later this year to replace the UV 2000.) Rather a 16 socket UV 2000 would have same attribute of having a uniform latency as all the sockets would be the same distance from each other in terms of latency. Again, this is yet another point you’ve missed."
What point have I missed? You claim that UV300 is just a 16-socket version of the UV2000. And I asked for links proving your claim. Instead of showing us links, you ramble something and conclude with "you missed the point"? What point? You missed the entire question! Show us links proving your claim. And stop about talking about that, I have missed some incomprehensible point you made. Instead, show us the links you refer to. Or is this also lies, aka FUD?
"...FUD is generally a strategic attempt to influence perception by disseminating...false information..."
.
"...No, SGI states UV2000 it is a scale up server plus provides the technical documentation to backup that claim. The idea that they’re trying to get into the enterprise market should be further confirmation that it is a scale-up servers that can run business workloads. Do you actually read what you’re quoting?..."
Great! Then we can finally settle this question! Show us scale-up benchmarks done with the UV2000 server, for instance SAP or large databases or other large business workloads. Or dont they exist? You do know that North Korea claims they are democratic and just, etc - do you believe that, or do you look at the results? Where are the UV2000 results running enterprise workloads? Why do SGI tout UV300H for the enterprise market instead of UV2000? SGI does not mention UV2000 running enterprise workloads, they only talk about UV300H. But I might have missed those links, show us them. Or is it more of the same old lies, ie. the links do not exist, it is only FUD?
.
ME: “Can you show us one single customer that use UV2000 as a scale-up server?”
YOU: "I have before: the US Post Office."
I have seen your link where USP used a UV2000 for fraud detection, not used as a database storing data. Read your link again: "U.S. Postal Service Using Supercomputers to Stamp Out Fraud"
Analytics is not scale-up, it is scale-out. I have explained this in some detail and posted links from e.g. SAP and links talking about in memory databases which are exclusively used for analytics. Do you really believe anyone stores persistent data in RAM? No, RAM based databases are only used for analytics, as explained by SAP, etc.
You have not showed a scale-up usage of the UV2000 server, for instance, running SAP or Oracle databases for storing data. Can you post such a link? Any link at all?
.
"...I’ve given you a big example before: the US Post Office. Sticking your head in the sand is not a technical reason for the UV 2000 being a cluster as you claim. Seriously, back up your claim that the UV 2000 is a cluster..."
I have showed numerous links that UV2000 is used for HPC clustered workloads, and SGI talks about HPC market segment, etc. There do not exist any enterprise UV2000 benchmarks, such as SAP. Not a single customer during decades, has ever used a large HPC cluster from SGI for SAP. No one. Never. Ever. On the whole internet. Why is that do you think? If you claim SGI's large servers are faster and cheaper and can replace high end Unix 32-socket servers - why have no one ever done that? Dont they want to save $millions? Dont they want much higher performance? Why?
.
ME:“Sure, let SGI marketing call UV2000 a SMP server, they can say it is a carrot if they like - but still no one is going to eat it, nor use it to run scale-up business workloads.”
YOU; "Or you could read the technical data on the UV 2000 and realize that it has a shared memory architecture with cache coherency, two attributes that define a modern scale up SMP system. And again, I’ll reiterate that the US Post Office is indeed using these systems for scale-up business workloads."
Well, no one use SGI UV2000 for enterprise business workloads. US Post Office are using it for fraud detection, that is analysis in memory database. Not storing data. You store data on disks, not in memory.
.
ME: “Que? I have posted several links on this!”
YOU; "Really? The only one I’ve seen from you on this matter is a decade old and not in the context of SGI’s modern line up."
Que? That SGI link explains the main difference between HPC workloads and enterprise business workloads. It was valid back then and it is valid today: the link says that HPC workloads runs in a tight for loop, crunching data, there is not much data passed between the cpus. And Enterprise code branches all over the place, so there is much communication among the cpus making it hard for scale-out servers. This is something that has always been true and true today. And in the link, SGI said that their large Altix UV1000 server are not suitable for enterprise workloads.
In your links you posted, SGI talks about trying to break into the enterprise market with the help of the UV300H server. SGI does not talk about the UV2000 server for breaking into the enterprise market.
I quote SGI from one of your own link discussing SGI trying to break into enterprise:
"...So in a [SGI UV2000] system that we, SGI, build, we can build it for High-Performance Computing or data-intensive computing. They are basically the same structure at a baseline..."
SGI explicitly says that UV2000 is for HPC in one way or the other. I have posted numerous such links and quoted SGI numerous times, often from your own links! How can you say you have never seen any quotes???
.
ME: “For instance, here is another, where I quote from your own link:”
YOU: "Excellent! You’re finally able to accept that these systems can be used for databases and business workloads as the quote indicates that is what SGI is doing. Otherwise I find it rather strange that you’d quote things that run counter to your claims."
Que? Do you think we are stupid or who are you trying to fool?
ARTICLE: "...Obviously, Linux can span an entire UV 2000 system because it does so for HPC workloads, but I am not sure how far a commercial database made for Linux can span....”
Ah! This actually interesting as it is in the context of the maximum number of threads a database can actually use. For example, MS SQL Server prior to 2014 could only scale to a maximum of 80 concurrent threads per database. Thus for previous versions of MS SQL Server, any core count past 80 would simply go to waste due to software limitations. As such, there may be similar limitations in other commercial databases that would be exposed on the UV 2000 that wouldn’t apply else where. Thus the scaling limitation being discussed here is with the database software, not the hardware so you missed the point of this discussion."
How the h-ck did you draw this weird conclusion? By pure il-logic? The guy in the article says that it is well known that HPC can span the entire UV2000 server but it is not known how far databases span the UV2000 server. And from this talk about the UV2000 server hardware, you conclude he talks about software limitations? Que? What have you been smoking? Nowhere does he talk about limitations on the database, the question is how well UV2000 scales on databases. And that is the big question. You have not missed the point, you have missed everything. As you can tell, I am not a native english speaker, but your english reading comprehension is beyond repair. Did you drop out of college as well? Sixth form? How could you even finish something with such a bad reading comprehension? You must have failed everything in school? How do you think a teacher would grade an essay of yours? Shake their head in disbelief.
.
ARTICLE:"...So in a [SGI UV2000] system that we, SGI, build, we can build it for high-performance computing or data-intensive computing. They are basically the same structure at a baseline..."
YOU:"And you missed the part of the quote indicating ‘or data-intensive computing’ which is a key part of the point being quoted that you’ve missed. Please actually read what you are posting please."
Que? Seriously? Do you know how to shorten High Performance Computing? By "HPC". SGI explicitly says they build the UV2000 for HPC or Data-Intensive Computing, both are scale-out workloads and runs on clusters. In your quote SGI explicitly says that UV2000 are used for clustered scale-out workloads, i.e. HPC and DIC. So you are smoked.
http://en.wikipedia.org/wiki/Data-intensive_comput...
Data-intensive processing requirements normally scale linearly according to the size of the data and are very amenable to straightforward parallelization....Data-intensive computing platforms typically use a parallel computing approach combining multiple processors and disks in large commodity computing CLUSTERS
.
ARTICLE: "...IBM and Oracle have become solution stack players and SAP doesn’t have a high-end vendor to compete with those two. That’s where we, SGI, see ourselves getting traction with HPC servers into this enterprise space...."
YOU: "This would indicate that the UV 2000 and UV 300 are suitable for enterprise workloads which runs counter to your various claims here."
BEEP! Wrong. No it does not "indicate" that. SGI talks about using UV300 to get into the enterprise market. They only mention UV2000 when talking about HPC or DIC, both clustered scale-out workloads. You quoted that above.
.
ARTICLE: "...The goal with NUMAlink 7 is...reducing latency for remote memory. Even with coherent shared memory, it is still NUMA and it still takes a bit more time to access the remote memory..."
YOU: "Coherency and shared memory are two trademarks of a large scale-up server. Quoting this actually hurts your arguments about the UV 2000 being a cluster as I presume you’ve also read the parts leading up to this quote and the segment you cut. The idea that accessing remote memory adds additional latency is a point I’ve made else where in our discussion and it is one of the reasons why scaling up is nonlinear. Thus I can only conclude that your quoting of this is to support my argument. Thank you!"
Well, your conclusion is wrong. SGI talks about UV2000 built for HPC or DIC. Not enterprise. So you have missed the whole article, you did not only miss the point. You missed everything. Nowhere does SGI say that UV2000 is for enterprise. Instead UV300H is for enterprise. You are making things up. Or can you quote where SGI says UV2000 is for enterprise, such as SAP or databases?
.
ME:“So here you have it again. SGI exclusively talks about getting HPC computation servers into enterprise. Read your link again and you will see that SGI only talks about HPC servers.”
YOU: "And yet you missed the part where they were talking about those systems being used for enterprise workloads."
Que? Nowhere do SGI say so. Go ahead and quote the article where SGI say so. I can not decide if you are Trolling or if you are a bit dumb, judging from your interpretation of the above links?
.
ME:“In the link they only talk about 32-sockets, they explicitly mention 32-socket SGI UV300H and dont mention UV2000 with 256 sockets. They say that bigger scale-up servers than 32-sockets will come later.”
YOU: "Which would ultimately means that 32 sockets is not a hard limit for SAP HANA as you’ve claimed. I’m glad you’ve changed your mind on this point and agree with me on it."
Duh, you missed the point. SAP does say that there are no larger scale-up x86 servers than 32-sockets. SAP does not say that UV2000 256-sockets are usable for this scenario. So, here again do we see that UV2000 is not suitable for Hana, but instead UV300H is mentioned. So, why dont they talk about UV2000 256-sockets, instead of only saying that 32-sockets are the largest scale-up servers? So, you are wrong and have been wrong all the time. SGI and SAP support me. Why? Because, it is actually the other way around, I support them. I would never lie (like you do), I only reiterate what I read. If SGI and SAP said UV2000 were good for SAP, I would write that instead, yes it is true. I dont lie. I know it is hard for you to believe (liars believe everyone lies). But some people dont like lies, mathematicians like the truth.
And Hana is distributed so it scales to a large amount of cpus, no one has denied that.
.
"....Except a score in the SAP top 10 does mean that they are competitive as what you originally asked for. The top 10 score was an 8 socket offering, counter to your claims that all the top scores were this 16 sockets or more...."
No, I did not "originally" ask for top 10. Stop shifting goal posts all the time or abuse the truth, aka lie. I asked for the very top, to compete with high end Unix servers. And I dont see the best x86 server achieving ~35% of the top Unix server is competing. There is no competition. So, again, show me a x86 server that can compete with the best Unix server in SAP benchmarks. You can not because there is no x86 server than can tackle the largest SAP workloads.
"...Also if you really looked, there are 16 socket x86 score from several years ago. At the time of their submission they were rather good but newer systems have displaced them over time...."
"Rather good"? You are lying so much that you believe yourself. Those worthless 16-socket x86 servers gets 54.000 saps as best which is really bad. And the best SPARC server from the same year, the M9000 gets 197.000 saps, almost 4x more. Try to fool someone else with "x86 can compete at the very top with high end Unix servers".
download.sap.com/download.epd?context=40E2D9D5E00EEF7C9AF1134809FF8557055EFBE3810C5CE80E06D1AE6A251B04
Do you know anything about the best 16-socket server, the IBM X3950M2? The IBM server is built from four individual 4-socket units, connected together with a single cable into a 16-socket configuration. This single cable between the four nodes, makes scalability awfully bad and makes performance more resemble a cluster. I dont know if anyone ever used it in the largest 16-socket configuration as a scale-up server. I know that several customers used it as a multi-node scale-out server, but I never heard about any scale-up customer. Maybe because IBM X3950 M2 only gets ~3.400 saps per cpu in the 16-socket configuration. Whereas two entries below we have a 4-socket x86 server with the same cpu, and the same GHz, and it gets 4.400 saps. So the 4-socket version is 33% faster, just by using fewer cpus. So SAP scaling drops off really fast, especially on x86.
The SPARC T5440 from same year has 4-sockets, and gets 26.000 saps. Half the score of the 16-socket IBM X3950M2. But I would never conclude that a 8-socket T5440 would get 52.000 saps, as you would wrongly conclude. I know SAP scaling drops off really fast.
This only proves my point, there is no way x86 can compete with high end Unix servers, at any given point in time: even if you go to 16-sockets, x86 had nothing to come up with because scaling via cables is too bad. What a silly idea.
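The per-socket arithmetic in the paragraphs above is easy to check with a minimal Python sketch. It uses only the figures quoted in this comment (54,000 SAPS on 16 sockets, 4,400 SAPS per CPU on the 4-socket machine, 197,000 SAPS for the M9000). The function names are made up for illustration, and nothing here is taken from the SAP benchmark pages themselves.

```python
def saps_per_socket(total_saps: float, sockets: int) -> float:
    """Average SAPS delivered by each socket."""
    return total_saps / sockets


def relative_gain(a: float, b: float) -> float:
    """How much larger a is than b, as a fraction (0.30 means 30%)."""
    return a / b - 1.0


x3950_16s = saps_per_socket(54_000, 16)   # 16-socket IBM X3950 M2, as quoted
x86_4s_per_cpu = 4_400                    # 4-socket x86 entry, per CPU, as quoted

per_socket_gain = relative_gain(x86_4s_per_cpu, x3950_16s)
m9000_vs_x3950 = 197_000 / 54_000         # M9000 total vs 16-socket x86 total

print(f"16-socket x86: {x3950_16s:.0f} SAPS per socket")
print(f"4-socket per-CPU advantage: {per_socket_gain:.0%}")
print(f"M9000 vs 16-socket x86: {m9000_vs_x3950:.2f}x")
```

Depending on whether you start from the rounded ~3,400 or from 54,000/16, the per-CPU gap lands at roughly 29 to 30 percent, a little under the one third quoted above, and the M9000 ratio comes out at about 3.65x.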
.
"...So by your definition above, all the large 32 socket systems are then clusters because they don’t offer uniform latency..."
Correct. If we are going to be strict, yes. NUMA systems, by definition, have different latency to different CPUs, and ALL high end Unix servers are NUMA. But they are well designed, with low worst-case latency, so they can in fact run enterprise systems, as can be seen in all SAP and Oracle benchmarks all over the internet.
"...Rather the defining traits are rather straightforward: a single logical system with shared memory and cache coherency between all cores and sockets. Having equal latency between sockets, while ideal for performance, is not a necessary component of the definition...."
No, because the defining trait is: what are the usage scenarios for the servers? All high end Unix servers you mentioned are used for enterprise usage. Just look at the SAP benchmarks, databases, etc. Whereas no one uses the SGI UV2000 for enterprise usage. You can google all you want, but no one has ever used SGI's large HPC servers for enterprise usage. It does not matter how much SGI calls it a carrot, it is still not a carrot. Microsoft calls Windows an Enterprise OS, but still no stock exchange in the world uses Windows. They all run Linux or Unix.
Instead of reading the marketing material; ask yourself what are the servers used for in production? Do you believe MS marketing too?
.
"...Again, define branch heavy in this context. I’ve asked for this before without answer. I believe you mean something else entirely..."
I mean exactly what SGI explained in my other link. But you know nothing about programming, so I understand you have difficulties with this concept. But this is really basic to a programmer. I am not going to teach you to program, though.
.
ME: “It is not goal shifting. All the time we have talked about large scale-up servers for enterprise usage, and if I forget to explicitly use all those words in every sentence, you call it "goal shifting".”
YOU: "Since that is pretty much the definition of goal shifting, thank you for admitting to it. In other news, you still have not explained the contradiction in your previous statements."
No it is not the "definition" of goal shifting. Normal people as they discuss something at length, do not forget what they talked about. If I ask 5 times in every post, about one single link where SGI UV2000 replaced a scale-up server on enterprise workloads, and once just write "can you show us a single link where a SGI UV2000 replaced a scale-up server" - it is desperate to show a link where UV2000 replaces scale-out workloads. That is pure goal shifting from your side, and at the same time you accuse me of goal shifting? It reeks of desperation. Or plain dumbness. Or both.
You have showed us a link on something I did not ask of (why do you duck my questions all the time?), can you show us a link where a SGI UV2000 replaced a scale-up server, on enterprise business workloads?
.
"...Apparently it isn’t much of a NDA if it is part of a press release. Go ask the actual customer HP quoted as they’re already indicating that they are using a Superdome X system...Regardless of socket count, it is replacing a Unix system as HPUX is being phased out. If you doubt it, I’d just email the gentlemen that HP quoted and ask."
Are you stupid? It is you that are stating something dubious (or false). You prove your claim.
.
"You don’t actually demonstrate that locking was used for OCC or MVCC. Rather you’ve argued that since concurrency is maintained, it has to have locking even though you didn’t demonstrate where the locking is used in these techniques. Of course since they functionally replace locking for concurrency control, you won’t find it."
Are you obtuse? I quoted that they do lock. Didnt you read my quotes? Read them again. Or did you not understand the comp sci lingo?
.
"...Also skip the personal attacks shown where these techniques are used in enterprise production databases..."
How about you skip the FUD? You write stuff all the time that can not be proven. That is false information.
"...FUD is generally a strategic attempt to influence perception by disseminating...false information..."
.
"...Oh I have before, the US Post Office has a UV 2000 for database work. Of course you then move the goal posts to where SAP HANA was no longer a real database...."
No, this is wrong. USPS uses the UV2000 for analytics, not database work. The RAM database is used for "fraud detection" as quoted from your link. A real database is used to store persistent data on disks, not in RAM.
.
ME: "And if I dont specify "business enterprise workloads" in every sentence, you immediately jumps on that and shift to talking about HPC calculations or whatever I did not specify.”
YOU: "Again, that is pretty much the definition of shifting the goals post and I thank you again for admitting to it."
Que? I have asked you to post links to where a UV2000 replaced a scale-up server on "business enterprise workloads" many times, and once I did not type it out, because I forgot and also you know what we are discussing. And as soon as I make a typo, you jump on that typo instead. Instead of showing a link on what I have asked for probably 30 times now, you have ducked all those requests, and instead you show us a link of something I have never once asked about when I did a typo? That is the very definition of goal shifting you are doing.
http://en.wikipedia.org/wiki/Moving_the_goalposts
"...The term is often used ...by arbitrarily making additional demands just as the initial ones are about to be met...."
Anyway, we all know that you do move the goal posts, we have seen it all the time. And kudos to you for posting a very fine link where a scale-out UV2000 cluster replaced another server on scale-out computations. I dont know really why you posted such a link, as I have never asked about it earlier. But never mind. Can we go back to my original question again, without you trying to duck the question for the 31st time, or posting something irrelevant? Here it comes, again:
-If you claim that the SGI UV2000 is a scale-up server, then you can surely show us several links where UV2000 replaces scale-up servers on scale-up business enterprise workloads? SGI has explicitly said they have tried to get into the enterprise market for many years now, so there surely must exist several customers who replaced high end Unix servers with UV2000, on enterprise business workloads, right?
Or, are you going to ask me to prove this as well, as you did with Superdome X too?
.
"...Actually what I pointed out as a key attribute of those large scale up machines: a single large memory space. That is why the institute purchased the M9000 as well the UV 2000. If they just wanted an HPC system, they’d get a cluster which they did separately alongside each of these units. In other words, they bought *both* a scale up and a scale out system at the same time. In 2009 the scale up server selected was a M9000 and in 2013 their scale up server was a UV 2000. It fits your initial request for UV 2000 replacing a large scale up Unix machine..."
No, this is incorrect again. Back then, the SPARC M9000 had the world record in floating point calculations, so it was the fastest in the world. And let me tell you a secret; a mathematical institute does not run large scale business enterprise workloads needing the largest server money could buy, they run HPC mathematical calculations. I know, I come from such a mathematical institute and have programmed a Cray supercomputer with MPI libraries and also tried OpenMP libraries - both are used exclusively for HPC computations.
Why would a mathematical institute run... say, a large SAP configuration? Large SAP installations can cost more than $100 million, where would a poor research institute get all the money from, and why would a math institute do all that business? This has never occurred to you, right? But, let me tell, it is a secret, so dont tell anyone else. Not many people know math institutes do not run large businesses. I understand you are confused and you did not know this. Have you ever set your foot on a comp sci or math faculty? You dont know much about math that is sure, and you claim you have "studied logic" at a math institute? Yeah right.
.
How about you stop FUDing? The very definition of FUD:
"...FUD is generally a strategic attempt to influence perception by disseminating...false information..."
Do you consider this FUD? Can you answer me? Can you stop ducking my questions?
"SPARC M6-32 is much faster than SGI UV2000 on HPC calculations. I am not going to show you benchmarks nor links. You have to trust me".
How is this different from:
"SGI UV2000 is a scale-up server and can replace high end Unix servers on enterprise business workloads such as SAP. I am not going to show you benchmarks on enterprise business workloads nor links. You have to trust me."
It is very easy to silence me and make me change my mind. Just show me some benchmarks, for instance, SAP or database benchmarks. Or show links to customers that have replaced modern high end Unix servers with SGI UV2000 on business enterprise workloads. Do that, and I will stop believing UV2000 is not suitable for scale-up workloads.
Brutalizer - Monday, June 1, 2015
Yes! I got an email reply from a SGI "SAP PreSales Alliance Manager" from Germany, about using UV2000 for the enterprise market. This is our email exchange:
ME:
>>Hello, I am a consultant examining the possibilities to run SAP and
>>Oracle databases using the largest 256-socket SGI UV2000.
>>Our aim is to replacing expensive Unix servers, in favour of cheaper
>>UV2000 doing enterprise business work. 256-socket beats 32-socket Unix servers anyday.
>>Do you have any more information on this? I need to study this
>>more, on how to replace Unix servers with UV2000 on enterprise
>>workloads before we can reach a conclusion. Any links/documentation would
>>be appreciated.
SGI:
>>Hello Mr YYY,
>>
>>I'm happy to discuss with you regarding your request.
>>We have an successor of the above mentioned UV2000. It's the SGI UV300H for
>>HANA. But it is also certified for any other SAP workload.
>>Would be good to discuss in more detail what is your need.
>>
>>When are you available for a call?
>>
>>Thanks you for reaching out to SGI
>>XXX
ME:
>>Dear XXX
>>
>>There are no concrete plans on migrating off Unix for the client, but
>>I am just brain storming. I myself, am interested in the SGI UV2000 as
>>a possible replacement for enterprise workloads. So I take this as an
>>opportunity to learn more on the subject. I reckon the UV300H only has
>>16-sockets? Surely UV2000 must beat 16-sockets in terms of performance
>>as it is a 256-socket SMP server! Do you have any documentation on this?
>>Customer success stories? Use cases? I want to study more about
>>possibilities to use large SGI servers to replace Unix servers for
>>enterprise workloads before I say something to my manager. I need
>>to be able to talk about advantages/disadvantages before talking to
>>anyone, know the numbers. Please show me some links or documentation
>>on UV2000 for enterprise business workloads, and I will read through them all.
SGI:
>Hi YYY
>the UV300H has a maximum of 32 Intel Xeon E7 sockets starting with 4 sockets
>and scales up to 24TB main memory. It's an SMP system with an All-to-All topology.
>It scales in increments of 4 sockets and 3TB.
>It is certified for SAP and SAP HANA.
>http://www.sgi.com/solutions/sap_hana/
>
>The UV2000 is based on Intel Xeon E5 which scales up to 256 sockets
>and 64TB main memory within an Hypercube or Fat-Tree topology. It
>starts with 4 sockets and scales in 2 socket increments.
>It is certified for SAP but not for SAP HANA.
>http://www.sgi.com/products/servers/uv/uv_2000_20....
>
>Both can run SUSE SLES and Red Hat RHEL.
>
>In respect to use cases and customer success story for the
>UV2000 in the enterprise market (I assume I mean SAP and Oracle)
>we have only limited stuff here. Because we target the UV2000 only
>for the HPC market, eg. Life Science.
>Look for USPS on http://www.sgi.com/company_info/customers/
>
>For a more generic view on our Big Data & Data Analytics business
>see https://www.sgi.com/solutions/data_analytics/
ME:
>So you target UV2000 for the HPC market, but it should not
>matter, right? It should be able to replace Unix servers on
>the enterprise market, because it is so much more powerful
>and faster. And besides, Linux is better than Unix. So I
>would like to investigate more how to use UV2000 for the
>enterprise market. How can I continue with this study? Dont
>you have any customer success stories at all? You must have.
>Can you show me some links? Or, can you forward my email to some
>senior person that has been with SGI for a long time?
And that is where we finished for today. I will keep you updated. I also
looked at the USPS link he talks about mentioning "eg. Life Science", i.e.
the US Postal Service that you FUDer KevinG mentioned. On SGI web site
it says:
"We use the SGI platform to take the USPS beyond what they were able to achieve
with traditional approaches." and there is a video. If you look at the video
it says at the top in the video window:
"Learn How SGI and FedCentric Deliver Real-Time Analytics for USPS"
Nowhere do they mention databases, SGI UV2000 is just used as an analytics tool.
Which is in line with the SGI links about how UV2000 is great for HPC and DIC,
Data Intensive Computing and High Performance Computing
So you are wrong again. The US Postal Service is using UV2000 for real time analytics,
not as a database storing persistent data. I dont know how many times I need
to repeat this.
Kevin G - Tuesday, June 2, 2015
@Brutalizer
"Well, you say that because x86 benchmarks scales well going from 8-sockets to 16 sockets, you expect x86 to scale well for 32-sockets too, on SAP. Does this not mean you expect x86 scales close to linear?"
That is not a quote where I make such claims. I asked for a quote and you have not provided one.
"Hmmm.... actually, this is really uneducated. Are you trolling or do you really not know the difference? All these clustered benchmarks are designed for clustered scale-out servers. For instance, LINPACK is typically run on supercomputers, big scale-out servers with 100.000s of cpus."
Or perhaps you actually read what I indicated. As a nice parallel benchmark, they can indeed be used to determine scaling as that will be a nice *upper bound* on performance gains for business workloads. If there is a performance drop on a single node instance of LINPACK as socket count increases, I'd expect to see a similar drop or more when running business workloads.
"BTW, have you heard about P-complete problems? Or NC-complete problems? Do you know something about parallel computations? You are not going to answer this question as well, right?"
Sure I'll skip that question as it has no direct relevancy and I'd rather like to ignore indirect tangents.
"Where are the benchmarks on x86 servers going from 8-sockets up to 16-sockets, you have used to conclude about x86 scalability? I have asked you about these benchmarks. Can you post them and backup your claims and prove you speak true or is this also more of your lies, i.e. FUD?"
As mentioned below, there were indeed older 16 socket x86 servers in the SAP benchmark database despite your claims here that they don't exist. This would be yet another contradiction you've presented. There are also various SPEC scores among others.
"I showed you several links from SGI, where they talk about trying to going into scale-up enterprise market, coming from the HPC market. Nowhere do SGI say they have a scale-up server."
Except that SGI does indicate that the UV 2000 is a large SMP machine, a scale up system. 'SGI UV 2000 scales up to 256 CPU sockets and 64TB of shared memory as a single system.' from the following link: https://www.sgi.com/products/servers/uv/
I've also posted other links previously where SGI makes this claim, which you continually ignore.
"SGI always talk about their HPC servers, trying to break into the enterprise market. You have seen several such links, you have even posted such links yourself. If SGI had good scale-up servers that easily bested Unix high end servers, SGI would not talk abou their HPC servers."
SGI would still talk about their HPC servers as they offer other systems dedicated to HPC workloads. SGI sells more than just the scale up UV 2000 you know.
"Instead SGI talk about their UV300H 16-socket server trying to get a piece of the enterprise market. Why does not SGI use their UV2000 server if UV2000 is a scale-up server?"
The UV300H goes up to 32 sockets, 480 cores, 960 threads and 48 TB of memory. For the vast majority of scale up workloads, that is more than enough in a single coherent system. The reason SGI is focusing on the UV300 for the enterprise is due to more consistent and lower latency which provides better scaling up to 32 sockets.
"Que? That is not what I asked! Are you trying to shift goal posts again? I quote myself again in my first post, nowhere do I ask about top 10 results:
"So, if we talk about serious business workloads, x86 will not do, because they stop at 8 sockets. Just check the SAP benchmark top - they are all more than 16 sockets, Ie Unix servers. X86 are for low end and can never compete with Unix such as SPARC, POWER etc. scalability is the big problem and x86 has never got passed 8 sockets. Check SAP benchmark list yourselves, all x86 are 8 sockets, there are no larger."
Except I've pointed out x86 results in the top 10, other Unix systems with 8 sockets in the top 10 and previous 16 socket x86 results. So essentially your initial claims here are incorrect.
"What ramblings. I asked you about why high end Unix market has not died an instant, if x86 can replace them [...] to which you replied something like "it is because of vendor lockin companies continue to buy expensive Unix servers instead of cheap x86 servers". And then I explained you are wrong because Unix code is portable which makes it is easy to recompile among Linux, FreeBSD, Solaris, AIX,..., - just look at SAP, Oracle, etc they are all available under multiple Unixes, including Linux. To this you replied some incomprehensible ramblings? And you claim you have studied logic? I ask one question, you duck it (where are all links) or answer to another question which I did not ask. Again, can you explain why Unix high end market has not been replaced by x86 servers? It is not about RAS, and it is not about vendor lockin. So, why do companies pay $millions for one paltry 32-socket Unix server, when they can get a cheap 256-socket SGI server?"
You forgot to include the bit where I explicitly indicate that my statements were in the context of companies developing their own software and instead you bring up 3rd party applications.
Custom developed applications are just one of several reasons why the Unix market continues to endure. Superior RAS is indeed another reason to continue to get Unix hardware. Vendor lock-in due to specific features in a particular flavor of Unix is another. They all contribute to today's continued, but shrinking, need for Unix servers. System architects should choose the best tool for the job and there is a clear minority of cases where that means Unix.
"Que? I accepted your rejection of my initial analysis where I compared 3GHz cpu vs 3.7GHz of the same cpu model on the same server. And I made another analysis where I compared 3.7GHz vs 3.7GHz on the same server and showed that performance dropped with 40-sockets compared to 32-sockets, on a cpu per cpu basis. Explain how I was "missing the points that you yourself were trying to make"?"
How could you determine scaling from 32 to 40 sockets when the two 3.7 GHz systems were 16 and 32 socket? Sure, you could determine scaling from 16 to 32 socket as long as you put a nice asterisk noting the DB difference in the configuration. However, you continue to indicate that you determined 32 to 40 socket scaling which I have only seen presented in a flawed manner. You are welcome to try again.
"No, but it is sheer stupidity. Can you explain again what makes you believe that? Or are you going to duck that question again? Or shift goal posts?"
The goal post here was a citation on this point and this again is not a citation.
"What point have I missed? You claim that UV300 is just a 16-socket version of the UV2000."
Citation please where I make that explicit claim.
"Analytics is not scale-up, it is scale-out. I have explained this in some detail and posted links from e.g. SAP and links talking about in memory databases which are exclusively used for analytics. Do you really believe anyone stores persistent data in RAM? No, RAM based databases are only used for analytics, as explained by SAP, etc."
Amazing that you just flat out ignore Oracle, IBM and Microsoft about having in-memory features for their OLTP databases. In-memory databases are used for more than just analytics despite your 'exclusive' claim.
"You have not showed a scale-up usage of the UV2000 server, for instance, running SAP or Oracle databases for storing data. Can you post such a link? Any link at all?"
Oracle TimesTen is indeed a database and that is what the US Post Office uses on their UV 2000. I believe that I've posted this before. As far as data storage goes, half a billion records are written to that database daily as part of the post office's fraud detection system. So instead of accepting this and admit that you're wrong, you have to now move the goal posts by claiming that analytics and in-memory databases some how don't apply.
"Why is that do you think? If you claim SGI's large servers are faster and cheaper and can replace high end Unix 32-socket servers - why have no one ever done that? Dont they want to save $millions? Dont they want much higher performance? Why?"
Cost is something that works against the UV 2000, not in hardware costs though. Enterprise software licensing is done on a per core and/or a per socket basis. Thus going all the way up to 256 sockets would incur an astronomical licensing fee. For example Oracle 12c Enterprise edition would have a base starting price of just ~$48.6 million USD to span all the cores in a 256 socket, 2048 core UV 2000 before any additional options. A 40 socket, 640 core Fujitsu M10-4S would have a base starting price of ~$15.2 million USD for Oracle before additional options are added. Any saving in hardware costs would be eaten up by the software licensing fees.
http://www.oracle.com/us/corporate/pricing/technol...
http://www.oracle.com/us/corporate/contracts/proce...
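The two licensing totals above can be reproduced with a short Python sketch. The inputs are assumptions that are not stated in this comment: a list price of $47,500 per Oracle Database Enterprise Edition processor license and a core factor of 0.5 for both the Xeon E5 and the SPARC64 X. They are used here because they reproduce the quoted totals, so check them against the price list and core factor table before relying on them. The function name is hypothetical.

```python
import math

LIST_PRICE_USD = 47_500   # assumed list price per processor license
CORE_FACTOR = 0.5         # assumed core factor for both CPU families


def oracle_base_license_usd(cores: int,
                            core_factor: float = CORE_FACTOR,
                            list_price: int = LIST_PRICE_USD) -> int:
    """Base cost: cores times core factor, rounded up, times list price."""
    processor_licenses = math.ceil(cores * core_factor)
    return processor_licenses * list_price


uv2000 = oracle_base_license_usd(2048)   # 256 sockets x 8 cores, as quoted
m10_4s = oracle_base_license_usd(640)    # 40 sockets x 16 cores, as quoted

print(f"UV 2000, 2048 cores: ${uv2000:,}")
print(f"M10-4S, 640 cores:   ${m10_4s:,}")
```

Under these assumptions the license count, and therefore the base price, grows linearly with core count, which is why the fully populated UV 2000 ends up at more than three times the licensing cost of the 40 socket M10-4S.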
"Well, no one use SGI UV2000 for enterprise business workloads. US Post Office are using it for fraud detection, that is analysis in memory database. Not storing data. You store data on disks, not in memory."
And here you go again slandering in-memory databases as not being real databases or usable for actual business workloads. I'd also say that fraud detection is a business workload for the US Post Office.
"Que? That SGI link explains the main difference between HPC workloads and enterprise business workloads. It was valid back then and it is valid today: the link says that HPC workloads runs in a tight for loop, crunching data, there is not much data passed between the cpus. And Enterprise code branches all over the place, so there is much communication among the cpus making it hard for scale-out servers. This is something that has always been true and true today. And in the link, SGI said that their large Altix UV1000 server are not suitable for enterprise workloads."
http://www.realworldtech.com/sgi-interview/6/ is the link you keep posting about this and it should be pointed out that it predates the launch of the UV 1000 by a full six years.
And again you go on about code branching affecting socket scaling. Again, could you actually define this in the context you are using it?
"In your links you posted, SGI talks about trying to break into the enterprise market with the help of the UV300H server. SGI does not talk about the UV2000 server for breaking into the enterprise market."
I'll quote this from the following link: 'The SGI deal with SAP also highlights the fact that the system maker is continuing to expand its software partnerships with the key players in the enterprise software space as it seeks to push systems like its “UltraViolet” UV 2000s beyond their more traditional supercomputing customer base. SGI has been a Microsoft partner for several years, peddling the combination of Windows Server and SQL Server on the UV 2000 systems, and is also an Oracle partner. It is noteworthy that Oracle’s eponymous database was at the heart of the $16.7 million fraud detection and mail sorting system at the United States Postal Service.' from http://www.enterprisetech.com/2014/01/14/sgi-paint...
It is clear that SGI did want the UV 2000 to enter the enterprise market and with success at the US Post Office, I'd say they have a toe hold into it.
"Que? Do you think we are stupid or who are trying to fool?"
I'm quoting this only because it amuses me. 'We' now?
"The guy in the article says that it is well known that HPC can span the entire UV2000 server but it is not known how far databases span the UV2000 server. And from this talk about the UV2000 server hardware, you conclude he talks about software limitations?"
Or I actually read the article and understood the context of those statements you were attempting to quote mine. Software does have limitations in how far it can scale up and that was what was being discussed. If you disagree, please widen that quote and cite specifically where they would be discussing hardware in this specific context.
"Que? Seriously? Do you know how to shorten High Performance Computing? By "HPC". SGI explicitly says they build the UV2000 for HPC or Data-Intensive Computing, both are scale-out workloads and runs on clusters. In your quote SGI explicitly says that UV2000 are used for clustered scale-out workloads, i.e. HPC and DIC. So you are smoked."
Quote mining Wikipedia now? The reason for using the UV2000 for DIC workloads is that with 64 TB of memory, many workloads can be run in-memory on a single node. The storage I/O bottleneck and the networking overhead of a cluster are removed if you can do everything in-memory on a single system. If the data size is below the 64 TB mark, then a cluster would be unnecessary. The networking overhead disappears when you can run it on a single node. Concurrency can be handled in an efficient manner with all the data residing in memory.
"BEEP! Wrong. No it does not "indicate" that. SGI talks about using UV300 to get into the enterprise market. They only mention UV2000 when talking about HPC or DIC, both clustered scale-out workloads. You quoted that above."
There are plenty of enterprise workloads that fall into the DIC category, like very large OLTP databases or analytics. Then again you are arbitrarily narrowing what defines enterprise workloads and what defines scale up so at some point nothing will be left.
"Well, your conclusion is wrong. SGI talks about UV2000 built for HPC or DIC. Not enterprise. So you have missed the whole article, you did not only miss the point. You missed everything. Nowhere do SGI say that UV2000 is for enterprise. Isntead UV300H is for enterprise. You are making things up. Or can you quote where SGI says UV2000 is for enterprise, such as SAP or databases?"
Except you are missing the massive overlap between DIC and enterprise workloads. "Second, selling UV big memory machines to large enterprises for in-memory and other kinds of processing will no doubt give SGI higher margins than trying to close the next big 10,000-lot server deal at one of the hyperscale datacenter operators." Note that at the time of that article the UV 300 had yet to be announced. http://www.enterprisetech.com/2014/03/12/sgi-revea...
"Duh, you missed the point. SAP does say that there are no larger scale-up x86 servers than 32-sockets."
And I was asking you to demonstrate that 32 sockets was a hard limit for SAP HANA, which you have not done. Instead you have posted a hardware answer to a question about software scaling. If HP were to release a 64 socket SuperDome X, would HANA run on it? The article indicates that yes, it would.
""Rather good"? You are lying so much that you believe yourself. Those worthless 16-socket x86 servers gets 54.000 saps as best which is really bad. And the best SPARC server from the same year, the M9000 gets 197.000 saps, almost 4x more. Try to fool someone else with "x86 can compete at the very top with high end Unix servers"."
It is rather good considering that those older x86 systems only had 16 sockets vs. the 64 sockets of the Fujitsu system you cite. This mirrors where we are now, with an 8 socket x86 system doing a third of the work with one fifth the number of sockets of the record holder.
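A quick per-socket sanity check of that comparison, as a rough sketch. It uses only the scores and socket counts quoted in this thread (54,000 SAPS on 16 sockets, 197,000 SAPS on 64 sockets), which I have not re-checked against the SAP result list, and the helper name is made up for illustration.

```python
def saps_per_socket(total_saps: float, sockets: int) -> float:
    """Return the average SAPS delivered per socket."""
    if sockets <= 0:
        raise ValueError("sockets must be positive")
    return total_saps / sockets


# Figures exactly as quoted in this thread, not independently verified.
x86_16_socket = saps_per_socket(54_000, 16)
m9000_64_socket = saps_per_socket(197_000, 64)

print(f"16-socket x86:   {x86_16_socket:.0f} SAPS per socket")
print(f"64-socket M9000: {m9000_64_socket:.0f} SAPS per socket")
```

On those numbers the two systems land in the same range per socket, which is the point being made: the gap in total score comes largely from socket count.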
"Do you know anything about the best 16-socket server, the IBM X3950M2? The IBM server is built from four individual 4-socket units, connected together with a single cable into a 16-socket configuration. This single cable between the four nodes, makes scalability awfully bad and makes performance more resemble a cluster. I dont know if anyone ever used it in the largest 16-socket configuration as a scale-up server."
So? Using multiple chassis is an IBM feature they also use in various POWER systems (p770 is an example). This alone does not make it a cluster, as again cache coherency is maintained and memory is shared. While a minor point, IBM uses six cables to put four chassis together.
Cache coherency is maintained as you would expect from a scale up server. Certainly scaling to 16 sockets with the older Intel FSB topology would be painful, so the drop in performance per socket was expected. Intel has since replaced this topology with QPI for point-to-point links, with the addition of integrated memory controllers for NUMA. For x86 systems in this era, Opterons scaled better as AMD had already implemented a point-to-point and NUMA topology but only went to 8 sockets.
"Correct. If we are going to be strict, yes. NUMA systems by definition, have different latency to different cpus. and ALL high end Unix servers are NUMA. But they are well designed, with low worst-case latency, so they can in fact run enterprise systems, as can be seen in all SAP and oracle benchmarks all over the internet."
Then by your own definition, why are you citing 16 and 32 socket 'clusters' to run enterprise workloads? I heard from this guy Brutalizer that you can't run enterprise workloads on clusters. He said it on the internet so it must be true!
"No, because the defining trait is: what are the usage scenarios for the servers? All high end Unix servers you mentioned, are used for enterprise usage."
That is the thing, you are ignoring the hardware entirely when discussing system topology. Your analysis is flawed because of this. On the x86 side of things, you can load Linux and an OLTP database like Oracle onto a netbook if you wanted. That would not mean that the netbook is production worthy.
Shared memory and cache coherency are what enable SMP scaling, features that the UV 2000 has all the way to 256 sockets and 64 TB of memory.
"I mean exactly what SGI explained in my other link. But you know nothing about programming, so I understand you have difficulties with this concept. But this is really basic to a programmer. I am not going to teach you to program, though."
That is not a definition. Please explain your definition of code branches in the context of increasing socket count. I'll humor a rehash of the definition of branching as it is indeed basic, but I don't recall branching being a factor in scaling socket count. So please enlighten me.
"Are you stupid? It is you that are stating something dubious (or false). You prove your claim."
OK, so I asked and indeed they have several SuperDome X systems. They all have 240 cores but on select systems they are not all enabled. I didn't follow up on this point but it appears this is to keep software licensing costs down. They also have varying amounts of memory ranging from 1 TB to 4 TB. (And Kent didn't answer himself as he was out of office last week, but I pursued the person indicated in his automatic reply, who did get me an answer.)
"Are you obtuse? I quoted that they do lock. Didnt you read my quotes? Read them again. Or did you not understand the comp sci lingo?"
I read them and I stated exactly what I saw: you did not demonstrate that locking was used for OCC or MVCC. If you feel like reasserting your claim, please do so but this time with actual evidence.
"No, this is wrong. USP use the UV2000 for analytics, not database work. The ram database is used for "fraud detection" as quoted from your link. A real database is used to store persistent data on disks, not in RAM."
In fact this is further admission of moving the goal posts on your part, as you are now attacking the idea of in-memory databases as not being real databases. OLTP and OLAP workloads can be performed by an in-memory database perfectly fine.
"If you claim that the SGI UV2000 is a scale-up server, then you can surely show us several links where UV2000 replaces scale-up servers on scale-up business enterprise workloads? SGI has explictly said they have tried to get into the enterprise market for many years now, so there surely must exist several customers who replaced high end Unix servers with UV2000, on enterprise business workloads, right?"
And again I will cite the US Post Office's usage of a UV 2000 as fitting the enterprise workload requirement.
"...Actually what I pointed out as a key attribute of those large scale up machines: a single large memory space. That is why the institute purchased the M9000 as well the UV 2000. If they just wanted an HPC system, they’d get a cluster which they did separately alongside each of these units. In other words, they bought *both* a scale up and a scale out system at the same time. In 2009 the scale up server selected was a M9000 and in 2013 their scale up server was a UV 2000. It fits your initial request for UV 2000 replacing a large scale up Unix machine..."
"No, this is incorrect again. Back in time, the SPARC M9000 had the world record in floating point calculations, so it was the fastest in the world. And let me tell you a secret; a mathematical institute do not run large scale business enterprise workloads needing the largest server money could buy, they run HPC mathematical calculations."
Again, you are ignoring the reason why they purchased both the M9000 and UV 2000: to get a large memory space that only a large scale up server can provide. And again you also ignore that for HPC workloads, they also bought a cluster alongside these systems.
"Why would a mathematical institute run... say, a large SAP configuration? Large SAP installations can cost more than $100 million, where would a poor research institute get all the money from, and why would a math institute do all that business? This has never occured to you, right? But, let me tell, it is a secret, so dont tell anyone else. Not many people knows math institutes do not run large businesses. I understand you are confused and you did not know this. Have you ever set your foot on a comp sci or math faculty? You dont know much about math that is sure, and you claim you have "studied logic" at a math institute? Yeah right."
I'd actually say the reason why they wouldn't run SAP is due to the software licensing costs so open source solutions with no direct licensing fees are heavily favored. They do deal with big data problems similar to what businesses are trying to solve right now.
And my logic course was at an ordinary university; I made no claim that it was a math institute.
Brutalizer - Thursday, June 4, 2015 - link
Ok, I understand what you are doing. I have some really relevant questions (for instance, if you claim that UV2000 can be used for scale-up business workloads, where are links?) and you either dont answer them or answer a totally another question. This makes the relevant questions disappear in the wall of text, which makes it easy for you to avoid these questions that really pin-point the issue. So in this post I have just three pin-point questions for you, which means you can not duck them any longer.
.
Q1) Would you consider this as spreading dubious or false information? If yes, how is this different from your claim that SGI UV2000 can handle scale-up business workloads better than high end Unix servers (without any links)?
"SPARC M6-32 server is much faster than UV2000 in HPC calculations. I have no benchmarks to show, nor links to show. I emailed Oracle and they confirmed this. Which means this is a proven fact, and no speculation"
.
Q2) Show us a link on a x86 server that achieves close to a million saps. You claim that x86 can tackle larger workloads (i.e. scales better) than Unix servers, so it should be easy for you to show links with x86 getting higher saps than Unix. Or is this claim also FUD?
.
Q3) Show us an example of UV2000 used for large enterprise scale-up workloads, such as SAP or Oracle database (not dataware house analytics, but a real database storing persistent data which means they have to maintain data integrity which is extremely difficult in HPC clusters)
.
These are the three questions I asked numerous times earlier, and you have always ducked them in one way or another. Are you going to duck them again? Doesn't it make your view point a bit hollow?
.
.
Regarding UV2000 used by US Postal Service. SGI says in several links UV2000 is only used for real time analytics. "Learn How SGI and FedCentric Deliver Real-Time Analytics for USPS":
https://www.youtube.com/watch?v=tiLrkqatr2A&li...
This means that UV2000 compare each mail with a RAM database to see if this mail is fraudulent. In addition to the TimesTen memory database, there is also an Oracle 10G datawarehouse as a backend, for "long term storage" of data.
http://www.datacenterknowledge.com/archives/2013/0...
"[TimesTen] coupled with a transactional [Oracle] database it will perform billions of mail piece scans in a standard 15 hour processing day. Real-time scanning performs complex algorithms, powered by a SGI Altix 4700 system. Oracle Data Warehouse keeps fraud results stored in 1.6 terabytes of TimesTen cache, in order to compare and then push back into the Oracle warehouse for long term storage and analysis."
So here you have it again, USPS does only do analytics on the TimesTen. TimesTen do not store persistent data (which means TimesTen doesn't have to maintain data integrity).
https://news.ycombinator.com/item?id=8175726
"....I'm not saying that Oracle hardware or software is the solution, but "scaling-out" is incredibly difficult in transaction processing. I worked at a mid-size tech company with what I imagine was a fairly typical workload, and we spent a ton of money on database hardware because it would have been either incredibly complicated or slow to maintain data integrity across multiple machines...."
.
Also, there are numerous articles where SGI says they try to get into the Enterprise market, but they can not do that with UV2000. The SGI representative mailed me just now:
>find case studies on
>http://www.sgi.com/company_info/resources/case_stu...
>customer testimonials on
>http://www.sgi.com/company_info/customers/testimon...
>customer success stories on YouTube
>https://www.youtube.com/playlist?list=PLT0g4VdghLM...
>
>SGI UV 2000
>http://www.sgi.com/products/servers/uv/uv_2000_20....
>Ist his sufficient? We don't have more information which is open for the public.
>
>the only "enterprise" customer use case I have already sent to you. The USPS case which runs greatly on Oracle.
>It was never our target to use UV2000 for enterprise. And for SAP HANA it runs very bad.
>I will ask our Chief Engineer if he can help you.
All these 46 "case studies" and "testimonials" on the SGI website are exclusively talking about HPC scenarios. Not a single use case on the SGI website is about Enterprise workloads. If you claim that UV2000 is good for Enterprise workloads, there should be at least a few customers doing enterprise workloads, right? But no one is. Why?
.
Apparently you have not really understood this thing about NUMA servers are clusters as you say "Then by your own definition, why are you citing 16 and 32 socket 'clusters' to run enterprise workloads? I heard from this guy Brutalizer that you can't run enterprise workloads on clusters."
So let me teach you as you have shown very little knowledge about parallel computations. All large servers are NUMA, which means some nodes far away have bad latency, i.e. some sort of a cluster. They are not uniform memory SMP servers. All large NUMA servers have differing latency. If you keep a server small, say 32-sockets, then worst case latency is not too bad which makes them suitable for scale-up workloads. Here we see that each SPARC M6 cpu are always connected to each other in 32-socket fashion "all-to-all topology". In worst case, there is one hop to reach far away nodes:
http://regmedia.co.uk/2013/08/28/oracle_sparc_m6_b...
"The SPARC M6-32 machine is a NUMA nest of [smaller] SMP servers. And to get around the obvious delays from hopping, Oracle has overprovisioned the Bixby switches so they have lots of bandwidth."
Here we see the largest IBM POWER8 server, it has 16-sockets and all are always connected to each other. See how many steps in worst case in comparison to the 32-socket SPARC server:
http://2eof2j3oc7is20vt9q3g7tlo5xe.wpengine.netdna...
However, if you go into the 100s of sockets realm (UV2000) then you can not use design principles like smaller 32-socket servers do, where all cpus are always connected to each other. Instead, the cpus in a cluster are NOT always connected to each other. For 256 sockets you would need 35.000 data channels, that is not possible. Instead you typically use lots of switches in a Fat Tree configuration, just like SGI does in UV2000:
clusterdesign.org/fat-trees/fat_tree_varying_ports/
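For reference, the channel count for a true all-to-all design follows from the pair formula n(n-1)/2. A minimal sketch, assuming one dedicated point-to-point link per socket pair (the function names are made up for illustration, and real designs may bundle or multiply links):

```python
def full_mesh_links(sockets: int) -> int:
    """Links needed so that every socket has a direct link to every other socket."""
    if sockets < 1:
        raise ValueError("need at least one socket")
    return sockets * (sockets - 1) // 2


def ports_per_socket(sockets: int) -> int:
    """Ports each socket needs in a full mesh: one per peer."""
    return sockets - 1


for n in (4, 16, 32, 256):
    print(f"{n:>3} sockets: {full_mesh_links(n):>6} links, "
          f"{ports_per_socket(n):>3} ports per socket")
```

Under that assumption 32 sockets need 496 links while 256 sockets need 32,640, the same order of magnitude as the figure quoted above, and every socket would need 255 ports. That per-socket port count is the practical limit that pushes large systems toward switches.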
As soon as a cpu needs to communicate with another, all the involved switches create and destroy connections. In the best case the latency is good; worst case latency is much worse though, because of all the switches. A switch does not have enough connections for all cpus to be connected to each other all the time (then you would not need switches).
As soon as you leave this all-to-all topology with low number of sockets, and go into heavily switched architecture that all large 100s of sockets HPC clusters use, worst case latency suffers and code that branches heavily will be severely penalized (just as SGI explains in that link I posted). This is why a switched architecture can not handle scale-up workloads, because worst case latency is too bad.
Also, the SGI UV2000 limits the bandwidth of the NUMAlink6 to 6.7 GB/sec - which does not cut it for scale-up business workloads. The scale-up Oracle M6-32 has many Terabytes of bandwidth, because SPARC M6 is an all-to-all topology, not a switched slow cluster.
http://www.enterprisetech.com/2013/09/22/oracle-li...
.
I said this many times in one way or another, to no avail. You just don't get it. But here is a two month old link, where the CTO at SGI says the same thing: that UV2000 is not suitable for enterprise workloads. I've posted numerous links from SGI where they say that UV2000 is not suitable for enterprise workloads. How many more SGI links do you want?
http://www.theplatform.net/2015/03/05/balancing-sc...
"...to better address the needs of these commercial customers, SGI had to back off on the scalability of the top-end UV 2000 systems, which implement...NUMA, and create [UV300H] that looks a bit more like a classic symmetric multiprocessing (SMP) of days gone by.
NUMA...assumes that processors have their own main memory assigned to them and linked directly to their sockets and that a high-speed interconnect of some kind glues multiple processor/memory complexes together into a single system with variable latencies between local and remote memory. This is true of all NUMA systems, including an entry two-socket server all the way up to a machine like the SGI UV 2000...
In the case of a very extendable system like the UV 2000, the NUMA memory latencies fall into low, medium, and high bands, Eng Lim Goh, chief technology officer at SGI, explains...and that means customers have to be very aware of data placement in the distributed memory of the system if they hope to get good performance (i.e. UV2000 is not a true uniform SMP server no matter what SGI marketing says).
“With the UV 300, we changed to an all-to-all topology,” explains Goh. “This was based on usability feedback from commercial customers because unlike HPC customers, they do not want to spend too much time worrying about where data is coming from. With the UV 300, all groups of processors of four talking to any other groups of processors of four in the system will have the same latency because all of the groups are fully connected.” (i.e. it looks like a normal Unix server where all cpus are connected, no switches involved)
SGI does not publicly divulge what the memory latencies are in the systems, but what Goh can say is that the memory access in between the nodes in the UV 300 is now uniform, unlike that in the UV 2000, but the latencies are a bit lower than the lowest, most local memory in the UV 2000 machine.
“It is not that the UV 300 is better than the UV 2000,” says Goh. “It is just that we are trading off scalability in the UV 2000 for the usability in the UV 300. When you do all-to-all topology in a UV 300, you give up something. That is why the UV 300 will max out at 32 sockets – no more. If the UV 300 has to go beyond 32 sockets – for instance, if SAP HANA makes us go there – we will have to end up with NUMA again because we don’t have enough ports on the processors to talk to all the sockets at the same time.”
.
(BTW, while it is true that Oracle charges more the more cpus your server has, the normal procedure is to limit the Oracle database to run on only a few cpus via virtualization to keep the cost down. Also, when benchmarking there are no such price limitations; vendors go to great effort to grab the top spot. But there are no top benchmarks from UV2000, they have not even tried)
Kevin G - Friday, June 5, 2015 - link
@Brutalizer “Ok, I understand what you are doing. I have some really relevant questions […] and you either dont answer them or answer a totally another question. This makes the relevant questions disappear in the wall of text, which makes it easy for you to avoid these questions that really pin-point the issue. So in this post I have just three pin-point questions for you, which means you can not duck them any longer.”
Sure I can. :)
Your projection is strong as you cut out a lot of the discussion to avoid my points and answering my questions so I would consider cropping your request out of my reply. It would simply be fair play. Though I will give these a shot as I have asked several questions you have also dodged in return.
QA) “That x86 servers will not do? SAP and business software is very hard to scale, as SGI explained to you, as the code branches too much.” I’ve asked for clarification on this several times before without a direct answer. What is the definition of code branching in the context of increasing socket count, and why does it impact scalability?
QB) You have asserted that OCC and MVCC techniques use locking to maintain concurrency when they were actually designed to be alternatives to locking for that same purpose. Please demonstrate that OCC and MVCC do indeed use locking as you claim.
http://en.wikipedia.org/wiki/Optimistic_concurrenc...
http://en.wikipedia.org/wiki/Multiversion_concurre...
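To make the distinction in QB concrete, here is a toy sketch of optimistic concurrency control: transactions read a value together with its version, work without holding any lock, and the commit only succeeds if the version is unchanged. The class and method names are hypothetical and this is single-threaded illustration code, not how any particular database implements it.

```python
class VersionedStore:
    """Toy key-value store where every key carries a version counter."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        """Return (value, version); unknown keys read as (None, 0)."""
        return self._data.get(key, (None, 0))

    def commit(self, key, new_value, expected_version):
        """Validate then write: succeed only if nobody committed in between."""
        _, current_version = self.read(key)
        if current_version != expected_version:
            return False  # conflict detected, caller must retry
        self._data[key] = (new_value, current_version + 1)
        return True


store = VersionedStore()
store.commit("balance", 100, expected_version=0)

# Two transactions read the same snapshot; neither holds a lock.
value_a, version_a = store.read("balance")
value_b, version_b = store.read("balance")

ok_a = store.commit("balance", value_a - 30, version_a)  # first writer wins
ok_b = store.commit("balance", value_b - 50, version_b)  # stale version, rejected

# The rejected transaction re-reads and retries.
value_b, version_b = store.read("balance")
ok_retry = store.commit("balance", value_b - 50, version_b)

print(ok_a, ok_b, ok_retry, store.read("balance"))
```

One caveat: in a real multi-threaded engine the validate-and-write step itself has to be atomic, typically via a compare-and-swap or a very short critical section. That is a different thing from holding row locks for the life of a transaction, which is what these techniques were designed to avoid.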
QC) Why does the Unix market continue to exist today? Why is the Unix system market shrinking? You indicated that it is not because of exclusive features/software, vendor lock-in, cost of porting custom software or RAS support in hardware/software. Performance is being rivaled by x86 systems as they’re generally faster (per SAP) and cheaper than Unix systems of similar socket count in business workloads.
“Q1) Would you consider this as spreading dubious or false information? If yes, how is this different from your claim that SGI UV2000 can handle scale-up business workloads better than high end Unix servers (without any links)? "SPARC M6-32 server is much faster than UV2000 in HPC calculations. I have no benchmarks to show, nor links to show. I emailed Oracle and they confirmed this. Which means this is a proven fact, and no speculation"”
Three points about this. First is the difference in claims. It is an *ability* of the UV 2000 to run scale up business workloads. The metric for this is rather binary: can it do so? Yes or no? There is no other point of comparison or metric to determine that. You’ve indicated via your SGI contact above that the UV 2000 is certified for Oracle and SAP (though not SAP HANA). I’ve posted links about SGI attempting to get it into the enterprise market as well as the USPS example. Thus the UV 2000 appears to have the ability to run enterprise workloads. We have both provided evidence to support this claim.
Secondly about the M6-32 hypothetical, this is a direct comparison. Performance here can be measured and claims can be easily falsified based upon that measurement. Data not provided can be looked up from independent sources to challenge the assertion directly. And lastly, yes, your hypothetical would fall under the pretense of dubious information as no evidence has been provided.
“Q2) Show us a link on a x86 server that achieves close to a million saps. You claim that x86 can tackle larger workloads (i.e. scales better) than Unix servers, so it should be easy for you to show links with x86 getting higher saps than Unix. Or is this claim also FUD?”
First off, there is actually no SAP benchmark result of a million or relatively close (+/- 10%), so this request cannot be fulfilled by any platform. The x86 platform has a score in the top 10 out of 792 results posted as of today. This placement means that it is faster than 783 other submissions, including some (but not all) modern Unix systems (submissions less than 3 years old), with a subset of those having more than 8 sockets. These statements can be verified as fact by sorting by SAP score after going to http://global.sap.com/solutions/benchmark/sd2tier....
The top x86 score for reference is http://download.sap.com/download.epd?context=40E2D...
“Q3) Show us an example of UV2000 used for large enterprise scale-up workloads, such as SAP or Oracle database (not dataware house analytics, but a real database storing persistent data which means they have to maintain data integrity which is extremely difficult in HPC clusters)”
You’ve shifted the goal posts so many times that I think it is now worth documenting every time they have moved to put this question into its proper context. Your initial claim ( http://anandtech.com/comments/9193/the-xeon-e78800... ) was “...no one use SGI servers for business software, such as SAP or databases which run code that branches heavily.” The USPS example was a counter to that. You then assert ( http://anandtech.com/comments/9193/the-xeon-e78800... ) that since TimesTen is an in-memory DB, it cannot be used as a real database because a real database has to write to disk. Oddly, you have since provided a quote below that runs contrary to this claim. When we get to http://anandtech.com/comments/9193/the-xeon-e78800... the excuse about TimesTen is that it is only for analytics and thus not real database workloads (even though again the quote you provide below would satisfy the requirements you present in this post). When we get to http://anandtech.com/comments/9193/the-xeon-e78800... you re-assert the claim: “In Memory Databases often don't even have locking of rows, as I showed in links. That means they are not meant for normal database use. It is stupid to claim that a "database" that has no locking, can replace a real database.” So at this point a real database has to write to disk, can’t be used for analytics and has to use locking for concurrency. When we get to http://anandtech.com/comments/9193/the-xeon-e78800... a few posts later, apparently fraud detection is not an acceptable business case for the USPS: “No, this is wrong. USP use the UV2000 for analytics, not database work. The ram database is used for "fraud detection" as quoted from your link. A real database is used to store persistent data on disks, not in RAM.”
So yes, I have answered this question before you shifted the goal posts four different times. At this point, to fit your shifted criteria, all the advantages of running a large database on the UV 2000 have been removed. The main feature of the UV 2000 isn’t the large socket count but the massive cache coherent shared memory capacity. 64 TB is large enough to hold the working set for many business applications, even in this era of big data. If that strength cannot be utilized, then your only other reason to get a UV 2000 would be a computationally bound, scale up problem for enterprise work (but can’t be analytics either!). You have successfully moved the goal posts to the point that there is no answer. To use an analogy, it is like trying to find a car that can go 300 km/hr with 3L/100 km fuel efficiency but also has no wheels.
“This means that UV2000 compare each mail with a RAM database to see if this mail is fraudulent. In addition to the TimesTen memory database, there is also an Oracle 10G datawarehouse as a backend, for "long term storage" of data.
http://www.datacenterknowledge.com/archives/2013/0...
"[TimesTen] coupled with a transactional [Oracle] database it will perform billions of mail piece scans in a standard 15 hour processing day. Real-time scanning performs complex algorithms, powered by a SGI Altix 4700 system. Oracle Data Warehouse keeps fraud results stored in 1.6 terabytes of TimesTen cache, in order to compare and then push back into the Oracle warehouse for long term storage and analysis."”
Excellent! Now you finally understand that the TimesTen database is actually written to directly. The initial scan goes directly to the TimesTen database for the comparison with all other recent scans. This is the core function of their fraud detection system. The other key point in the quote you cite is the word transactional. This contradicts your earlier statements (http://anandtech.com/comments/9193/the-xeon-e78800... ): “… A real database acts at the back end layer. Period. In fact, IMDB often acts as cache to a real database, similar to a Data WareHouse. I would not be surprised if US Postal Service use TimesTen as a cache to a real Oracle DB on disk. You must store the real data on disk somewhere, or get the data from disk….”
Also things have changed a bit since that article was published. USPS has upgraded to a UV 1000 and then to a UV 2000, so the Altix 4700 described there is two generations behind. The main thing that has changed is the amount of memory in the systems. At the time of the Itanium based Altix 4700, the data warehouse was 10 TB in size ( http://www.oracle.com/technetwork/products/timeste... ). I could not find any reference to the current size of the data warehouse or the rate at which it increases, but their current UV 2000 with 32 TB of memory would at least be able to host their entire data warehouse from 2010 in-memory three times over with room to spare.
“So here you have it again, USPS does only do analytics on the TimesTen. TimesTen do not store persistent data (which means TimesTen doesn't have to maintain data integrity).”
TimesTen does indeed support concurrency and it even uses traditional locking to do it. https://docs.oracle.com/cd/E13085_01/timesten.1121...
“Apparently you have not really understood this thing about NUMA servers are clusters as you say "Then by your own definition, why are you citing 16 and 32 socket 'clusters' to run enterprise workloads? I heard from this guy Brutalizer that you can't run enterprise workloads on clusters."”
Oh, I have understood. I was just pointing out your contradictions here. The irony is delicious.
“So let me teach you as you have shown very little knowledge about parallel computations. All large servers are NUMA, which means some nodes far away have bad latency, i.e. some sort of a cluster. They are not uniform memory SMP servers. All large NUMA servers have differing latency. If you keep a server small, say 32-sockets, then worst case latency is not too bad which makes them suitable for scale-up workloads.”
The differing point between a cluster and a scale up server you appear to be arguing for is simply the latency itself. While having lower latency is ideal for performance, latency can be permitted to increase until cache coherency is lost. The UV 2000 supports cache coherency up to 256 sockets and 64 TB of memory.
A cluster, on the other hand, does not directly share memory: each node is fully independent and linked via a discrete networking layer. This adds complexity for programming, as coherency has to be added via software because there is no hardware support for it over a network. The network itself is a far higher latency connection between nodes than what CPUs use directly in hardware (socket to socket latencies are measured in nanoseconds, whereas Ethernet is measured in microseconds), and on top of that comes the time to handle a software layer that doesn’t exist between CPUs in a scale up system.
“Here we see that each SPARC M6 cpu are always connected to each other in 32-socket fashion "all-to-all topology". In worst case, there is one hop to reach far away nodes:”
That graphic is actually a mix of one, two and three hop connections. Going from socket 8 to 15 is a single hop as those sockets are directly connected with nothing in between them. Socket 8 to 1 takes two hops as a Bixby interconnect chip is between them. Socket 8 to 23 requires three hops: first to a Bixby chip (BX0, BX2, BX4, BX6, BX8 or BX10), then to another socket (either 16, 17, 18, or 19) and then finally to socket 23. For a 32 socket topology, this is actually pretty good, just not what you are claiming.
“Here we see the largest IBM POWER8 server, it has 16-sockets and all are always connected to each other. See how many steps in worst case in comparison to the 32-socket SPARC server:”
The picture you cite is a bit confusing as it follows the cable path (IBM puts two links through a single external cable). This page contains a better diagram for the raw logical topology:
http://www.enterprisetech.com/2014/07/28/ibm-forgi...
This makes it rather clear that the topology is a mix of single and two hop paths: each socket is connected directly to every other socket in the same drawer and to one socket in every external drawer. Thus the worst case is two hops, due to the need to move both within a drawer and then externally.
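To make the hop counting concrete, here is a rough sketch that builds the 16 socket layout described above and measures the worst case with a breadth-first search. The 4 drawers of 4 sockets and the pairing of each socket with the same-numbered socket in the other drawers are my assumptions for illustration; the real wiring may pair sockets differently.

```python
from collections import deque
from itertools import product

# Assumed layout: 4 drawers x 4 sockets = 16 sockets.
DRAWERS = 4
SOCKETS_PER_DRAWER = 4


def build_topology(drawers, per_drawer):
    """Return {socket: set of directly linked sockets}; a socket is (drawer, slot)."""
    nodes = list(product(range(drawers), range(per_drawer)))
    links = {node: set() for node in nodes}
    for d, s in nodes:
        # Direct link to every other socket in the same drawer.
        for other_s in range(per_drawer):
            if other_s != s:
                links[(d, s)].add((d, other_s))
        # One link into each external drawer (assumed: same slot number).
        for other_d in range(drawers):
            if other_d != d:
                links[(d, s)].add((other_d, s))
    return links


def hops_from(links, start):
    """Breadth-first search: minimum hop count from start to every socket."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def worst_case_hops(links):
    """Largest minimum hop count between any pair of sockets."""
    return max(max(hops_from(links, node).values()) for node in links)


if __name__ == "__main__":
    topology = build_topology(DRAWERS, SOCKETS_PER_DRAWER)
    print("worst case hops:", worst_case_hops(topology))
```

Under those assumptions every socket has six links (three inside the drawer, three external), and reaching a differently numbered socket in another drawer takes one hop inside a drawer plus one external hop.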
“However, if you go into the 100s of sockets realm (UV2000) then you can not use design principles like smaller 32-socket servers do, where all cpus are always connected to each other. Instead, the cpus in a cluster are NOT always connected to each other. For 256 sockets you would need 35.000 data channels, that is not possible. Instead you typically use lots of switches in a Fat Tree configuration, just like SGI does in UV2000”
First off, your examples so far don’t actually show all the sockets connected directly to each other.
Secondly, scaling out to 256 sockets via a full mesh topology does not require 35,000 links. The correct answer is 32,640. Still a lot, but an odd mistake for someone who claims to have a masters in mathematics.
Lastly, the UV 2000 can be configured into a hypercube as your own SGI contact indicated. Worst case scenario is 5 hops on a 256 socket system. Those extra hops do add latency but not enough to break cache coherency between all the sockets.
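For anyone who wants to check the link count for a full mesh, it is just the number of socket pairs, n(n-1)/2. A quick sketch (the function name is mine, purely for illustration):

```python
def full_mesh_links(sockets: int) -> int:
    """Point-to-point links needed so every socket connects directly to every other."""
    return sockets * (sockets - 1) // 2


if __name__ == "__main__":
    for n in (16, 32, 256):
        print(f"{n} sockets -> {full_mesh_links(n)} links")
```

For 256 sockets this gives 256 * 255 / 2 = 32,640.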
“As soon as a cpu needs to communicate to another, all the involved switches creates and destroys connections. For best case the latency is good, worst case latency is much worse though, because of all switches. A switch does not have enough of connections for all cpus to be connected to each other all the time (then you would not need switches).”
Correct, and this is what happens with both the Bixby and NUMALink6 interconnects. You’ve provided this link before and it indicates that Bixby is a switch. “The Bixby coherence-switch chips hold L3 cache directories for all of processors in a given system, and a processor doing a memory request has to use the CLs to find the proper processor SMP group in the system, and then the processor socket in the SMP group that has the memory it needs.” http://www.theregister.co.uk/2013/08/28/oracle_spa...
“As soon as you leave this all-to-all topology with low number of sockets, and go into heavily switched architecture that all large 100s of sockets HPC clusters use, worst case latency suffers and code that branches heavily will be severely penalized (just as SGI explains in that link I posted). This is why a switched architecture can not handle scale-up workloads, because worst case latency is too bad.”
Cache coherent switching between processor sockets does indeed add latency, which in turn lowers performance. Both NUMALink6 and Bixby handle this well and are capable of maintaining coherency even with additional switching tiers adding latency.
“Also, the SGI UV2000 limits the bandwidth of the NUMAlink6 to 6.7 GB/sec - which does not cut it for scale-up business workloads. The scale-up Oracle M6-32 has many Terabytes of bandwidth, because SPARC M6 is an all-to-all topology, not a switched slow cluster.”
That is 6.7 GB/s per link on the UV 2000. The 3 TB/s of bandwidth being quoted in the article is the aggregate of the links, not an individual link. The Bixby chips are switches and provide 12 GB/s per link according to http://www.theregister.co.uk/2013/08/28/oracle_spa...
“I said this many times in one way or another, to no avail. You just dont get it. But here is two month old link, where the CTO at SGI says the same thing; that UV2000 is not suitable for enterprise workloads. Ive posted numerous links from SGI where they say that UV2000 is not suitable for enterprise workloads. How many more SGI links do you want?”
And this has been covered before. The reason to scale back the socket count from the UV 2000 to the UV 300 is to provide a more uniform latency. The main reasons to pick the UV 2000 over the UV 300 are if you genuinely need the extra performance provided by the extra sockets and/or more than the 48 TB of memory which the UV 300 can be configured with. That is a very narrow gap for the UV 2000 in the enterprise today, though before the UV 300 launch, the UV 2000 filled that same role. To answer your question, I wouldn’t mind a few more links, as they tend to run counter to the claims you’re trying to make.
“NUMA...assumes that processors have their own main memory assigned to them and linked directly to their sockets and that a high-speed interconnect of some kind glues multiple processor/memory complexes together into a single system with variable latencies between local and remote memory. This is true of all NUMA systems, including an entry two-socket server all the way up to a machine like the SGI UV 2000...”
You should have finished the last sentence in that quote: “…which has as many as 256 sockets and which is by far the most scalable shared memory machine on the market today.” I can see why you removed this part, as it counters several points you are trying to make in our discussion.
“In the case of a very extendable system like the UV 2000, the NUMA memory latencies fall into low, medium, and high bands, Eng Lim Goh, chief technology officer at SGI, explains...and that means customers have to be very aware of data placement in the distributed memory of the system if they hope to get good performance (i.e. UV2000 is not a true uniform SMP server no matter what SGI marketing says).”
Data placement is nothing new to NUMA: performing the calculations on the cores closest to where the data resides reduces the number of remote memory accesses. This applies to all NUMA systems. It is more important on the UV 2000 as the worst case scenario is 5 hops at 256 sockets vs. 3 hops for your M6-32 example.
“With the UV 300, we changed to an all-to-all topology,” explains Goh. “This was based on usability feedback from commercial customers because unlike HPC customers, they do not want to spend too much time worrying about where data is coming from. With the UV 300, all groups of processors of four talking to any other groups of processors of four in the system will have the same latency because all of the groups are fully connected.” (i.e. it looks like a normal Unix server where all cpus are connected, no switches involved)
If you actually understood that quote, you would see that there are still switches involved. There is just a single tier of them instead of the three-tier design in the 256 socket UV 2000.
“(BTW, while it is true that Oracle charges more the more cpus your servers have, the normal procedure is too limit the Oracle database to run on only a few cpus via virtualization to keep the cost down. Also, when benchmarking there are no such price limitations, they do their best to only want to grab the top spot and goes to great effort to do that. But there are no top benchmarks from UV2000, they have not even tried)”
However, Oracle’s fees and similar licensing models for enterprise software are a major reason why UV 2000 installations for such software are quite rare. It is not because the software couldn’t run on the UV 2000 (you did cut the discussion about software scalability, which in fact could be a real limit). Regardless, software licensing costs are a major reason to avoid such a massive system. It would generally be wiser to get a lower socket/core count system, sized to the number of cores that you can afford via licensing. This is why Oracle has high hopes for the UV 300 as the ceiling for software licensing fees is far lower than a UV 2000 with similar memory capacity up to 48 TB.
Brutalizer - Thursday, June 11, 2015 - link
I'm on vacation and only have an iPad, so I cannot write long posts. I'll be home 15 July. However, I notice you did duck my questions Q1-3) again, by shifting goal posts. For instance, you say that x86 scales better on SAP, and provide no proof of that; instead you duck the question by saying x86 has top ten. So where is the proof x86 tackles larger SAP workloads? Nowhere. Does it tackle larger SAP workloads? According to you: yes. Show us proof, or is x86 unable to tackle larger SAP workloads, i.e. FUD as usual?
You also say that x86 can handle business scale-up workloads better than Unix: your proof of this? No links/benchmarks. Instead you explain to all readers here that: "x86 appears to have the ability to run enterprise workloads". Ergo, x86 handles enterprise workloads better than Unix. Et voila. This is so unacademic I don't know where to start. You convince no one with that explanation. Really stupid, this. Can you explain AGAIN to all of us here?
Here in London it says in a store: "to buy some wares you need to be 25 years". Imagine FUDer KevinG's reply:
-I am older than 25, I emailed them at HP, and they concur I am 25 years. You have to trust me, I am not showing you any ID card or any other verification of this claim.
What do you think the policeman would say? Let you go? Is this argument of yours credible? Have you verified you are 25 years? No. No proof. Just FUD.
And regarding the SGI link, he says that the SGI UV2000 is not suitable for enterprise, so he talks about the UV300H being better. You don't believe the CTO of SGI. Of course, with logic such as yours, you don't know what a credible explanation is. That is the reason you believe you have proved the UV2000 handles better workloads (it appears to do so) and that is why you don't believe the CTO of SGI when he recommends the UV300H instead. Your logic is beyond repair; you don't know what an academic debate is, with logically sound arguments and counter arguments. You have no clue of academic debate. That is clear. You don't convince the police, nor a researcher.
You are welcome to answer questions 1-3) again until I get home.
Brutalizer - Thursday, June 11, 2015 - link
I mean, I'll be home 15 June. Btw, the SGI sales rep has not answered me any more. He asked why I use a weird email address, and got suspicious. "What is this all about??? Why are you asking these questions??"
Kevin G - Thursday, June 11, 2015 - link
@Brutalizer
"I'm on vacation and only have an iPad, so I cannot write long posts. I'll be home 15 July."
So? I'm composing this while waiting at an airport using an iPhone. I look forward to your response on my three questions and how you misinterpreted the socket topologies in the links you provided. You wouldn't want to dodge that part now, would you?
"However, I notice you did duck my questions Q1-3) again, by shifting goal posts. For instance, you say that x86 scales better on SAP, and provide no proof of that; instead you duck the question by saying x86 has top ten. So where is the proof x86 tackles larger SAP workloads?"
You seem to be missing the point that a top ten SAP score does indeed indicate that it can handle large workloads. The question Q1 you presented was about capability and differences in the claims. The fact that x86 ranks in SAP's top 10 should be sufficient evidence on this point. The UV 2000 is certified by Oracle to run their database, which would also be evidence that the system is capable of running database workloads. The answer to Q1 was accurate despite your attempts to shift the question again here.
"You also say that x86 can handle business scale-up workloads better than Unix: your proof of this? No links/benchmarks. Instead you explain to all readers here that: "x86 appears to have the ability to run enterprise workloads". Ergo, x86 handles enterprise workloads better than Unix. Et voila. This is so unacademic I don't know where to start. [...]"
Your initial request about this, before you shifted the goal posts, was to find a customer that replaced a Unix system with a 16 socket x86 system for business workloads. So I found such an installation at Cerner per your request. This should be proof enough that x86 based systems can replace large Unix systems, because businesses are doing exactly that.
"And regarding the SGI link, he says that the SGI UV2000 is not suitable for enterprise, so he talks about the UV300H being better. You don't believe the CTO of SGI. Of course, with logic such as yours, you don't know what a credible explanation is. That is the reason you believe you have proved the UV2000 handles better workloads (it appears to do so) and that is why you don't believe the CTO of SGI when he recommends the UV300H instead."
I believe that I stated myself that the UV300 would be a better system in most situations today as it supports up to 48 TB of memory, 3/4 the amount that the UV2000 supports. It does so at a lower socket count, which saves money on software licenses, and the socket topology offers better latency for better performance up to 32 sockets. As such you, me, and SGI are in lockstep with the idea that the UV 300 would be a better choice in the majority of use-cases. I also listed the two niche exceptions: the need for more than 48 TB of memory and/or more performance than what the UV 300 offers in 32 sockets.
The link you presented ( http://www.theplatform.net/2015/03/05/balancing-sc... ) does not actually say that the UV 2000 cannot run business workloads, rather that the UV 300 would be better at them (and the better part we agree on, as well as why the UV 300 would be better). You just can't seem to realize that the UV 2000 is also capable of running the exact same software.
Now go back a year, when the UV 300 wasn't on the market. You have interviews like this one, where the historical context has to be the UV 2000, as the UV 300 wasn't on the market yet: http://www.enterprisetech.com/2014/03/12/sgi-revea...