Agner Fog, a Danish expert in software optimization, is making a plea for an open and standardized procedure for x86 instruction set extensions. At first sight, this may seem like a discussion that does not concern most of us. After all, the poor souls who have to program the insanely complex x86 compilers will take care of the complete chaos called "the x86 ISA", right? Why should the average developer, system administrator or hardware enthusiast care?

Agner goes into great detail about why the incompatible SSE-x.x additions and other ISA extensions were, and still are, a pretty bad idea, but let me summarize it in a few quotes:
  • "The total number of x86 instructions is well above one thousand" (!!)
  • "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."
  • "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"
  • The costs of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.
Summarized: Intel's and AMD's proprietary x86 additions cost us all money. How much is hard to calculate, but our CPUs consume extra energy and underperform because their decoders and execution units are unnecessarily complicated. The software industry is wasting quite a bit of time and effort supporting different extensions.
 
Not convinced, still thinking that this only concerns the HPC crowd? Virtualization platforms contain up to 8% more code just to support the incompatible virtualization instructions, which offer almost exactly the same features; each VMM is 4% bigger because of this. So whether you are running Hyper-V, VMware ESX or Xen, you are wasting valuable RAM space. That is not dramatic, of course, but it is unnecessary waste. Much worse is that this unstandardized x86 extension mess has made it a lot harder for datacenters to take the step towards a really dynamic environment where you can load balance VMs and thus move applications from one server to another on the fly. It is impossible to move (VMotion, live migrate) a VM from Intel to AMD servers, or from newer to (some) older ones, and in some situations you need to fiddle with CPU masks (and read complex tech documents) just to make it work. Should 99% of the market lose money and flexibility because 1% of the market might get a performance boost?
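The usual workaround for the migration problem is to mask guest-visible CPU features so that every host in a cluster advertises the same lowest-common-denominator feature set. As a rough sketch of what that looks like in practice, a libvirt domain definition can pin the guest to a fixed CPU model and explicitly hide a newer extension (the model and feature names here are illustrative; real deployments pick a baseline matching their oldest host):

```
<!-- Pin the guest to a fixed CPU model and disable a newer extension,
     so the VM can live-migrate to hosts that lack it. -->
<cpu match="exact">
  <model>Nehalem</model>
  <feature policy="disable" name="sse4.2"/>
</cpu>
```

The price of this workaround is exactly the point above: the VM permanently gives up the newer instructions on every host, just so the fleet behaves uniformly.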

The reason why Intel and AMD still continue with this is that some people inside those companies feel it can create a "competitive edge". I believe this "competitive edge" is negligible: how many people have bought an Intel "Nehalem" CPU because it has the new SSE 4.2 instructions? How much software supports yet another x86 instruction addition?
 
So I fully support Agner Fog in his quest for a (slightly) less chaotic and more standardized x86 instruction set.

  • Scali - Monday, December 14, 2009 - link

    Itanium isn't dead. Intel still has 'big' plans for it.
    They want to skip a node and get Itanium on the same manufacturing process as x86, and they want to use the same sockets/chipsets/motherboards on both x86 and Itanium servers.
    I think Itanium may temporarily have gotten less attention, when Intel needed to turn the Pentium 4 into the current Core2 and Core i7 success.

    Binary emulation was applied even in the early Alpha versions of NT4, to run x86 software, as I said. So it has been possible for more than a decade. The main problem back then was that Alpha was aimed at corporate servers and workstations (much like Itanium), so it wasn't affordable to regular users.
    But Alpha/Itanium could have trickled down into the consumer market eventually.
  • Penti - Monday, December 14, 2009 - link

You know by who? Digital, that's right, not Microsoft, and it's Intel who supplies it for Itanium too. Microsoft didn't themselves support many of the platforms they released NT on. That was my point, and that's also why it didn't catch on: MS didn't put in any real effort to support them. They wanted to skip nodes? No 65 or 45 nm part has been released. The development has stagnated both software- and hardware-wise: it didn't become the PA-RISC replacement for HP, and it didn't become the true mid/high-end server replacement for Intel. So it's pretty dead; there's no reason to run the Windows Itanium version today. Tukwila will still be a 65 nm chip; it's the improvements to the memory controller that have taken time. That chip will be released next year, and later they will jump to 32 nm, but that's not relevant to the observation that it has been years since a product release. As regards binary emulation/translation, it wouldn't have been the right time for Apple to do it in 2001, that's all I said; it's much more mature today anyway. Anyhow, by the time 32 nm Itaniums arrive we will have Sandy Bridge coming out. And you were mistaken about the current situation: the 65 nm Tukwila delay has nothing to do with what you mentioned. Also, FX!32 hardly lived two years before MS dropped the Alpha port, together with HP dropping Alpha support. Intel's and HP's Itanium didn't replace Alpha, PA-RISC and the x86 parts; x86 caught on again, and development stagnated for Itanium as said, so it could probably be regarded as pretty dead. There are still reasons for buying UltraSPARC and POWER machines, but I fail to see the benefits of Itanium today, because it failed to replace PA-RISC for HP and failed to become a high-end Windows server platform. The number of Itanium systems sold is maybe 60,000 - 80,000 a year! Most aren't running Windows, and HP accounts for 95% of the sales.
  • Scali - Tuesday, December 15, 2009 - link

    Get a grip, mate :)
    Use some proper punctuation :)
  • Penti - Tuesday, December 15, 2009 - link

    Irrelevant. But yeah, it was drawn out too long anyhow. And you're not responding to the point anyway.
  • Penti - Tuesday, December 15, 2009 - link

    Well, let's do it this way.

    1. Digital made and released FX!32 binary emulation + translation; Digital pretty much supported NT on Alpha themselves.

    2. Tukwila, the 65 nm part, was delayed, and that has nothing to do with jumping a node. It was delayed because of the work involved in the memory controller and QPI architecture. Poulson will be 32 nm and be released in 2011. Tukwila will be a 4-core 65 nm chip and will use the same chipset as the (Nehalem-EX) Xeon MP platform.

    3. New products in the Itanium line aren't really coming out as expected.

    4. x86 has caught up to the point where Itanium actually needs to significantly catch up with x86 development. There's no evident RAS advantage. It has aged badly.

    5. Itanium failed to be a unified RISC/EPIC product. Windows on Itanium didn't catch on, and HP-UX development stagnated to the point where most migrated. Most PA-RISC customers didn't move to Itanium either, and continued buying PA-RISC machines till the end. Shops are replacing database/MS SQL Server Itanium machines with x86 ones now.

    6. Virtually only HP sells Itanium machines, and Linux and HP-UX sell better on them than Windows Server.

    x86 caught on before, and did it again. It's where the development goes, and it can reinvent itself fine regardless of the need for backward compatibility. Emulation won't provide full compatibility, requires much work, and will be painfully slow if you just run in "emulation mode". FX!32 generated a kind of native code the first time you started an app. You can always do like Transmeta and create a special architecture for running ISA emulation, but there's no apparent benefit.

    Finally, operating system vendors have been poor at supporting or writing this kind of emulation; I mentioned Digital and Intel here. The OS vendors didn't support it. Apple brought in Rosetta from outside the company, which is why they couldn't have done something like it before, and needless to say Rosetta didn't provide a fully compatible environment for the applications. Microsoft didn't themselves support the endeavors into other platforms, and that contributed to the failure. MIPS was killed pretty early on; there was no binary translator there. Of course, today you can always do full-system emulation like QEMU or even Bochs. VPC worked, kind of. But it's painful, so most would rather skip it.
  • Scali - Wednesday, December 16, 2009 - link

    I don't feel like covering all these points, as you seem to be looking for an argument more than directly responding to what I said.

    But I do want to point out that Apple's Rosetta was not the first, but the SECOND time that they used emulation to cover a transition in CPU ISA. The first time obviously being the move from 68k to PPC.
    I don't think it's relevant whether the OS vendor supplies this solution, or if it's supplied by a third party, as long as it works.
  • ProDigit - Wednesday, December 9, 2009 - link

    With the above post I had hoped that it would be possible to run Windows on ARM technology, while regular Windows programs continue to function just like they do today!

    But I guess now that THAT would be pretty impossible, unless you're running these programs in a virtual platform.

    Sometimes I guess solutions are not as simple as we think they are!
  • rs1 - Tuesday, December 8, 2009 - link

    I don't think having a standardized procedure for x86 instruction set extensions would improve upon any of the issues that Agner raises. For instance, he cites the following:

    -- "The total number of x86 instructions is well above one thousand" (!!)

    And if there were a standardized method for adding instructions, then there would likely be just as many, if not more. Having a standard procedure for adding instructions to the x86 instruction set doesn't mean that people are going to stop doing it.

    -- "CPU dispatching ... makes the code bigger, and it is so costly in terms of development time and maintenance costs that it is almost never done in a way that adequately optimizes for all brands of CPUs."

    You have to deal with this whether or not there is a standard procedure for extending the x86 instruction set. The only way to avoid it would be to either start working with something other than x86, or reduce the size of the existing x86 instruction set, and then disallow future additions.

    -- "the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are"

    And again this issue still needs to be dealt with either way. Having a standard procedure for adding a new instruction to the ISA doesn't mean that the instruction being added is going to be any less complex to decode.

    -- The costs of supporting obsolete instructions is not negligible. You need large execution units to support a large number of instructions. This means more silicon space, longer data paths, more power consumption, and slower execution.

    While true, this has more to do with the nature of x86 itself. Having a standard way to add new instructions doesn't negate the need to preserve backwards compatibility.


    It seems to me that what Agner really wants, or at least the argument that the points he brings up support, is to replace x86 with a RISC-style ISA. Having a standard way to add new instructions into x86 changes nothing fundamental about the ISA and the pros and cons that go along with it. And truly addressing the issues that Agner raises would require such fundamental changes to the ISA that there'd be no point in calling it "x86" any more.

    Of course, I think having standards in place regarding adding extensions to the x86 ISA is a fine idea, but it is definitely not going to fix any of the issues that Agner raised. You'd need to switch to an entirely different ISA to do that.
  • Agner - Monday, December 28, 2009 - link

    Thank you everybody for discussing my proposal.

    rs1 wrote:
    >-- "The total number of x86 instructions is well above one thousand"
    >And if there were a standardized method for adding instructions, then
    >there would likely be just as many, if not more.
    Please read my original blog post: http://www.agner.org/optimize/blog/read.php?i=25 I have argued for an open committee that could approve new instructions and declare other instructions obsolete. There would be fewer redundant instructions if all vendors were forced to use the same instructions. For example, we wouldn't have FMA3 on Intel and FMA4 on AMD. And we wouldn't have new instructions that are added mainly for marketing reasons.

    >It seems to me that what Agner really wants, or at least, the argument that
    >the points he brings up support, is to replace x86 with a RISC-style ISA.
    No, I never said that. Itanium failed because the market wants backwards compatibility. And the CISC instruction set has the advantage that it takes less space in the instruction cache.

    >-- "CPU dispatching ...
    >You have to deal with this whether or not there is a standard
    >procedure for extending the x86 instruction set.
    You would certainly need fewer branches; and it would be easier to test and maintain because all branches could be tested on the same machine.

    >-- "the decoding of instructions can be a serious bottleneck, ...
    >And again this issue still needs to be dealt with either way.
    >Having a standard procedure for adding a new instruction to the
    >ISA doesn't mean that the instruction being added is going to be
    >any less complex to decode.
    The current x86 encoding is the result of a long history of short-sighted patches rather than long-term planning. That is the reason why the decoding is so complicated. We need better planning in the future.

    >-- The costs of supporting obsolete instructions is not negligible...
    >While true, this has more to do with the nature of x86 itself.
    >Having a standard way to add new instructions doesn't negate the
    >need to preserve backwards compatibility.
    I have argued that it is impossible to remove obsolete instructions in the current situation for marketing reasons. But an open committee would be able to declare that for example the x87 instructions are obsolete and may be replaced by emulation after a number of years.

  • jabber - Tuesday, December 8, 2009 - link

    I gave up caring about them in the late 90's, after all the excitement of MMX transpired into... well, nothing really.
