Intel Lunar Lake: New P-Core, Enter Lion Cove

Starting with the Performance core, or P-core as it is commonly known, Lion Cove brings major architectural updates aimed at raising both power efficiency and performance. The biggest of these is a comprehensive rework of Intel's classic P-core cache hierarchy.

The new design for Lion Cove uses a multi-tier data cache consisting of a 48KB L0D cache with 4-cycle load-to-use latency, a 192KB L1D cache with 9-cycle latency, and an enlarged L2 cache of up to 3MB with 17-cycle latency. In total, this puts 240KB of cache within 9 cycles of the core, whereas Redwood Cove before it could only reach 48KB of cache in the same number of cycles.
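The 240KB figure follows directly from the capacities and latencies quoted above. As a minimal sketch of that arithmetic in Python, where the `CacheLevel` type and the `capacity_within` helper are illustrative names rather than anything from Intel, and where capacities are simply summed on the assumption that the levels do not duplicate each other's contents:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    name: str
    size_kb: int
    latency_cycles: int


# Figures as quoted in the article for Lion Cove (L2 at its 3MB maximum).
LION_COVE = [
    CacheLevel("L0D", 48, 4),
    CacheLevel("L1D", 192, 9),
    CacheLevel("L2", 3 * 1024, 17),
]


def capacity_within(levels, max_cycles):
    """Sum the capacity of every level reachable within max_cycles.

    Assumes each level holds distinct data; if the hierarchy is
    inclusive, the unique capacity would be smaller than this sum.
    """
    return sum(lvl.size_kb for lvl in levels if lvl.latency_cycles <= max_cycles)


print(capacity_within(LION_COVE, 9))  # L0D + L1D
print(capacity_within(LION_COVE, 4))  # L0D only
```

Whether the sum reflects unique capacity depends on the inclusion policy between L0 and L1, which the figures here do not settle.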

The data translation lookaside buffer (DTLB) has also been revised, increasing its depth from 96 to 128 pages to improve its hit rate.
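To put the DTLB change in perspective, a rough sketch of "TLB reach" (the amount of memory the DTLB can map without a miss) is shown below. The 4KB page size is an assumption for illustration, as the page sizes backing these entries are not stated here, and `tlb_reach_kb` is a hypothetical helper name:

```python
PAGE_SIZE_KB = 4  # assumption: standard 4KB pages; not stated in the article


def tlb_reach_kb(entries, page_size_kb=PAGE_SIZE_KB):
    """Memory covered by a fully populated TLB, in KB."""
    return entries * page_size_kb


old_reach = tlb_reach_kb(96)   # previous DTLB depth
new_reach = tlb_reach_kb(128)  # Lion Cove DTLB depth
print(old_reach, new_reach, f"{new_reach / old_reach:.2f}x")
```

Under that assumption the reach grows by a third, which is where the improved hit rate would come from for workloads whose working set falls between the two sizes.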

Intel has also added a third Address Generation Unit (AGU)/Store Unit pair to further boost the performance of data write operations. More cache has been thrown at the problem as well: as CPU cores grow more complex, they rely more heavily on the cache subsystem to keep them fed. To that end, Intel has reworked the core-level cache subsystem by adding an intermediate data cache (IDC) between the 48KB first-level cache and the L2. The original L1D cache is now called the L0 D-cache internally and is backed by the new 192KB L1 D-cache.

The latest Lion Cove P-core design also includes a new front-end for handling instructions. The prediction block is 8x larger, fetch is wider, decode bandwidth is higher than on Raptor Cove, and there has been an enormous increase in uop cache capacity and read bandwidth. The change in uop queue capacity is intended to improve overall throughput.

The out-of-order engine in Lion Cove is now split into separate Integer (INT) and Vector (VEC) execution domains, each with independent renaming and scheduling. This partitioning allows for future expandability, independent growth of each domain, and reduced power consumption for domain-specific workloads. The out-of-order engine has also been widened, going from 6 to 8-wide allocation/rename and from 8 to 12-wide retirement, with the instruction window deepened from 512 to 576 entries and the number of execution ports raised from 12 to 18.
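The relative size of each of these changes is easier to compare as a percentage. A small sketch using only the before/after figures quoted above; the dictionary and the `growth_percent` helper are illustrative names:

```python
# (before, after) figures as quoted in the article.
OOO_CHANGES = {
    "allocation/rename width": (6, 8),
    "retirement width": (8, 12),
    "instruction window entries": (512, 576),
    "execution ports": (12, 18),
}


def growth_percent(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100


for name, (before, after) in OOO_CHANGES.items():
    print(f"{name}: {before} -> {after} (+{growth_percent(before, after):.1f}%)")
```

Retirement width and execution ports grow the most in relative terms, while the instruction window grows far more modestly.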

Lion Cove's integer execution units have also been improved over Raptor Cove, with execution resources growing from 5 to 6 integer ALUs, 2 to 3 jump units, and 2 to 3 shift units. The number of 64x64-to-64 multiply units scales from 1 to 3, adding compute power for the heavier parts of integer work. Another significant development is the transformation of the P-core database from a 'sea of fubs' to a 'sea of cells.' Migrating the sub-organization of the P-core's structure from fubs (functional unit blocks) to more organized cells essentially increases density.

Intel has removed Hyper-Threading (HT) from its Lunar Lake SoC, with one potential reason being to enhance power efficiency and single-thread performance. By eliminating HT, Intel reduces power consumption and simplifies thermal management, which should extend battery life in ultra-thin notebooks. Intel claims that the Lion Cove P-cores without HT offer approximately 15% better performance-to-power and performance-to-area ratios than cores with HT. Intel's hybrid architecture, which uses the E-cores for multi-threaded tasks, also reduces the need for HT, as the Intel Thread Director can distribute workloads across the cores more efficiently.

Power management has also been refined, with AI self-tuning controllers replacing the static thermal guard bands. This lets the system respond adaptively to real-time operating conditions and achieve higher sustained performance. Intel also implements Lion Cove P-core clock speeds at finer 16.67MHz intervals rather than the traditional 100MHz. This means more accurate power management and finer tuning to squeeze as much as possible from the power budget.
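To illustrate why a finer clock step matters, the sketch below rounds a target frequency down to the nearest available bin on each grid. The 4290MHz target is a purely hypothetical value for whatever the power budget could sustain, the 16.67MHz step is assumed to be exactly one sixth of 100MHz, and `highest_bin_at_or_below` is an illustrative helper name:

```python
from fractions import Fraction

COARSE_STEP_MHZ = Fraction(100)
FINE_STEP_MHZ = Fraction(100, 6)  # assumption: 16.67MHz is exactly 100/6 MHz


def highest_bin_at_or_below(target_mhz, step_mhz):
    """Round target_mhz down to the nearest multiple of step_mhz."""
    steps = Fraction(target_mhz) // step_mhz  # whole number of steps that fit
    return steps * step_mhz


target = 4290  # hypothetical sustainable frequency in MHz
coarse = highest_bin_at_or_below(target, COARSE_STEP_MHZ)
fine = highest_bin_at_or_below(target, FINE_STEP_MHZ)

print(f"100MHz grid:   {float(coarse):.2f} MHz (leaves {float(target - coarse):.2f} MHz unused)")
print(f"16.67MHz grid: {float(fine):.2f} MHz (leaves {float(target - fine):.2f} MHz unused)")
```

With the coarse grid the core can give up as much as just under 100MHz of headroom; with the finer grid the worst case shrinks to just under 16.67MHz.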

Intel's Lion Cove P-core microarchitecture looks like a nice upgrade over Golden Cove. Lion Cove incorporates improved memory and cache subsystems and better power management, rather than relying solely on faster P-core frequencies to boost performance.

91 Comments

  • The Hardcard - Wednesday, June 5, 2024

    There will be Lion Cove with hyperthreading. It is designed such that it can be physically left out or included in depending on the value to each product.

    It was left out of Lunar Lake as the primary goal here is performance per watt and battery life superiority over Apple and Qualcomm.

    Server Lion Cove will absolutely have hyperthreading. Rumors are Arrow Lake will have it as well.
  • TMDDX - Wednesday, June 5, 2024

    Is on chip "AI" the new connected standby for NSA spying?
  • ballsystemlord - Wednesday, June 5, 2024

    Shhhhh, you're not supposed to say that. It's classified. ;)
  • sharath.naik - Wednesday, June 5, 2024

    So would this have on package memory, what is the size of memory? how many P cores how many E cores? So many questions no answers. Is this like a paper launch?
  • sharath.naik - Wednesday, June 5, 2024

    Never mind I was wrong. 4E+4P and up to 32 GB RAM. I wish they had option for 64GB, but 32GB is a good number
  • stephenbrooks - Wednesday, June 5, 2024

    The wider Lion Cove core looks pretty impressive, I'll be interested to see how it does in desktops.
  • name99 - Wednesday, June 5, 2024

    "In total, this puts 240KB of cache within 9 cycles' latency of the CPU cores"

    Does it? If they do things the usual Intel way the L1 is inclusive of the L0...
    Other options are possible, of course, but were they implemented?
  • mode_13h - Thursday, June 6, 2024

    I wonder if the tag RAM for the L0, L1D, and L2 are all separate? It would be interesting if they grouped it all together in a tree-structured lookup and put that as close as possible to the core's load/store unit. The actual data memory of the caches could be the only part that's physically separate.
  • Bruzzone - Wednesday, June 5, 2024

    It's worth the wait to Lunar and Arrow?
    Or take advantage of the Intel and AMD current generation clearance sales?

    Intel is flooding the channel with Raptor desktop and mobile in the last eight weeks apparently to sustain a Core supply bridge into Lunar and Arrow. Intel is also sucking the financial capital out of the channel in an effort to block or slow the procurement of anything other than Intel.

    In parallel fighting it out for surplus control, AMD is also engaged sucking financial capital out of the channel by flooding the channel specifically with Raphael desktop.

    Where Meteor Lake and AMD Phoenix, Hawks and Granite Ridge continue as intermediate 'AI' technologies into Strix mobile and Arrow desktop. Not that I care about AI functionality currently.

    14th desktop channel available + 98% in the prior eight weeks
    13th desktop + 24.6%
    12th desktop + 33.4%

    Intel desktop all up;

    14th desktop available today = 24.9%
    13th desktop = 37% that is 48.4% more than 14th
    12th desktop = 37.9% equivalent with 13th

    Specific Intel mobile;

    Intel Meteor Lake mobile channel available gains + 216%. Within Meteor Lake Core SKUs are 10.3%. Among total, H performance mobile = 43.9% and U low power mobile = 56%. Meteor Lake associated are 11% of all Raptor Lake 13th mobile.

    14th mobile H + 16% in week and 30% of all Meteor and 36% of all 13th Raptor mobile H.
    13th mobile itself gains + 5.1%
    13th H specifically gains + 8.6%
    13th P clears down < 3.2%
    13th U gains + 4.8%

    12th Alder mobile all up + 13.2% in the prior eight weeks
    12th H specifically = flat
    12th P clears down < 3.2%
    12th U clears down < 2.6%

    I will have AMD desktop and mobile supply, trade-in and sales trend up later today at my SA comment line. Here are some immediate observations;

    5900XT and 5800XT on AMD so said pricing is sufficient to push Vermeer channel holdings down in price at so said $359 and $249 now pulled by AMD in the moment. The channel might not have been happy with that regulating price move on how much R5K there is to clear from the channel. R5K channel available is up + 68% since March 9 when R5K was 68% of all R7K and today 98% of R7K available.

    R7K desktop since March 9 channel supply volume available + 18%. R9K will minimally dribble out allowing R7K and R5K to clear? R9K might have to be priced up on specific SKUs to accomplish the same dribbling out objective allowing AMD back generation to clear?

    Notably 3600 gains in the channel + 94% in the prior month.
    3600X came back to secondary resale + 35%.
    3700X is up + 15.8% that's all trade-in.

    AMD might have to adjust R9K desktop top SKU and R5K desktop regulating SKUs not to interfere with the channel's ability to liquidate especially Vermeer from channel inventory holdings plus R7K SKUs that will follow in a first in first out channel sales system.

    In summary, there is plenty of Intel and AMD product in the channel. The PC market remains in a downward deflationary price spiral until at least q1 2025 aimed to clear existing inventories for channel financial reclaim to buy next generation.

    Subsequently there's this inventory bridge to traverse to Intel and AMD next generation products and through the summer into q4 it's never been a better time to buy a PC. I don't think desktop and mobile prices will be as low as they are heading into year end and for a long time following.

    For Intel at least flooding the channel with product indicates Intel is buying time.

    mb
  • BushLin - Wednesday, June 5, 2024

    Thanks for the uncited nonsense Mike, we were all on tenterhooks.
