I haven't viewed an Intel Developer Forum with anticipation for some years. I am looking forward to this one, because unless there is some surprise afoot, this is where the Nehalem architecture should make its silicon debut. Intel tipped this by announcing the name of its first incarnation of Nehalem, a desktop chip dubbed "Core i7." Desktop CPUs tend to leave out features touted in literature describing the most potent implementation of a new architecture, so I don't expect Core i7 to embody Nehalem as IT will come to know it. I do expect to see Nehalem in production ahead of schedule, and that suits me. Nehalem could mark a return to a strategy that takes competition into account, and which includes entry-level RISC in the scope of competitors.
Intel left a lot on the cutting room floor when it made the transition from NetBurst (Pentium 4) to the Core microarchitecture. Most of it belonged there. The 45-nanometer Nehalem x86 CPU is awash in features that Intel sidetracked when Opteron caught it off guard. It brings in Level 3 cache, on-die memory controllers, 128-bit math, string processing instructions, and a scalable, point-to-point, hot-pluggable bus. It also resurrects Hyper-Threading (HT), the healthy baby thrown out with the muddy bath water when Intel dumped NetBurst.
At the time HT hit the scene, it bore the trappings of an academic curiosity. Even Intel didn't know what to do with it: It delivered NetBurst CPUs that included HT but shipped with it disabled. Intel gave OEMs the option of leaving the on/off switch out of their BIOS settings. That, and the broad perception that HT had a negative impact on performance, made HT seem like a failed experiment. I never saw it that way. Any feature that gives developers the capability to boost performance by 20 to 30 percent with minor modifications to code is one that needs to be in the CPU. Now that we're in the multi-core era, HT is exciting. Sun successfully made the case for hardware thread acceleration, which it branded CoolThreads, on a grand scale with UltraSPARC T1 ("Niagara"). Niagara planted 32 threads in a 72-watt (typical) package, and then outdid itself in UltraSPARC T2 with 64 threads in a 95-watt chip. Sun might have caught Intel's attention by setting world records for single-chip performance with SPEC integer and floating-point benchmarks that do threading at a process level. Intel's HT delivers two threads per core, but in a four-core package, that's eight threads in hardware. In a basic two-socket rack server, Nehalem will do sixteen threads in hardware. It's a start.
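The thread arithmetic is simple enough to sketch. The helper below is a hypothetical illustration of my own, not anything Intel or Sun ships; it just multiplies sockets, cores per socket, and threads per core, using the Nehalem figures quoted above.

```python
def hardware_threads(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Return the number of hardware threads a system exposes to the OS.

    Hypothetical helper for illustration only: the count is simply the
    product of the three factors.
    """
    if min(sockets, cores_per_socket, threads_per_core) < 1:
        raise ValueError("all counts must be at least 1")
    return sockets * cores_per_socket * threads_per_core


# One four-core Nehalem package with two-way Hyper-Threading.
print(hardware_threads(sockets=1, cores_per_socket=4, threads_per_core=2))  # 8

# A basic two-socket rack server built from the same part.
print(hardware_threads(sockets=2, cores_per_socket=4, threads_per_core=2))  # 16
```

The number an OS scheduler sees is the product, which is why it matters whether the scheduler can tell two threads on one core from two separate cores.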
What's to get excited about? HT brings an additional layer of affinity to performance-sensitive applications, and to OSes with schedulers that are smart enough to tell the difference between a thread unit and a physical core. HT by itself might not get your blood pumping, but Nehalem mates Hyper-Threading with on-chip memory controllers, dedicated Level 2 cache for each core, and Intel's first-ever non-shared x86 bus architecture. Intel is also trying its hand at a feature AMD teased but hasn't yet implemented, the purpose-specific accelerator. These strike the public as obscure, but they get gearheads like me worked up. String matching and CRC (cyclic redundancy check, the ubiquitously implemented method for assuring data integrity) in single instructions make for more compact code, but more than that, they get developers thinking about what else Intel might accelerate at the instruction level.
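To give a feel for what a single CRC instruction replaces, here is a minimal software sketch of the same checksum. It assumes the CRC-32C (Castagnoli) polynomial, which is the one the SSE4.2 CRC32 instruction uses, and it assumes the common convention of inverting the register before and after the run. The function name crc32c is my own, and this bit-at-a-time loop is written for clarity, not speed.

```python
# Reflected form of the CRC-32C (Castagnoli) polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Compute CRC-32C over data, bit by bit.

    Pass a previous result as crc to continue a running checksum
    across several chunks.
    """
    crc ^= 0xFFFFFFFF  # conventional initial inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial when that bit is 1.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # conventional final inversion


# Standard check string for CRC catalogues.
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

The inner loop runs eight times per byte in software. The hardware instruction folds that accumulate step into a single operation on up to eight bytes at a time, leaving only the initial and final inversion to the caller.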
Intel's openness to input from developers may be Nehalem's best feature. It harkens back to a day when boatloads of code were written in assembly language, and Intel committed to carrying x86 assembly language closer, step by step and based on input from coders, to a high-level language implemented in microcode. With process shrink and changes in manufacturing, there is plenty of room on the die for meaningful innovations beyond core and cache.
Some, but not all, of Nehalem's best features will lie latent until applications and OSes are tuned to use them. Intel wisely implemented its first on-chip memory controller to use NUMA (non-uniform memory access), which dedicates a bank of RAM to each socket. Thanks to AMD, NUMA optimizations are already part of Windows and Linux. These OSes will need updates to sense Nehalem chip IDs and flip NUMA on. Interestingly, BIOS ROMs in Intel servers introduced in the past year or so already have a Hyper-Threading switch. For OEMs smart enough to take that step, HT will be another Nehalem feature that will benefit users right out of the box.
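For readers who want to see what NUMA awareness looks like from software, here is a rough sketch. Linux describes each memory node's processors with a compact list such as "0-3,8-11" (the format of the per-node cpulist files under /sys/devices/system/node). The function names and the two-socket layout below are hypothetical, chosen by me to match the sixteen-thread server described earlier; a real machine may number its CPUs differently.

```python
from typing import Dict, List, Optional


def parse_cpulist(text: str) -> List[int]:
    """Expand a Linux-style CPU list such as '0-3,8-11' into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_of_cpu(cpu: int, layout: Dict[int, str]) -> Optional[int]:
    """Return the NUMA node whose CPU list contains cpu, or None."""
    for node, cpulist in layout.items():
        if cpu in parse_cpulist(cpulist):
            return node
    return None


# Hypothetical layout: two sockets, four cores each, two threads per core.
SAMPLE_LAYOUT = {0: "0-3,8-11", 1: "4-7,12-15"}

print(parse_cpulist(SAMPLE_LAYOUT[0]))  # [0, 1, 2, 3, 8, 9, 10, 11]
print(node_of_cpu(13, SAMPLE_LAYOUT))   # 1
```

A NUMA-aware scheduler or application uses this kind of mapping to keep a thread on the socket that owns the RAM it touches most.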
I am jazzed about Nehalem, but a little reserved because there is often a gap between Intel's telegraphed specifications and the actual implementation. Maybe Nehalem will be different. If it is realized in the form in which it was pitched, IT should send up a collective huzzah for Intel's engineers.