Anyone familiar with Nvidia’s (NVDA) strategy knows that the company is all-in on AI, in both software and hardware. AI was the central technology underpinning the $1 trillion market opportunity Nvidia laid out at its recent Investor Day. In the future, Nvidia envisions that all companies will build AI data centers.
However, Nvidia is not the only company chasing this market opportunity. After many years of development, Intel (NASDAQ: INTC) is finally rolling out its AI portfolio in 2022. This could be detrimental to Nvidia’s market share and, conversely, opens up a very lucrative new market for Intel. In particular, Intel just launched its new Habana chip that blows Nvidia’s current-gen A100 out of the water. This means that Nvidia’s leadership in AI training is finally in serious question.
Intel has a multi-pronged AI strategy. In essence, Intel’s strategy has been to infuse all of its silicon with AI capabilities. So when a company like AMD (AMD) makes fanfare about putting Xilinx AI engines into its silicon, note that Intel has been pursuing a similar strategy for years already. Specifically:
- In processors, Intel leveraged its AVX-512 leadership and combined it with dedicated 8-bit and 16-bit AI instructions. In the upcoming Sapphire Rapids Xeon, Intel will finally add a full-fledged on-core AI accelerator called AMX that makes Nvidia’s midrange AI accelerators obsolete. (Intel claims 30x performance over Ice Lake, though I estimate the theoretical TOPS improvement at around 10x.)
- In FPGAs, Intel launched an FPGA with the equivalent of Nvidia’s Tensor Cores a few years ago, although this part is still on 14nm.
- In GPUs, Intel’s upcoming Ponte Vecchio should be roughly on par with Nvidia’s upcoming Hopper.
- Besides this “traditional” silicon, Intel also has dedicated Habana NPUs (neural processing units) for training and inference.
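To put the TOPS estimate in the first bullet into rough numbers, here is a back-of-envelope sketch. The per-core throughput figures, core counts, and clocks below are my illustrative assumptions, not official Intel specifications:

```python
# Back-of-envelope sketch of the ~10x theoretical TOPS estimate.
# All inputs here are illustrative assumptions, not Intel's official figures.

def theoretical_int8_tops(cores, ghz, ops_per_cycle_per_core):
    """Peak INT8 TOPS = cores * clock (GHz) * ops per cycle / 1000."""
    return cores * ghz * ops_per_cycle_per_core / 1000

# Assumed Ice Lake Xeon: AVX-512 VNNI, ~256 INT8 ops/cycle/core
ice_lake = theoretical_int8_tops(cores=40, ghz=2.0, ops_per_cycle_per_core=256)
# Assumed Sapphire Rapids: AMX tile unit, ~2048 INT8 ops/cycle/core
sapphire = theoretical_int8_tops(cores=56, ghz=2.0, ops_per_cycle_per_core=2048)

print(f"Ice Lake:        ~{ice_lake:.0f} INT8 TOPS")
print(f"Sapphire Rapids: ~{sapphire:.0f} INT8 TOPS")
print(f"Ratio:           ~{sapphire / ice_lake:.1f}x")
```

With these assumed inputs, the ratio lands around 11x — in the same ballpark as the ~10x theoretical estimate, and well below Intel’s claimed 30x, which presumably also reflects software and memory-subsystem improvements.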
To be clear from the start, I have been very critical of Intel Habana’s strategy and execution. Especially given that Intel will soon have a direct Nvidia competitor in the form of its Ponte Vecchio GPU, the need for the Habana accelerator isn’t entirely clear. This is especially true as the Gaudi part that launched on AWS last year was built on an outdated 16nm process.
However, what the Habana Gaudi lacked in performance, it made up for in price: Intel/Habana claimed it offered a 40% better price/performance ratio than the A100. Habana was able to achieve this in four ways.
First, as a dedicated NPU, Gaudi drops all GPU functionality. This freed up die area for deep learning hardware, which made Gaudi faster than Nvidia’s 12nm V100 and therefore reduced the performance delta to the 7nm A100. Second, not only is Gaudi1 faster than the V100, it also has a smaller die area, which makes it cheaper to manufacture. Third, 16nm wafers are inherently cheaper than 7nm wafers, further widening the cost gap. Finally, Nvidia is making the most of its monopoly position in AI training by charging ultra-high-margin prices. By giving up some gross margin, Habana was able to deliver a 16nm part with favorable performance per dollar compared to Nvidia’s offerings.
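The four-way argument boils down to simple arithmetic: a part can be meaningfully slower and still win on performance per dollar if it is cheap enough. The sketch below uses hypothetical prices and relative performance chosen to reproduce the claimed ~40% advantage; they are not quoted vendor figures:

```python
# Illustrative price/performance comparison. The price and relative
# performance inputs are hypothetical placeholders, not quoted figures
# from Habana, Intel, or Nvidia.

def perf_per_dollar(relative_perf: float, price: float) -> float:
    """Performance per dollar in arbitrary relative units."""
    return relative_perf / price

a100 = perf_per_dollar(relative_perf=1.00, price=10_000)   # baseline
gaudi1 = perf_per_dollar(relative_perf=0.80, price=5_700)  # slower but cheaper

advantage = gaudi1 / a100 - 1
print(f"Gaudi1 price/performance advantage: {advantage:.0%}")
# → Gaudi1 price/performance advantage: 40%
```

The point of the sketch: even at 80% of the A100’s performance, a sufficiently lower price yields the claimed 40% price/performance edge.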
Nevertheless, the obvious next step was to go to 7nm, and that is what Habana has just launched with Gaudi2. What’s notable here is that Gaudi2 is launching approximately six months after Gaudi1 became available in AWS, or 1.5 years after Gaudi1 was originally announced. So while Habana still lags behind in process technology, as the A100 was launched about two years ago already, this rapid pace restores some confidence in Habana’s execution.
Gaudi2 triples the number of cores compared to Gaudi. This allows Habana to claim a 2x performance advantage over the A100, as measured across a number of benchmarks (one pictured below). In other words, Habana currently has undeniable leadership in AI performance (ignoring Cerebras’ Wafer-Scale Engine). Besides compute, Gaudi2 also tops the A100’s 80GB memory capacity, and like the first Gaudi, it still relies on the open Ethernet interconnect instead of Nvidia’s proprietary NVLink:
Gaudi2 triples the in-package memory capacity from 32 GB to 96 GB of HBM2E at 2.45 TB/sec bandwidth, and integrates 24 x 100 GbE RoCE RDMA NICs on-chip for scale-up and scale-out using standard Ethernet.
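For a sense of scale, the aggregate bandwidth of those 24 Ethernet ports can be checked against the quoted HBM2E bandwidth with a quick unit conversion (the spec figures come from the quote above; the comparison itself is mine):

```python
# Unit check on the Gaudi2 I/O figures quoted above.
GBE_PORTS = 24
PORT_GBIT_PER_S = 100      # 100 GbE line rate per port
HBM_TBYTE_PER_S = 2.45     # quoted HBM2E bandwidth

eth_tbit_per_s = GBE_PORTS * PORT_GBIT_PER_S / 1000   # 2.4 Tb/s aggregate
eth_tbyte_per_s = eth_tbit_per_s / 8                  # 0.3 TB/s (8 bits per byte)

print(f"Aggregate Ethernet: {eth_tbit_per_s:.1f} Tb/s = {eth_tbyte_per_s:.2f} TB/s")
print(f"HBM2E bandwidth is ~{HBM_TBYTE_PER_S / eth_tbyte_per_s:.0f}x the network bandwidth")
```

In other words, the on-chip NICs add roughly 0.3 TB/s of scale-out bandwidth — an order of magnitude below the local HBM2E bandwidth, which is typical for accelerator clusters.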
While people may note that Nvidia will launch its H100 Hopper GPU in Q3, making Habana’s leadership only short-lived (Nvidia claims 3x base performance over the A100), Gaudi2 will remain a compelling alternative, especially since some of the advantages mentioned above (7nm wafers being cheaper than 5nm wafers, and Habana not targeting the same gross margins as Nvidia) remain valid.
One caveat: Gaudi2 has a significantly higher TDP than the A100, so in terms of performance per watt the gap is smaller.
Ultimately, what’s most encouraging about Gaudi2 is that it delivers a 2x performance advantage over the A100 on the same process node. This suggests that Habana simply has a superior architecture, one that lacks the overhead the A100 inherits from being a GPU with AI capabilities bolted on. So when Habana finally achieves process node time-to-market parity, it will likely have clear leadership. (If Gaudi2 had launched in 2020, it would have had a 2x lead for two years instead of two months.)
Potential impact on the stock
A successful foray into AI could not only fuel Intel’s growth, but it could also serve as a key indicator of Intel’s technology leadership. If investors recognize this (with Intel working out some of its other issues over the next few years), then perhaps the stock market will reward Intel with a higher multiple, similar to Nvidia and others.
As a reminder, this was one of the two pillars of Pat Gelsinger’s “double double” strategy for the stock price: double the earnings at double the multiple.
Takeaway for investors
The launch of Gaudi2 is above all a moral victory: on the same process node, Habana claims a 2x performance advantage over the A100. However, with Nvidia moving to the H100 next quarter, Habana’s leadership will admittedly be short-lived. Nevertheless, Gaudi is the first of the many startup chip lines that have emerged to seriously challenge Nvidia. The other encouraging sign is that Gaudi2 is launching only six months after Gaudi1 became available on AWS (albeit with a six-month delay), suggesting that Habana is steadily reducing its process disadvantage. While AWS hasn’t announced any plans to introduce Gaudi2 yet, the chip should allow Habana to maintain its existing performance-per-dollar advantage in the cloud.
On a higher level, 2022 marks an important year for Intel’s AI strategy, following many years of development. After Gaudi2, Intel will also release its Ponte Vecchio GPU (which should have similar performance to the H100) as well as the Sapphire Rapids Xeon processor with AMX matrix acceleration instructions – these work like Tensor Cores inside each processor core, eliminating the need for a separate GPU. In the second half, Intel will also launch Sapphire Rapids with HBM.
In summary, Nvidia is far from without competition in AI. Its market share leadership is primarily in the training part of AI model building, not inference, where Xeons have already been widely deployed for years. By the end of the year, Intel will have not one but three leading products to compete in this space. Given Nvidia’s premium pricing and margins, the current market share status quo seems untenable.