As the demand for artificial intelligence (AI) technology continues to rise, Nvidia has firmly established itself as the dominant player in the AI chip market with its specialized graphics processing units (GPUs), the Associated Press reports.
These GPUs have been pivotal in the development of powerful AI systems, including chatbots, through a process known as “training.” However, the very strengths that make GPUs excellent for training AI models leave them overbuilt for a crucial aspect of AI: inference.
Inference refers to the process by which AI models apply what they’ve learned to new data to produce results, such as generating text or images. GPUs can handle inference tasks, but they are optimized for the heavier lifting required during training, which makes them less efficient for the lighter computational needs of inference. This gap has created an opening for Nvidia’s rivals to develop specialized AI inference chips that are better suited to running AI tools day to day and cheaper to operate.
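To make the distinction concrete, here is a minimal sketch, not drawn from the article, of what training versus inference looks like in code. It assumes a toy PyTorch model with made-up data; the point is only that training repeatedly computes gradients and updates weights, while inference is a single, gradient-free forward pass on new input.

```python
# Illustrative only: a tiny model to contrast training with inference.
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                          # toy model: 4 inputs -> 1 output
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Training: many forward passes, backpropagation, and weight updates --
# the compute-heavy phase GPUs were built for.
x, y = torch.randn(64, 4), torch.randn(64, 1)    # synthetic example data
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                              # gradient computation dominates the cost
    optimizer.step()

# Inference: one forward pass on new data, with gradients disabled --
# far lighter work, which is what dedicated inference chips target.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 4))
```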
Companies like Cerebras, Groq, and d-Matrix, as well as traditional chipmakers such as AMD and Intel, are entering the AI inference chip market, aiming to challenge Nvidia’s dominance. These companies see a growing need for more efficient chips as the adoption of AI expands and the demand for inference processing increases.
“The broader the adoption of these models, the more compute will be needed for inference and the more demand there will be for inference chips,” noted Jacob Feldgoise, an analyst at Georgetown University’s Center for Security and Emerging Technology.
AI inference, which occurs once a model has been trained, involves applying the model to real-world data, such as when users interact with an AI chatbot. While this task is lighter than training, it still requires specialized hardware to be done effectively and efficiently.
Forrester analyst Alvin Nguyen explains that GPUs are overkill for inference.
“With training, you’re doing a lot heavier, a lot more work. With inferencing, that’s a lighter weight,” he says.
This is where companies focused on inference chips are stepping in, offering alternatives that are better suited for the task and potentially at a lower cost.
Among those launching new AI inference products is d-Matrix, a startup founded in 2019. The company’s CEO, Sid Sheth, explains the difference between training and inference with an analogy:
“We spent the first 20 years of our lives going to school, educating ourselves. That’s training. And then the next 40 years of your life, you go out there and apply that knowledge—and then you get rewarded for being efficient.”
Corsair, d-Matrix’s first product, is designed to optimize inference processing; its multi-chip layout is built to improve cooling and performance. While the company is relatively new to the market, Sheth sees significant potential in the AI inference sector as businesses look for ways to deploy generative AI tools without the high costs associated with training-focused GPUs.
While tech giants such as Amazon, Google, Meta, and Microsoft have been rapidly securing GPUs for their AI research and development, the makers of inference chips are targeting a broader range of customers. According to Nguyen, this includes Fortune 500 companies that want to integrate AI into their operations without building and maintaining their own complex infrastructure. For businesses looking to generate AI-driven content, such as videos or text, inference chips could offer a more affordable option than high-end GPUs.
One of the key advantages of AI inference chips is that they can help reduce the substantial costs of running AI tools. These chips are also expected to lower the environmental and energy costs associated with AI operations. Sheth, who is concerned about the sustainability of AI development, emphasizes the need for more efficient hardware.
“Are we going to burn the planet down in our quest for what people call AGI—human-like intelligence?” he asks.
Sheth points out that the large-scale AI models pursued by tech giants require immense computational power and energy.
While the timeline for achieving artificial general intelligence (AGI) is still uncertain, Sheth argues that the need for AI inference chips will only grow as AI applications expand. He believes that many companies will not require massive, energy-intensive AI models and will instead seek more efficient solutions for running AI tasks, which could make inference chips a larger market opportunity than training chips in the long term.