Calvin Qi, who works at a search company called Glean, would love to use the latest artificial intelligence algorithms to improve his company’s products.
Glean provides tools for searching applications like Gmail, Slack, and Salesforce. Qi says new AI language analysis techniques would help Glean’s customers find the right file or conversation much faster.
But training such a cutting-edge AI algorithm costs several million dollars. So Glean uses smaller, less capable AI models that cannot extract as much meaning from text.
“It’s difficult for smaller sites with smaller budgets to get the same level of performance” as companies like Google or Amazon, Qi says. The most powerful AI models, he says, are out of reach.
AI has spawned exciting breakthroughs in the last decade – programs that can beat people at complex games, steer cars through city streets under certain conditions, respond to spoken commands, and write coherent text based on a short prompt. Writing in particular builds on recent advances in the ability of computers to analyze and manipulate language.
These advances are largely the result of feeding the algorithms more text as examples to learn from and giving them more chips with which to digest it. And that costs money.
Consider OpenAI’s language model GPT-3, a large, mathematically simulated neural network fed with text scraped from the web. GPT-3 can find statistical patterns that predict, with striking coherence, which words should follow others. Out of the box, GPT-3 is significantly better than previous AI models at tasks such as answering questions, summarizing text, and correcting grammatical errors. By one measure, it is 1,000 times more capable than its predecessor, GPT-2. But training GPT-3 is estimated to have cost nearly $5 million.
“If GPT-3 were available and cheap, it would totally supercharge our search engine,” Qi says. “It would be really, really powerful.”
The rising cost of training advanced AI is also a problem for established companies looking to build their AI capabilities.
Dan McCreary leads a team within a division of Optum, a healthcare IT firm, that uses language models to analyze call transcripts in order to identify higher-risk patients or recommend referrals. He says that even training a language model a thousandth the size of GPT-3 can quickly eat up the team’s budget. Models need to be trained for specific tasks and can cost more than $50,000, paid to cloud computing companies to rent their computers and programs.
McCreary says cloud computing providers have little reason to lower their prices. “We cannot trust cloud providers to work on lowering the cost of building our AI models,” he says. He is considering buying specialized chips designed to speed up AI training.
Part of the reason AI has advanced so rapidly in recent years is that many academic labs and startups were able to download and use the latest ideas and techniques. Algorithms that produced breakthroughs in image processing, for example, came from academic laboratories and were developed using off-the-shelf hardware and openly shared datasets.
Over time, however, it has become increasingly clear that advances in AI are linked to an exponential increase in the underlying computing power.
Of course, large companies have always had advantages in terms of budget, scope, and reach. And large amounts of computing power are table stakes in industries like drug discovery.
Now some are pushing to scale things up further. Microsoft said this week that it had built, with Nvidia, a language model more than twice the size of GPT-3. Researchers in China say they have built a language model four times larger than that.
“The cost of training AI is definitely rising,” said David Kanter, CEO of MLCommons, an organization that tracks the performance of chips designed for AI. The idea that larger models can unlock valuable new opportunities can be seen in many areas of the tech industry, he says. That may explain why Tesla designs its own chips just to train AI models for autonomous driving.
Some worry that the rising cost of tapping into the latest and greatest technology could slow the pace of innovation by reserving it for the biggest companies and those who lease their tools.
“I think it reduces innovation,” says Chris Manning, a Stanford professor who specializes in AI and language. “When we only have a handful of places where people can play with the insides of models of that scale, that has to massively reduce the amount of creative exploration that happens.”