Computerphile (a YouTube channel I really love) released a video today titled "Has Generative AI Already Peaked?". In the video, Mike Pound talks about a research paper released in April called No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance. Both the video and the paper resonate with how I've felt about the hype around generative AI over the last several years.

To summarize Mike’s thesis: he is developing a view, which the paper corroborates, that producing, from ever-larger training data, a general intelligence that can solve new and previously unseen problems (“zero-shot” generalization) is an exponential data problem. Therefore, without some radical new innovation, we may see AI performance plateau even as data sets keep growing.
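To make the "exponential data problem" concrete, here is a toy sketch. The paper's headline finding is roughly that zero-shot performance on a concept scales log-linearly with how often the concept appears in the pretraining data; the function and coefficients below are my own invented illustration, not the paper's actual fit:

```python
import math

def toy_accuracy(freq, a=0.12, b=0.1):
    """Hypothetical log-linear curve: accuracy = a * log10(freq) + b.

    Invented coefficients for illustration only -- the point is the shape,
    not the numbers.
    """
    return a * math.log10(freq) + b

# Under a log-linear curve, each *fixed* gain in accuracy demands a
# *multiplicative* (here, 10x) jump in concept frequency -- which is why
# linear progress ends up costing exponential data.
for freq in (10**2, 10**3, 10**4, 10**5):
    print(f"{freq:>7} occurrences -> accuracy ~ {toy_accuracy(freq):.2f}")
```

Running this prints accuracies climbing by the same 0.12 per row while the data requirement grows tenfold each time, which is the crux of the plateau argument.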

Intuitively, this makes sense, and it looks like yet another materialization of the Pareto principle. I’m certainly no AI expert, but I am a software engineer and have worked on a lot of different kinds of problems. There are some parallels between AI development and the engineering process that I find interesting. To illustrate, allow me to walk through my process for solving a type of problem I’ve never seen before:

  1. When the problem is first introduced, I don’t have an intuitive sense of its bounds; I don’t know the shape of it. My first step is to “play” with the problem to get a sense of the important parameters and their normal ranges of values. To do this, I may build a naive brute-force solution to establish a lower performance bound.
  2. As I explore and begin to get a handle on the problem, my intuition for optimization opportunities broadens. An academic stage begins, where I crystallize my mental model and refine frameworks. Through that rigor I acquire real insights about the problem, and this is where breakout returns on the investment are made.
  3. Finally, there’s a polishing phase where further improvement of the solution is slow and hard. A cost/benefit analysis is needed to decide whether further polishing is worth the effort.

We are still in that first stage of roughing out the AI problem domain. We don’t really know how intelligence works, so we’re simply throwing brute force at the problem. Generative AI is the naive and wasteful strategy we use to start learning the shape of the problem. The evidence in the paper, that ever-larger models and data sets don’t produce proportionally better zero-shot performance, is expected. It’s expected because we don’t understand intelligence; generative AI only approximates it through mimicry.

There’s a lot of hype around generative AI right now (take a look at NVDA and tell me that’s not a hype bubble). Lots of people are trying to figure out where we are on the Gartner hype cycle. I think in truth there are two hype cycles at play. The first is for generative AI specifically, and we are quite obviously reaching the plateau of productivity: ChatGPT is widely adopted, as is Midjourney, and the technology is beginning to show diminishing returns on model size. The second, larger hype cycle is for AI in general. There, I believe we’re at the peak of inflated expectations, and in the next few years we will enter the trough of disillusionment as people realize that our current best strategy doesn’t live up to the grand promises that have been made.

I have no reason to think that AI in general won’t follow the same technology trend and ultimately reach a plateau of its own. Then again, perhaps the promises of a changed game, where AI forever breaks our models of technological progression, will come true. I don’t own NVDA, but if I did I would feel very wary of hanging on to that trade through the next few years.