On the Progress of Artificial Intelligence
Scaling laws, algorithmic improvements, and test-time compute — why AI progress is hyper-exponential and why most people outside of AI research are too conservative about timelines.
To me, one of the fundamental things that those outside the AI research community don’t quite grasp is the rate at which artificial intelligence is improving. Yes, people understand at a high level that AI progress is rapidly accelerating, but they lack a visceral intuition for why this is the case, and consequently for just how fast that progress is. My goal with this article is to convey this pace of acceleration to non-technical readers. Hopefully, armed with this knowledge, the reader can better understand why organizations of all sizes, from companies to governments, are taking the actions that they are, as well as better grasp the implications of those actions.
Within the span of a decade, we have seen AI go from a niche topic of computer science research to something that’s on the news every day. At the heart of this AI boom is the “discovery” of certain trends known as the scaling laws. These, very broadly, are as straightforward as they sound: the more resources (compute, data, model size), the better your model. Now, this isn’t a fundamental law of the universe; it’s a consistent empirical observation that has been supported by mountains of evidence.
Note, specifically, that you need to scale compute exponentially to see linear gains in performance. Basically, that means if you want to double the “smartness” of your model, you can’t just double compute; you’ll have to 10x it. But because of Moore’s Law, this isn’t much of an issue. For the unfamiliar, Moore’s Law says that the number of transistors that can fit on a computer chip grows exponentially over time (i.e. the power of computers grows exponentially). As the number of transistors we can fit onto a chip grows, so does the amount of compute we’re able to throw at these models. And the more compute we throw at these models, the better they become. Which is to say, progress from scaling will continue well into the future.
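One way to picture this: published scaling results typically take the form of a power law, where loss falls as compute raised to a small negative exponent. The sketch below uses a made-up exponent purely for illustration; the exact value varies across studies and models.

```python
# Toy power-law scaling curve: loss ~ compute^(-alpha).
# alpha is illustrative here, not a measured value from any real model.
alpha = 0.05

def loss(compute):
    """Loss falls as a small negative power of compute."""
    return compute ** -alpha

# Doubling compute barely moves the needle...
print(f"2x compute: loss x{loss(2) / loss(1):.3f}")
# ...while each successive 10x of compute shaves off a similar slice of loss:
for c in [1, 10, 100, 1000]:
    print(f"compute={c:>4}  loss={loss(c):.3f}")
```

Note the pattern: every 10x in compute multiplies the loss by the same fixed factor (about 0.89 with this toy exponent), which is exactly the “exponential input for linear gains” behavior described above.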
What’s important to take away from all that is that AI models get better as you make them bigger.
Now this is where investment ties in. The amount of compute we currently use to train AI is nowhere near the amount we could be using. GPT-4, the base model behind ChatGPT, is estimated to have cost on the order of $100 million to train. Compare that with the roughly $500 billion that has been pledged for AI infrastructure over the next four years: that’s akin to a 5,000x increase in compute. This will directly lead to smarter models. Significantly smarter. And looking at the AI arms race forming between the United States and China, it’s clear that investment will likely only increase over time, not decrease.
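The arithmetic behind a figure like that 5,000x is simple. The numbers below are rough public estimates (GPT-4’s training cost in particular is commonly reported as roughly $100 million), used here only for an order-of-magnitude comparison:

```python
# Order-of-magnitude comparison; both figures are rough estimates, not accounting.
gpt4_training_cost = 100e6    # ~$100 million to train GPT-4 (commonly cited estimate)
pledged_investment = 500e9    # ~$500 billion pledged for AI infrastructure

scale_up = pledged_investment / gpt4_training_cost
print(f"~{scale_up:,.0f}x the spend that trained GPT-4")  # ~5,000x
```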
But what is critical to understand is that scaling is just one of many factors contributing to AI progress. On top of increased compute, AI can be made more efficient by improving the underlying algorithms that make these models work. An obvious example is DeepSeek R1, whose clever researchers squeezed remarkable performance out of comparatively modest compute. It has been estimated that the compute needed to achieve a given level of performance halves roughly every 8 months thanks to such algorithmic changes. That is to say, algorithmic improvements alone deliver roughly a 2.8x gain in effective compute year over year. This rate, which is independent of scaling, is extremely fast. For reference, Moore’s Law implies transistor density grows at a rate of about 1.6x a year. Mind you, Moore’s Law is why we have computers in our pockets millions of times more powerful than the computers that put men on the moon. So with algorithmic improvements alone, we’re also seeing massive gains.
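That 2.8x number falls straight out of the 8-month halving time, since a year contains 12/8 = 1.5 halving periods. A quick check, with Moore’s Law-style growth compounded alongside for comparison:

```python
# Compute needed for a fixed capability halves every ~8 months (the estimate above).
halving_time_months = 8

# One year contains 12/8 = 1.5 halvings, so efficiency grows by 2^1.5 per year.
yearly_algorithmic_gain = 2 ** (12 / halving_time_months)
print(f"algorithmic: {yearly_algorithmic_gain:.2f}x per year")   # ~2.83x

# Moore's Law-style hardware growth of ~1.6x per year, compounded over a decade:
print(f"hardware over 10 years: {1.6 ** 10:,.0f}x")
```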
Even further, the AI research community is experiencing a paradigm shift in the way we build these models. The past few years of AI have been focused on pre-training: the process where you give a model a large set of example data and have it learn from those examples. From there, you can deploy the trained model to do whatever you designed it to do. Pre-training is still an important step, but the research community is now beginning to embrace the concept of test-time compute, where you let the model think, and even learn, after it’s been deployed. For those who use the free tier of ChatGPT, think about how it blurts out an answer to your prompt right away. There’s no thinking or learning at test time. Compare this with the paid tier of ChatGPT or DeepSeek R1. These newer models think through their answers before actually giving a response. At a high level it makes sense that this would lead to performance gains: when you’re dealing with a difficult problem, you don’t just yell out an answer. You sit and think about it first.
Currently, test-time compute is still in its early stages. The largest test-time compute costs are on the order of hundreds or thousands of dollars per task at most. Now, $1,000 for a single question is certainly a lot. But what’s important to realize is that test-time compute follows the same scaling laws as pre-training. It’s an entirely new dimension along which we can throw resources. The entirety of AI progress up to now, from generating incoherent sentences to PhD-level intelligence, has come through scaling up pre-training. So not only are we continuing to throw more compute at pre-training, we’ve now unlocked a completely new, completely untapped way to scale.
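One rough intuition for why spending more at inference helps: if a model has some probability of reasoning its way to a correct answer on any single attempt, then making many attempts and selecting among them raises the odds of success. The best-of-n calculation below is purely illustrative, with an invented per-attempt success rate; it is not how any particular product is implemented.

```python
# Toy model of test-time scaling: chance that at least one of n
# independent attempts succeeds, given a per-attempt success rate p.
def best_of_n(p, n):
    return 1 - (1 - p) ** n

p = 0.2  # invented per-attempt success rate, for illustration only
for n in [1, 4, 16, 64]:
    print(f"n={n:>2}  P(at least one success)={best_of_n(p, n):.3f}")
```

As with pre-training, the gains are real but not free: each step up in reliability costs multiplicatively more attempts, the same exponential-input flavor as the scaling laws above.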
And if all this weren’t already enough, AI researchers have begun using AI itself to make better models. Whether that means generating synthetic data or leveraging these models when working on the underlying math, more and more of the research and development process is being automated. At the current pace, it likely won’t be long until we see self-improving systems, where humans aren’t even in the loop. This means, of course, that a 2.8x yearly rate of algorithmic improvement will likely prove a conservative figure going forward.
It is critical that the reader understands the compounding effect of all these factors. The scaling laws, increased investment, algorithmic improvements, test-time compute, and using AI to make better AI are all interconnected. In that way, AI is unlike other technologies: growth isn’t just exponential, but “hyper-exponential.” That is ultimately why those in the AI community talk about obscene levels of progress that will only get more obscene, and why investment in the AI sector needs to be taken so seriously. We’re still in the early stages of this progress, so it’s understandable to look at the state of everything and see AI as just another technological advancement. But I’d argue this hyper-exponential improvement makes it fundamentally different.
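To make “compounding” concrete, here is a toy multiplication of the growth rates discussed above. Only the ~2.8x algorithmic figure comes from this article; the hardware and investment rates are placeholders invented for illustration.

```python
# Toy compounding model; these rates are illustrative, not measured.
algorithmic_gain = 2.8   # per year, from the ~8-month halving estimate above
hardware_gain = 1.6      # per year, Moore's Law-style growth
investment_gain = 2.0    # per year, invented placeholder for rising spend

effective_yearly = algorithmic_gain * hardware_gain * investment_gain
years = 5
print(f"effective: {effective_yearly:.2f}x/year, "
      f"~{effective_yearly ** years:,.0f}x over {years} years")
```

The individual numbers matter far less than the structure: multiplying even modest independent growth rates, and then compounding the product over years, is what produces the “hyper-exponential” shape.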
It’ll be interesting to see how the societal diffusion of AI compares to that of other technologies throughout history, like electricity or the internet. There are admittedly many smart people who think the AI community is over-anticipating this rate of diffusion, and they may well be right. AI researchers might be experts on the technology itself, but they’re not experts on economics, or geopolitics, or the million other factors that one has to consider. At the same time, all the signs point to this happening very quickly. So while “curing all disease” or “the end of a labor-based economy” might not happen this year or in the coming few, I still feel people’s timelines are generally far too conservative. It might not be months, but it certainly won’t be decades.