The State of AI in 2025
Simon Willison has a great post on everything we learned about AI in 2024 (somewhat technical). Inspired by him, here is a roundup of the top events of 2024 in AI and where we are now.
1. Everybody caught up to OpenAI.
-
The top tier is OpenAI, Google, and Anthropic. They are very close, for most purposes the difference is insignificant, for some use cases one is better than the other.
-
With Gemini 2.0, Google seems to be the current leader by a modest amount on overall base model quality. Gemini is better enough, and cheaper enough that I would switch to it for some jobs.
-
Flash is very good, very fast, and very cheap
-
Long context
-
Great native multimodality (although OpenAI has caught up on vision/audio/video)
-
Deep research, thinking models, NotebookLM state-of-the-art text-to-speech.
-
-
OpenAI is still ahead on product, things like structured outputs, o1, overall ecosystem. First-mover advantage is real, ChatGPT is synonymous with advanced AI in the public’s mind for now. Google now supports the OpenAI API semantics, which attests to OpenAI being the leader and Google the fast follower.
-
Claude is very strong on technical discussions and coding. Definitely try it for deep conversations about code or research papers. They pioneered the Canvas, are doing interesting stuff on API discovery. (Punching below their weight in Chatbot Arena)
-
Use Chatbot Arena and artificialanalysis.ai to compare models for different use cases.
2. The rise of small and open-source models nipping at the leaders’ heels.
-
The thing that shocked me most in 2025 was Gemma2-9b-it-SIMPO. Gemma is Google’s excellent open source model. A group at Princeton fine-tuned it to score higher than early GPT-4, and shrank it to run adequately fast on a MacBook. Bananas.
-
There are a slew of strong dark horses: Llama, Grok, Nova in the US; Qwen, Yi, DeepSeek from China, Mistral in Europe.
-
The gap between the frontier models and the flash/mini or smaller open-source models shrank.
3. The scaling laws hit the wall.
-
Gemini 2.0 is a significant improvement over 1.0 but not as big as GPT 3.5 to GPT-4
-
OpenAI failed to deliver a foundation model that was a worthy successor to GPT-4 despite 2 gigantic training runs, according to reports. Anthropic also failed to update Opus, their largest model, leaving Sonnet as their flagship model.
-
These top 3 bullet points reflect the commoditization of AI, no one has a monopoly, the price of pretty good AI has dropped by several orders of magnitude.
4. OpenAI pivoted to inference-time compute with o1 and o3.
-
Faced with GPT-5 hitting the wall, OpenAI pivoted to models like o1 that think, run multiple LLM calls, use search algos to look for the best way to answer a question via a multi-step process.
-
Probably the 2nd most shocking thing I saw in 2024 was this thread by math professor and Fields medalist Terence Tao, saying that o1 can do math at the level of a strong math major. I thought claims about it solving problems from the Putnam test (the leading US undergrad math competition) were probably overblown or training data contamination, but even if you fuzz them it can still solve a few of them at least. Bananas.
-
o1 is very good but very expensive.
-
o3 was announced, no one has seen it but OpenAI has hyped it for release in about a month, saying it is a superstar programmer and beat an AGI test. There is so much hype and so much pressure to beat Google that I am skeptical. I expect a notable improvement over o1. But with Google already releasing ‘thinking’ models, it’s not clear how big a lead OpenAI has. I don’t quite know what to make of the AGI claims, either it’s a bit overhyped or they found something important. To date LLMs are great with language but understanding is skin-deep, they don’t model complex abstractions deeply. There is so much pressure to be best and first that capabilities are overhyped and corners are cut. But I have been surprised before.
-
Interesting consequences in the chip world. Possibly tired: 100,000-GPU Nvidia training clusters. Wired: Fast inference chips like d-Matrix, Groq, Cerebras, and a slew of others.
5. Everything is agents now.
- To me the definition of agent is when the LLM makes control flow decisions, whether with tool choice in a directed acyclic graph decision tree, or a full-blown autonomous loop. And every AI product does that now; when you use the chatbot it looks at your query and routes it to different models and tools. Devin-style full-service software developers are still a bit pie-in-the-sky but everything short of that, from low-code automation up to superintelligent assistants, is very much in play.
7. The Advanced Voice Mode with vision and other crazy sci-fi stuff.
- Point your phone at anything and have a conversation about it. Fast conversational multimodal FTW. There was a huge leap in video generation with Sora, Kling, Runway. Also associated tech like NotebookLM, Suno. We are not that far from a fully synthetic, realistic virtual or robotic avatar to interact with. (Apols for the NSFWish projection but I think OnlyFans creators might be the first victims of robots, before even truck drivers)
8. Self-driving cars achieved product-market fit, with one player left standing: Google’s Waymo.
-
And possibly a European and Chinese competitor. Tesla is not really in the game yet, no robotaxis on the road yet even with safety drivers, maybe that will change and maybe it won’t. Waymo has a big lead.
-
IMHO we will see this type of highly autonomous robot, static or on wheels, for 5 or maybe 10 years before we see a lot of humanoid robots. And ofc fully virtual robots as discussed above. Power, cost issues, normie workplaces can’t keep copiers and printers working.
-
As an aside, the meme of AI killing Google is overblown, more likely Google integrates AI in search and laps the competition, cures cancer with AI and leads the AI revolution, even as search declines a bit in favor of chatbots. As long as the thing we interact with is a text box, Google should do OK. When AI becomes ubiquitous and phones and homes and smart glasses are environments we talk to that watch us and interact with us, that’s a bigger bridge to cross.
9. Safety took a back seat, and politics took center stage.
- US-China trade war, EU regs, California regs, major competition authority actions against all the Big Techs. With Musk in charge of tech policy in US, safe AI is so 2023, takes a back seat to accelerationism. But brace for the backlash, possibly the Butlerian Jihad (via Christopher Mims). Everything is mediated through big tech, and people view AI with suspicion as IP theft, surveillance state, manipulation and dark patterns. People look for scapegoats.
9. Dead internet and AI slop everywhere.
- Stack Overflow is in freefall. Most questions AI answers better, just because it reads the documentation carefully. Same for big social media like X and Facebook. Anything big enough gets overrun by slop and adverse selection. Back to artisanal social media like Discords, and painstakingly building your own small social and professional circle.
10. 2025 and beyond?
-
Smart glasses, pervasive AI
-
Social issues around IP theft, surveillance, deepfake attacks. We need a new social contract around AI and it’s hard.
-
Hyper-personalization.
-
Medical advances with AI: diagnostics, clinical decision support, AI drug discovery, surgical robots, wearables, personalized medicine, administrative productivity AI.
-
Eventually a financial crash because markets are insane even as AI powers everything.
-
I think AGI and super-intelligence are actually pretty close. Take what we got and add better knowledge representation and world models, the ability to learn on the fly instead of moonshot training runs, and you get something like AGI. On a scale of a few years or decades, with the scale of brainpower and compute currently deployed, surely that is imminent. On a lot of benchmarks and tasks like say detecting tuberculosis on a scan, the AI is much better than most professionals on a tight time budget like 15 seconds. And the best professionals will be much better than the AI on a higher time budget like 5 minutes. So there is some time crossover where the humans start to beat AI. And over time probably the AI will beat more humans at more jobs at any given time limit, and the crossover where humans outperform the AI will shift to higher time limits. AI assistants will infiltrate more and more knowledge work. Use AI for what it’s good at, processing lots of language, use tools for what they are good at, looking up knowledge and running predictable tasks, and use humans for judgment and critical thinking. And over time the human component will shrink.
And the social upheaval will be unmanageable unfortunately. The tech bros think they will solve all of society’s problems if you just give them a few trillion. I think everything is breaking and tech is one big reason why, it’s the cause of as many problems as it solves and these Elon Musk and Sam Bankman-Fried types are mountebanks, or buffoons, or blinded by greed, or in some cases just want to burn everything down. Mustafa Suleyman is correct that we need to spend a significant fraction of AI efforts on guardrails to avoid disasters, and preparing people and society for the impacts.
Every few years, the bandit, the revolutionary and the priest trade hats. And the tech folks have gone from revolutionaries to bandits. A guy I used to think highly of in tech, said that the one big thing in 2024 was Silicon Valley’s hostile takeover of the federal government, via an infiltration of Donald Trump’s MAGA movement, and Trump’s subsequent thrashing of Joe Biden’s “establishment” government in the November election. The thing is, if you are posting groyper memes, changing your name to Kekius Maximus, and pimping the AfD, who is infiltrating who? The word of the day is that Muslims, Mexicans, immigrants, whoever you don’t like, are not human, and democracy is for losers, and that does not end well. Never forget that Neville Chamberlain was a ‘realist’ conservative. These people are going to sell out democracy, sell out to Putin, and blow up the world, literally.
I had AI take a crack at the first version of this, Gemini 1.5 with Deep Research. Did not use AI at all after reading that output, blame me for any typos and bad writing.