Andrej Karpathy, a founding member of OpenAI, an expert in deep learning and autonomous driving, and former head of AI at Tesla, has emerged as a leading voice on AI in recent years.
Today, he posted a tweet saying, “As a programmer, I’ve never felt more behind than I do right now. This profession is being drastically reshaped… A magnitude 9 earthquake is shaking the entire industry. Roll up your sleeves and get to work—don’t get left behind.”
This tweet probably perfectly captures the collective sentiment of programmers in 2025: anxiety, excitement, and the suffocating fear of being left behind if you don’t keep learning. Over the past year, new terms have popped up one after another—agents, sub-agents, Model Context Protocol, workflows, IDE integration, etc. Companies are laying off employees to chase AI-driven productivity, while at the same time constantly grappling with the real costs of hallucinations, process breakdowns, and permission control failures.
Karpathy said, “It’s like someone handed everyone a powerful alien tool, except there’s no instruction manual. Everyone has to figure out how to understand and operate it on their own.”
AI does seem impressively capable. 2025 was even hailed as the “Year of the Autonomous Agent.” So, how far are we really from the “automation of everything”?
The New Yorker just published an article by Cal Newport that offers a sober, even somewhat deflating answer. Newport’s core argument is straightforward: 2025 did not see the explosion of general-purpose AI agents. The industry has “overpromised and underdelivered.”
He points out that agents have shined in programming precisely because the terminal is a natural text-based world, perfectly suited for large language models. But once they step outside the terminal and into the real-world workflows that require mouse clicks and web interactions, they become slow, prone to getting stuck, and errors get amplified in multi-step tasks.
Famous AI critic Gary Marcus put it bluntly: “They’re piling one clunky tool on top of another.” And in a previous interview, Andrej Karpathy stated flatly that agents “just aren’t working.”
The article does not dismiss AI; instead, it tempers the hype with engineering realities and cognitive limitations: either rebuild internet protocols to be more “robot-friendly,” or address the models’ shortcomings in temporal, spatial, and common-sense reasoning. By the end of the piece, you might agree with Karpathy’s assessment: rather than the “Year of the Agent,” it’s more like the “Decade of the Agent.”
Why Hasn’t Artificial Intelligence Changed Our Lives in 2025?
This was supposed to be the year when autonomous agents took over everyday tasks. The tech industry overpromised and underdelivered.
By Cal Newport
December 27, 2025
A year ago, Sam Altman, CEO of OpenAI, made a daring prediction: “We believe that in 2025, we may see the first AI agents ‘join the workforce’ and materially transform how companies operate.”
Weeks later, the company’s Chief Product Officer, Kevin Weil, told attendees at the World Economic Forum in Davos in January: “I think 2025 is the year we move from ChatGPT being just this super smart thing… to ChatGPT being able to do things in the real world on your behalf.”
He gave examples like AI filling out online forms and booking restaurant reservations. He later added, “We will absolutely get there—no question.” (OpenAI has a corporate partnership with Condé Nast, the publisher of The New Yorker.)
This was no trivial boast. Chatbots can directly respond to text prompts—answering questions, for instance, or drafting the first version of an email. But agents, in theory, can navigate the digital world on their own, handling multi-step tasks that require using other software, such as web browsers.
Think about all the steps involved in booking a hotel: selecting the right dates; filtering options based on personal preferences; reading reviews; searching and comparing prices and amenities across different websites. In concept, an agent could automate all these activities. The impact of such technology would be enormous.
Chatbots are convenient tools for human workers; efficient AI agents, by contrast, could directly replace them. Marc Benioff, CEO of Salesforce, has claimed that half of his company’s work is done by AI, and he predicts that agents will help ignite a “digital labor revolution” worth trillions of dollars.
Venture capitalists and tech founders are scrambling to catch up—their value in the AI era is skyrocketing.
Part of the reason 2025 was dubbed the “Year of the AI Agent” is that by the end of 2024, these tools had become undeniably proficient at computer programming. In a May demo of OpenAI’s Codex agent, a user asked the tool to modify his personal website, writing: “Add another tab next to investment/tools called ‘food I like’. Put tacos in the page.”
The agent quickly executed a series of interconnected actions: it first checked the files in the website directory; then examined the contents of a promising-looking file; and used a search command to find where the new line of code should be inserted. After figuring out the website’s structure, it used this information to successfully add a page dedicated to tacos.
As a computer scientist, I have to admit that the way Codex handled this task was more or less how I would have done it. Silicon Valley was convinced: other, more difficult tasks would soon be conquered.
Yet, as 2025 draws to a close, the era of general-purpose AI agents has not arrived. This fall, Andrej Karpathy, a co-founder of OpenAI who left the company to launch an AI education project, described agents as “cognitively lacking” and said, “They just aren’t working.”
Gary Marcus, a long-time critic of tech industry hype, wrote recently on his Substack: “So far, AI agents have mostly been a dud.” This gap between prediction and reality matters a great deal.
Chatbots that can hold fluent conversations and video generators that can warp reality are certainly impressive, but on their own, they cannot usher in a world where machines take over many of our activities. If major AI companies fail to deliver broadly useful agents, they may not be able to keep their promises of an “AI-driven future.”
In the AI Age, What Once Was Simple Has Become the Biggest Problem
The term “AI agent” conjures images of cutting-edge new technology straight out of The Matrix or Mission: Impossible – Dead Reckoning Part One. In reality, agents are not custom-built digital brains; instead, they are powered by the same large language models that drive chatbots.
When you ask an agent to perform a household-style task, a control program—a straightforward application that coordinates the agent’s actions—translates your request into a prompt for the large language model: what I want to accomplish, what tools are available, what should my first step be? The control program then attempts the action suggested by the language model, feeds the result back to it, and asks: what should I do next? This cycle repeats until the language model deems the task complete.
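To make this loop concrete, here is a minimal, hypothetical sketch in Python. The function names (ask_llm, run_agent), the prompt wording, and the reply format are assumptions for illustration, not any vendor’s actual API; the model call is replaced with a stub so the sketch runs as written.

```python
def ask_llm(prompt: str) -> dict:
    """Stand-in for a call to a large language model. A real agent would send
    `prompt` to a model API and parse the reply into a structured decision."""
    return {"status": "DONE", "summary": "stub model: nothing left to do"}

def run_agent(goal: str, tools: dict, max_steps: int = 20) -> str:
    """Minimal control loop: ask the model for the next action, attempt it,
    feed the outcome back, and repeat until the model declares the task done."""
    history = []  # record of steps taken, shown back to the model each turn
    for _ in range(max_steps):
        decision = ask_llm(
            f"Goal: {goal}\n"
            f"Available tools: {list(tools)}\n"
            f"Steps so far: {history}\n"
            "Reply with the next tool to call and its argument, or DONE."
        )
        if decision.get("status") == "DONE":
            return decision.get("summary", "task complete")
        tool_name, argument = decision["tool"], decision["argument"]
        result = tools[tool_name](argument)             # attempt the suggested action
        history.append((tool_name, argument, result))   # report the outcome back
    return "gave up: step limit reached"

print(run_agent("book a quiet hotel room in Chicago", tools={}))
```

The key design point is that the control program itself contains no intelligence; it only relays text between the model and the tools, which is why the whole approach stands or falls with the model’s judgment.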
It turns out that this architecture is particularly good at automating software development. Most actions required to create or modify a computer program can be done by entering a small set of limited commands in a text terminal. These commands allow the computer to navigate the file system, add or update text in source files, and compile human-readable code into machine-readable bits when needed.
This is an ideal environment for large language models. “The terminal interface is text-based, and language models are built for exactly that domain,” Alex Shaw, co-creator of Terminal-Bench—a popular tool for evaluating coding agents—told me.
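In the coding setting, the “tool” the control loop calls can be as simple as a function that runs a shell command and returns its text output. The sketch below is illustrative rather than how any particular coding agent is built, and the file name it greps for is invented; it could be registered as one of the tools in the loop sketched earlier.

```python
import subprocess

def run_command(command: str, timeout: int = 30) -> str:
    """Run a single shell command and hand its output back as plain text,
    the one format a language model natively consumes."""
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return completed.stdout + completed.stderr

# A coding agent's whole "world" can be a few text-in, text-out calls like these:
print(run_command("ls"))                           # look around the project
print(run_command("grep -n 'tools' index.html"))   # find where to insert a line
```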
A more general-purpose assistant like the one Altman envisioned, however, requires agents to step outside the comfortable confines of the terminal. Because most people complete computer tasks by clicking and pointing, an AI that can “join the workforce” will likely have to learn to use a mouse—a surprisingly difficult goal to achieve.
The New York Times recently reported on a wave of new startups building “shadow sites”—copies of popular web pages like United Airlines and Gmail—that allow AI to analyze how humans use cursors on these replicated interfaces. In July, OpenAI released ChatGPT Agent, an early version of a robot that can complete tasks using a web browser.
But one review noted: “Even simple actions like clicking, selecting elements, and searching can take the agent seconds—or even minutes.” On one occasion, the tool got stuck for nearly a quarter of an hour trying to select a price from a dropdown menu on a real estate website.
There is another path to improving agent capabilities: making existing tools easier for AI to use. An open-source effort aims to develop what is called the Model Context Protocol, a standardized interface that allows agents to access software via text requests.
Another is Google’s Agent2Agent protocol, launched last spring, which envisions a world where agents interact directly with one another. If my personal AI could instead send a request to a specialized AI—perhaps trained by the hotel company itself—to handle my booking on the hotel website, it wouldn’t have to navigate the site manually.
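To make the idea of a text-based tool interface concrete, here is a schematic Python sketch in the spirit of such protocols; the tool name, schema fields, and message format are invented for illustration and do not reproduce the actual Model Context Protocol or Agent2Agent specifications.

```python
import json

# A piece of software advertises what it can do as structured text...
TOOL_DESCRIPTION = {
    "name": "search_hotels",
    "description": "Search hotels in a city for given dates.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "check_in": {"type": "string", "format": "date"},
            "check_out": {"type": "string", "format": "date"},
        },
        "required": ["city", "check_in", "check_out"],
    },
}

def handle_request(request_json: str) -> str:
    """...and an agent invokes it with a structured text request, so no
    browser, mouse, or screen is ever involved."""
    request = json.loads(request_json)
    if request["tool"] == "search_hotels":
        # A real server would query a booking system here; this is a placeholder.
        result = [{"hotel": "Example Inn", "price_per_night": 180}]
        return json.dumps({"result": result})
    return json.dumps({"error": f"unknown tool {request['tool']}"})

print(handle_request(json.dumps({
    "tool": "search_hotels",
    "arguments": {"city": "Chicago", "check_in": "2026-03-01", "check_out": "2026-03-03"},
})))
```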
Of course, rebuilding internet infrastructure around robots will take time. (For years, developers have actively tried to prevent robots from crawling all over their websites.) And even if technologists can pull off this project, or successfully teach agents to master the mouse, they will face another challenge: the inherent weaknesses of the large language models that serve as the decision-making foundation for agents.
In a video announcing the launch of ChatGPT Agent, Altman and a team of OpenAI engineers demonstrated several of its features. At one point, it generated a map purporting to show an itinerary for visiting thirty Major League Baseball stadiums across North America. Strangely enough, the route included a stop in the middle of the Gulf of Mexico.
You could dismiss this glitch as an isolated incident, but for Silicon Valley critic Marcus, errors like this highlight a more fundamental problem. He told me that large models lack a sufficient understanding of “how things work in the world,” making them unable to reliably handle open-ended tasks. Even in relatively straightforward scenarios, like planning a trip, he said, “you still have to reason about time, you still have to reason about location”—basic human capabilities that language models struggle with. “They’re piling one clunky tool on top of another,” he said.
Other commentators warn that agents amplify errors. Chatbot users quickly discover that large models have a tendency to fabricate information; a well-known benchmark test shows that different versions of OpenAI’s cutting-edge GPT-5 model have a hallucination rate of around 10%.
For agents carrying out multi-step tasks, this semi-regular tendency to veer off course can be catastrophic: a single misstep can derail the entire operation. A headline in Business Insider warned this spring: “Don’t Get Too Excited About AI Agents. They Make a Lot of Mistakes.”
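A back-of-the-envelope calculation shows why even a modest per-step error rate is damaging over many steps. The sketch below assumes errors are independent across steps, which is a simplification, and the numbers are illustrative only.

```python
# If each step succeeds independently with probability p, a k-step task
# finishes with no missteps with probability p ** k.
# (Independence is a simplifying assumption; real agent errors can compound.)
p = 0.90  # assumed per-step reliability, mirroring a ~10% hallucination rate
for k in (1, 5, 10, 18):
    print(f"{k:>2} steps -> {p ** k:.0%} chance of completing without error")
# Roughly: 90%, 59%, 35%, and 15% respectively.
```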
To better understand how a large language model’s “brain” might go astray, I asked ChatGPT to outline the plan it would follow if it were powering a hotel booking agent. It described a sequence of 18 steps and sub-steps: selecting a booking website; applying filters to search results; entering credit card information; sending me a booking summary; and so on.
I was impressed by the level of detail with which the model broke down the activity. (It’s easy to underestimate how many tiny actions go into a common task like this until you see them listed one by one.) But I could also spot potential pitfalls where our hypothetical agent might go off track.
Take sub-step 4.4, for example, which instructs the agent to rank rooms using a formula: α*(location score) + β*(rating score) − γ*(price penalty) + δ*(loyalty bonus). In this case, the general approach is correct, but the model’s specifications for the details are worryingly vague.
How would it calculate these penalty and bonus values? And how would it choose the weights (represented by Greek letters) to balance them? A human would probably tweak these parameters manually through trial and error and common sense, but who knows what the large language model would do on its own. And small mistakes can be costly: if it overemphasizes the “price penalty,” you might end up staying in one of the worst hotels in town.
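To see how much the weights matter, here is a hypothetical toy ranking with made-up hotels and scores; none of these names or numbers come from the model’s actual plan, and the formula simply mirrors the weighted sum described above.

```python
# Hypothetical hotels scored on a 0-10 scale; price_penalty is larger for
# pricier rooms. All values are invented for illustration.
hotels = {
    "Grand Plaza": {"location": 9, "rating": 9, "price_penalty": 8, "loyalty": 1},
    "Budget Stay": {"location": 3, "rating": 4, "price_penalty": 1, "loyalty": 0},
    "Midtown Inn": {"location": 7, "rating": 7, "price_penalty": 4, "loyalty": 2},
}

def score(h, alpha, beta, gamma, delta):
    # alpha*(location) + beta*(rating) - gamma*(price penalty) + delta*(loyalty)
    return (alpha * h["location"] + beta * h["rating"]
            - gamma * h["price_penalty"] + delta * h["loyalty"])

def best(alpha, beta, gamma, delta):
    return max(hotels, key=lambda name: score(hotels[name], alpha, beta, gamma, delta))

print(best(alpha=1.0, beta=1.0, gamma=0.5, delta=0.5))  # balanced weights -> Grand Plaza
print(best(alpha=1.0, beta=1.0, gamma=3.0, delta=0.5))  # overweight price -> Budget Stay
```

With balanced weights the well-located, well-reviewed hotel wins; triple the price penalty and the ranking flips to the cheapest, lowest-rated option, which is exactly the kind of silent misjudgment a human would catch and an unsupervised agent might not.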
A few weeks ago, Altman announced in an internal memo that developing AI agents is just one of OpenAI’s many projects, and that the company would de-emphasize this direction as it shifts focus to improving its core chatbot products. Just a year ago, leaders like Altman were talking as if we had already gone over the technological cliff, tumbling chaotically toward an army of automated laborers.
Now, that breathless enthusiasm seems premature. Recently, to calibrate my expectations for artificial intelligence, I’ve been thinking back to a podcast interview from October with OpenAI co-founder Andrej Karpathy. The interviewer, Dwarkesh Patel, asked him why the “Year of the Agent” failed to materialize.
“I think there was some overprediction in the industry,” Karpathy replied. “To me, a more accurate way to put it is: this is the ‘Decade of the Agent.’”