Why token panic is reshaping AI

Welcome back. AI’s infrastructure story keeps getting bigger, and messier. CoreWeave is tackling the GPU crunch with new data center innovations built around Nvidia’s next AI supercomputers. Snap is pushing smart glasses toward much bigger ambitions with a $2,195 wearable computer that wants to replace screens, not just supplement them. And token panic is hitting enterprises hard, as seen at Databricks' Data + AI Summit. The AI usage boom has created an inference cost crisis, forcing companies to rethink model routing, local compute and the economics of agents. —Jason Hiner
1. Why token panic is reshaping AI
2. Snap's $2K AR glasses want to compete with PCs
3. Exclusive: CoreWeave's 2 fixes for AI factories
ENTERPRISE
How enterprise token panic is about to change AI
Make no mistake, enterprises are panicking about token costs.
At Databricks' Data + AI Summit, I talked to executives at various enterprise organizations, and they all said that AI inference costs have run so far over budget in recent months that they are precipitating a crisis.
These are not naturally dramatic people, but they told me things like, "Our CFO is going to lose it when inference costs come in," and "The coding agents have run our token bills out of control," and "Last year, the executives were letting every flower bloom; now they're coming in with a lawnmower."
At the event's opening keynote, Databricks CEO Ali Ghodsi addressed the elephant in the room. "It's completely unsustainable for the organizations out there," he said.
Later, in a briefing with the press, Ghodsi said, "It's the number one thing we're getting asked: 'How do we curb the cost but still invest in AI?'"
As a result, two things are quickly becoming true:
Model selection matters: There are a lot of queries and workloads getting sent to frontier models in the cloud (ChatGPT, Claude, and Gemini) when they could easily be handled by small models, domain-specific models, and open models. Using frontier models for simple questions and tasks is like using a chainsaw to cut down a daisy.
Hybrid compute will be part of the answer: Running open models locally on your own hardware is the other way to massively save on inference costs. The challenge is that you have to calculate and shape query traffic to make sure you're taking full advantage of the capacity that you build out, rather than just paying per token.
This is bad news for OpenAI and Anthropic, which have been the biggest beneficiaries of tokenmaxxing. Anthropic especially has seen its revenues soar as enterprises invested heavily in granting unlimited token access to their employees.
Unsurprisingly, Databricks offered its solution to help enterprises wrestle the token problem to the ground: Unity AI Gateway. The platform provides visibility into how many tokens your organization is spending, before you get a giant bill. It also provides observability into what the agents are doing and, most importantly, the ability to route tasks and queries to the most appropriate model or lab.
Databricks also launched Omnigent, which it describes as a harness for harnesses, to help enterprises get better results with Claude Code, Codex, Cursor, and other agents. And it launched its own enterprise agent platform called Genie One.

The Deep View is hearing from nearly every direction right now that enterprises are becoming deeply alarmed about token spend and inference costs. It's quite a turnaround from a year ago, when CEOs were still desperately trying to get their employees to use AI. Now, the employees and their coding agents are using it so much that the economics are falling apart. The bottom line is there's a lot of work that needs to be done to make AI more efficient, to optimize workloads, and to make routing tasks and queries a lot smarter and safer. Databricks is one of the vendors that wants to step in and help with that, but the list of companies lining up to help organizations control their agents is getting very long.
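To make the routing and hybrid-compute logic concrete, here is a minimal Python sketch. It is not how Unity AI Gateway or any vendor's product works; the tier names, prices, and complexity levels are all hypothetical, and the break-even calculation assumes owned hardware has a fixed monthly cost and no per-token cost. It shows two ideas from the section: sending each task to the cheapest model that can handle it, and estimating how much monthly traffic it takes before local capacity beats paying per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                      # hypothetical tier name
    usd_per_million_tokens: float  # hypothetical price, not a real quote
    max_complexity: int            # hardest task level (1-3) the tier handles


# Hypothetical tiers and prices, for illustration only.
TIERS = [
    ModelTier("small-open-model", 0.20, 1),
    ModelTier("domain-model", 1.00, 2),
    ModelTier("frontier-model", 10.00, 3),
]


def route(complexity: int) -> ModelTier:
    """Return the cheapest tier able to handle a task of this complexity."""
    capable = [t for t in TIERS if t.max_complexity >= complexity]
    if not capable:
        raise ValueError(f"no tier handles complexity {complexity}")
    return min(capable, key=lambda t: t.usd_per_million_tokens)


def cost_usd(tokens: int, tier: ModelTier) -> float:
    """Per-token cost of running `tokens` through the given tier."""
    return tokens / 1_000_000 * tier.usd_per_million_tokens


def breakeven_tokens_per_month(fixed_monthly_usd: float,
                               api_usd_per_million: float) -> float:
    """Monthly token volume above which owned hardware is cheaper.

    Assumes local hardware costs a flat monthly amount and nothing per token.
    """
    return fixed_monthly_usd / api_usd_per_million * 1_000_000
```

Under these made-up numbers, a simple task (complexity 1) goes to the small open model rather than the frontier model, and a rack costing $5,000 a month only pays off against a $10-per-million-token API once traffic passes 500 million tokens a month. Below that volume, the built-out capacity sits idle and per-token pricing is cheaper, which is the traffic-shaping challenge described above.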
TOGETHER WITH MERCURY
Turn your bank* into something that actually executes
Most founders understand their finances by exporting data to spreadsheets and external AI tools because their bank can't do the work itself.
Mercury Command is AI built into your account that actually does financial work for you. Just say "follow up on that invoice" or "what's my runway?" and it's done.
Every Command answer is grounded in your data, and every action is reviewed, approved, and fully traceable. No exports, no external tools, no navigating dashboards. Just direct command.
CONSUMER
Snap's $2K AR glasses want to compete with PCs
Smart glasses are having a moment in 2026, and Snap is seeking to take that a step further by infusing a traditional glasses form factor with AR technology, in what it's calling a wearable computer.
On Tuesday at the Augmented World Expo, Snap unveiled SPECS, glasses that deliver AR capabilities while shedding the bulk typically associated with AR headsets, unlocking everyday AR use cases. Users will be able to receive contextual AI assistance, such as turn-by-turn directions, while also enjoying a large private display for streaming content or getting work done.
“SPECS are the beginning of a new era in computing,” said Evan Spiegel, co-founder and CEO of Snap, in the announcement. “For decades, computers have asked us to look down, sit still, or step out of the moment. SPECS brings computing into the world around us where we live, work, learn, create, and connect.”
A key aspect of smart glasses is wearability, which depends on looks and comfort. The glasses keep the classic black-frame aesthetic, with a somewhat more distinctive frame shape. SPECS will be available in two sizes: a 47 mm model weighing 132 grams and a 52 mm model weighing 136 grams.
The glasses support four hours of mixed-use battery life with no tethered puck, and the included charging case provides four additional charges, for up to 20 total hours of mixed use. Other highlights include:
Display: Liquid crystal on silicon display, with a 51-degree field of view, equivalent to a 24-inch desktop display or 115-inch home cinema screen, and 16 million colors for rich visuals
Lenses: Electrochromic lenses that can shift from clear to tinted in 10 seconds; removable inserts support a wide range of prescriptions, according to Snap
Processors: It is powered by two Snapdragon processors, one for computer vision and the other for running the lenses
Frames: Made from high-performance Swiss TR90 polymer
Privacy: LED light glows when recording, on-device data processing prioritized, and users are prompted before accessing sensitive information
A key element of SPECS is its rich developer ecosystem, and, at the conference, the company introduced new developer tools. For instance, it launched agentic development for SPECS Lenses in Lens Studio, making it easier for developers to build and improve their tools. The developer preview is rolling out in Claude Code, Codex and Cursor.
SPECS are available for pre-order today for $2,195 with a $200 refundable deposit. While the price is steep for smart glasses, it's useful to compare it to the price of a traditional AR headset rather than something like the $800 Meta Ray-Ban Display glasses. The glasses are expected to ship this fall in the United States, United Kingdom, and France.

The smart glasses category is expanding at a rapid pace, blending the real world, AI assistance, and, with Snap's AR component, a virtual world too. Snap brings expertise in AR from years of developing filters and experiences within Snapchat. However, I have worn the Meta Ray-Ban Display glasses, which weigh 69 grams in the standard size and 79 grams in the large, and found them a bit too heavy to be comfortable, so SPECS raise some comfort concerns for me at nearly double that weight. That said, the added functionality of being able to work from them may be worth the extra weight, making them tolerable when worn during the specific moments they are most useful. I definitely intend to test them to find out, and will share what I learn.
Disclosure: Sabrina Ortiz's travel to Augmented World Expo was paid for by Snap. The Deep View's coverage is editorially independent from the companies we cover.
TOGETHER WITH CONVEX
Code Faster, Easier, And Less Buggy With Convex
You’re probably already familiar with Convex, the backend building platform that helps your AI agents excel – but their latest updates are taking things up a notch. Convex recently launched plugins that allow you to connect Claude Code, Codex, GitHub Copilot and more directly to your Convex deployment, allowing them to…
Read data and run functions entirely on their own
Optimize your project’s performance via production insights
Improve safety by limiting access to production
There are even more updates coming in the near future, but for now, see how you can take your agents’ work to the next level with the latest from Convex right here.
HARDWARE
Exclusive: CoreWeave's 2 fixes for AI factories
AI has completely shifted how the tech industry thinks about infrastructure.
Though people usually think of the GPU crunch when they consider infrastructure bottlenecks, chips are only one cog of a much larger machine. As some of these components become more capable, the rest of the system needs to keep up. It's why CoreWeave decided to build around these problems.
"That concept has now created the need for us to think about the data center from an AI-centric perspective at a rack level, versus just at a machine level," Corey Sanders, SVP of product at CoreWeave, told The Deep View.
The Deep View sat down with CoreWeave for an exclusive interview to discuss new innovations built specifically around Nvidia's Vera Rubin NVL72, the chip giant's 72-GPU system built for AI supercomputing.
The biggest innovations lie in two areas, said Jacob Yundt, senior director of compute architecture at CoreWeave: cooling and control.
Valvey: Yundt's team developed a programmable valve system, affectionately named "Valvey," that controls how coolant flows through server racks, using software to monitor pressure, flow rate and leaks. This dramatically reduces downtime caused by rack failures by isolating incidents to prevent a domino effect of shutdowns.
Racky: The team also created a single controller, called "Racky," that sits on top of each rack to control everything in one interface, including power, cooling, and environmental sensor data. This creates a single unified system that allows customers to more easily manage their racks, giving them the ability to scale up their infrastructure more smoothly.
While these updates may sound esoteric, the downstream impacts matter, said Sanders. The goal is to make managing these systems a more seamless, unified process and to save time and costs by minimizing downtime.
"One of the things I'm excited about with Vera Rubin and our partnership with Nvidia is that it changes the shape of what's possible with our end customers," said Sanders. "They can experiment a lot more. They can take more chances. They can innovate more."
It's something that's vital as AI agents force these systems to work harder. And given that CoreWeave provides services to nine out of the 10 leading model providers, "the rack really matters for our customers and their workloads," said Sanders.
"I think the future of AI workloads will span even multiple data centers and potentially even span multiple clouds," Sanders added.

AI has created a ripple effect that's causing growing pains for practically every layer of the tech stack, from the software to the chips and server racks to the energy itself. However, it's impossible to ignore that many of these innovations revolve around a central entity: Nvidia. The chip giant, for instance, was part of OpenAI's coalition for developing Multipath Reliable Connection, an open standard for networking designed to make GPU clusters faster, more reliable and more efficient. AI has undoubtedly created a broad motivation to innovate. But Nvidia has a hand in innovations spanning practically every layer of its self-described "five-layer cake," from open-source standards to models to rack design itself, and that allows it to further solidify its dominance as the kingpin of the tech at the foundation of the AI industry.
LINKS

SpaceX acquires Cursor in a deal worth $60 billion
ChatGPT's market share has fallen below 50% for the first time
Microsoft shifts Copilot Cowork to usage-based pricing, considers DeepSeek
Mistral models ranked among worst at filtering out Russian disinformation
Qualcomm announces Snapdragon chipset for improved on-device AI
Quantum startup Atom Computing raises $100 million Series C

GLM-5.2: Z.ai's latest open-weights model, with strong long-horizon capabilities and a 1-million-token context window.
Origin: Cursor's competitor to Git, scalable for AI agent workloads.
Qwen-Robot Suite: Alibaba's full-stack suite for embodied AI, including three different robotics-centered models.
Manus: Users can now queue messages while tasks run.

AMD: Principal GenAI Inference Optimization Engineer
Meta: Partner Engineer, Generative AI
Intel: AI Developer Evangelist
Unconventional AI: Member of Technical Staff, Language & Reasoning Models
POLL RESULTS
Do you perceive the US as the dominant AI frontrunner?
Yes (38%)
No (28%)
I’m not sure (30%)
Other (4%)
The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.



If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.
*Mercury Disclaimer: Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC.













