
OpenAI tops leaderboard of 'Thinking Algorithms'

Welcome back. Anthropic has completed its first acquisition. The AI model firm announced Tuesday that it has acquired Bun, a software development tools company, for an undisclosed sum. Anthropic has been using Bun for several months, and the acquisition gives the company direct control over the runtime powering its popular Claude Code offering, which reached $1 billion in run-rate revenue just six months after launch. It's the first of what may be several acquisitions from Anthropic as it seeks to strengthen its coding tools, which bring in a significant share of its revenue.

IN TODAY’S NEWSLETTER

1. OpenAI tops leaderboard of 'Thinking Algorithms'

2. AWS wants to be the AI everything store

3. Mistral AI releases low-cost open-source models

RESEARCH

OpenAI tops leaderboard of 'Thinking Algorithms'

There’s a new AI leaderboard from startup Neurometric that ranks the effectiveness of a specific set of language models powering the current AI boom. The leaderboard focuses on "thinking algorithms," and OpenAI grabbed the top spot with its open weights model GPT-OSS 120B, while the Chinese model DeepSeek R1 was just behind.

Even more consequential, Neurometric's work reveals several surprises about model performance that could upend the conventional wisdom of businesses launching AI projects. Ultimately, it could mean better performance, lower prices, or both for AI workloads.

In an exclusive interview with The Deep View, Neurometric CEO Rob May said, "This leaderboard provides a counterintuitive insight — the idea that model performance varies dramatically on a per-task basis. I don't think people expected it to vary this much, particularly when you couple it with the test-time compute strategies… These small language models are more task-specific, so they typically run faster, perform better, and they're cheaper altogether — which is unheard of compared to just using a giant model."

With Neurometric's focus on applied AI in real-world use cases, it chose CRMArena as the tool to measure the performance of the thinking algorithms. Neurometric ran all its tests on thinking algorithms available on Amazon Bedrock, eliminating additional variables such as network latency and server performance. 

However, it plans to test and measure additional thinking algorithms over time, including ones available outside of Amazon Bedrock.

In a blog post announcing the leaderboard, May wrote, "We’ve seen a trend in companies as they move along the AI maturity curve. While nearly everyone starts out building a prototype on one single model, usually a frontier lab model, as AI products start to scale, it becomes obvious that some workloads are better handled with other models."

Neurometric will launch its first product in early 2026, aimed at helping companies select the right models for their workloads to improve performance, save money, or both. 

Every day, more parts of the generative AI ecosystem are calling into question one of the basic premises of the current AI boom: that large foundation models paired with massive amounts of data and compute will deliver the biggest gains in AI. Ilya Sutskever recently declared the age of scaling over, and that momentum would shift back to research. Mistral AI released new low-cost, open-source models, arguing that the future of AI innovation will depend on smaller models fine-tuned for specific use cases. And so Neurometric's counterintuitive insight that thinking algorithms can vary widely on a per-task basis offers hope that today's AI could become more performant and less expensive, opening up new possibilities for AI projects and improving ROI. 

TOGETHER WITH IBM

How trustworthy is your AI data?

Data breaches are on the rise, and AI agents are making managing these threats more complex because they can act independently. Security and governance teams need to work together—now more than ever. 

To use AI for faster and better results, you need a clear plan that brings together

  • People 

  • Processes 

  • Technology 

Good data governance isn’t about slowing things down; it's about moving fast with guardrails. Learn how securing and governing your data can help you drive ROI from your AI, even as you scale.

BIG TECH

AWS wants to be the AI everything store

Amazon wants to give people a little bit of everything. 

At the company’s annual re:Invent conference in Las Vegas on Tuesday, Amazon Web Services unveiled new offerings at every level of the tech stack, including new chips, models, and agentic tools. 

In between touting its new and upcoming Trainium AI chips and a slew of agentic offerings, Matt Garman, CEO of AWS, announced several new models for Amazon Bedrock in his keynote address, including offerings from OpenAI, Alibaba and Google, and said AWS is bringing newly released models from Mistral AI onto the platform.

But Amazon didn’t just show off its partner models: it also launched the second generation of its own models, Amazon Nova. The suite includes Lite, a model for fast, cost-effective reasoning; Pro, for complex workloads; Sonic, a speech-to-speech foundation model for “human-like conversation”; and Omni, a unified multimodal reasoning model.

“We never believed that there was going to be one model to rule them all … it's why we've continued to rapidly build upon an already wide selection of models,” said Garman.

If off-the-shelf models aren’t cutting it, Amazon launched a new product called Nova Forge, a system that allows customers to make their own frontier models. Nova Forge gives customers access to a variety of model training “checkpoints,” so that companies can insert their domain-specific data early on in the training process. 

The offering highlights that enterprises are homing in on domain-specific models, Brian Jackson, principal research director at Info-Tech Research Group, told The Deep View. While general-use models are helpful tools in a lot of contexts, they’re often “error-prone” in fields that require specific knowledge.

However, while Nova Forge has big potential to offer enterprises a competitive edge, questions remain about its quality, said Jackson. “How much of the Nova secret sauce are we really getting? How difficult will it be to package a custom dataset that yields good domain-specific performance?” he asked.  

Plus, Nova Forge’s offerings might be an oxymoron, Jackson noted. A frontier model is defined as a general-purpose model that outperforms competing models in conventional performance benchmarks, he said. “By definition, enterprises will be training a version of Nova that is more performant on specific domain knowledge, not a better general-purpose model.”

Amazon’s approach highlights that the company wants to have everything an enterprise customer could want. No one model, AI vendor, agent, or chip can do it all. And as some enterprises start to eye alternatives to traditional cloud services, such as neoclouds, giving customers a bevy of options could be part of Amazon’s strategy to keep its grip on its nearly 30% market share in the cloud market. The more it has to offer, the more differentiated it gets.

TOGETHER WITH MONDAY.COM

AI that does the work for you

AI is everywhere right now. But what actually delivers real business impact?

Enter monday work management — a platform with AI built in, not bolted on, so it understands how your team plans, tracks, and delivers work from day one. See how monday’s AI helps teams get more done.

It organizes tasks. Assigns owners. Flags risks. Prioritizes projects. Even builds dashboards. Not in a separate tool, but in your actual workflow.

The result?
Less chaos. More clarity.
Less busywork. More breakthroughs.
All in a platform people actually love to use.

Used by over 250,000 teams — including more than half the Fortune 500 — monday.com helps you move faster, from first idea to final result.

PRODUCTS

Mistral AI releases low-cost open-source models

Mistral AI released Mistral 3, a new generation of models that it says can achieve closed-source performance with open-source transparency and control.

The Paris-based company debuted a new flagship model, Mistral Large 3, while its Ministral 3 series offers cheaper models for edge and local use cases. With the release, Mistral is doubling down on its belief that the future of enterprise AI lies not in proprietary general-purpose models but in small, open-source models fine-tuned for specific use cases.

While Mistral won’t outperform the likes of OpenAI or Google on benchmark tests, it believes the most broadly performant model isn't necessarily the most valuable for businesses. The Mistral 3 models were released under the Apache 2.0 license, an open-source license that allows users to use or modify software as they wish, including for commercial purposes. Mistral open-sources its models partly out of a commitment to transparency and accessibility, the company said in a blog post, but the decision also gives Mistral a niche in the crowded field of model development.

With an open-source license, a Mistral customer using AI for a specific use case, such as autonomous grocery delivery, could tweak lower-powered models to perform only the grocery delivery task while using far fewer tokens than frontier models. Mistral AI also offers custom model training services to help clients fine-tune and deploy their models.

Mistral 3’s launch came just one day after DeepSeek debuted DeepSeek-V3.2, another highly regarded open-source model. In its blog post announcing the launch, the company showed Mistral Large 3 outperforming DeepSeek-V3.1 on both general and multilingual prompts, but it’s not clear whether Mistral’s new flagship model would beat DeepSeek’s newly released model. Mistral didn’t respond to a request for comment on whether it tested Mistral Large 3 against DeepSeek-V3.2.

Rather than burning cash to try to stay afloat in the age of scaling, Mistral has taken a more practical approach to model development. Businesses don’t necessarily care about using the most performant general-purpose LLM; they want AI tools that can accomplish specific tasks at as low a price point as possible. Provided that Mistral 3 can do that, it appears that Europe’s most prominent AI lab has found a compelling business case for its models.

GAMES

Which image is real?


POLL RESULTS

Can LLMs scale all the way to AGI?

  • Yes, they just need enough data and compute (22%)

  • No, multiple breakthroughs are needed (57%)

  • Other (share your thoughts) (21%)

The Deep View is written by Nat Rubio-Licht, Jack Kubinec, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you tomorrow.

“Faces don't look right and what is that hazy mound on the roof?”

“[This image] has asymmetry usual in real life - is less ‘ideally composed’, and has contrast extremes usual for cameras, light dispersion also relatively unnatural in [the other image]”

“The pineapple clock gave it away along with some weird general fuzziness that was supposed to be what, maybe steam?”

“Christmas tree ornaments on the left were lined up perfectly.”

“More people and detail in the building.”

“The other tree seems missing branches & the lights are mismatched, snow pushed between the stands didn’t seem real to me.”

Take The Deep View with you on the go! We’ve got exclusive, in-depth interviews for you on The Deep View: Conversations podcast every Tuesday morning.

If you want to get in front of an audience of 450,000+ developers, business leaders and tech enthusiasts, get in touch with us here.