Ep 662: Opus 4.5: New king of the AI hill or just a niche model for coders?
Our hands-on with Claude Opus 4.5, OpenAI teases app store, Gemini introduces interactive images, Google escalates its AI chip war with Nvidia and more.
Sup y'all
Thankful for all your support over the years for this wild AI ride called Everyday AI.
Appreciate all the messages of support like what Cecilia dropped in today's livestream.
Our small team will be taking the next two days off to celebrate the Thanksgiving holiday and spend time with family.
See ya Monday and I'm thankful for all of you.
Jordan
Outsmart The Future
Today in Everyday AI
8 minute read
Daily Podcast Episode: Another "best AI model" just dropped. This time it's Claude Opus 4.5. We gave it the live treatment for AI at Work on Wednesdays. Find out more in today's show and give it a watch/listen.
Fresh Finds: ChatGPT's big live update, Interactive Images on Gemini, Perplexity Shopping Assistant upgrades and more. Read on for Fresh Finds.
Byte Sized Daily AI News: OpenAI's new App Store leaks, Nvidia Chips Vs Google Chips, OpenAI Hardware Prototypes and more. Read on for Byte Sized News.
Learn & Leverage AI: Anthropic says Opus 4.5 is the best model in the world. We tested it live, and it did not go well. Yikes. Keep reading for that!
Don't miss out: How to fight AI sprawl, Anthropic surprises with Claude Opus 4.5, Meta and Google partner on AI chips, White House's shocking AI announcement and more. Check it here!
Ep 662: Opus 4.5: New king of the AI hill or just a niche model for coders?
"... best model in the world..."
Wait, again?
Days after Gemini 3 Pro splashed on the scene, Anthropic snuck in a low-key drop of Claude Opus 4.5.
And Anthropic pulled no punches, calling its new model the "best model in the world for coding, agents and computer use."
So, should you be hot swapping your Gemini or ChatGPT use out for the new Opus 4.5? Or is this more of a niche model for software devs?
Tune in, as we put AI to Work on Wednesday!
P.S.... we're out for Thanksgiving. So after this show, we'll see ya Monday!
Also on the pod today:
• Opus 4.5: coder's new favorite?
• Anthropic's massive API price cut
• Visualization: Excel & Slides improved
It'll be worth your 46 minutes.
Here are our favorite AI finds from across the web:
New AI Tool Spotlight - Questas builds choose-your-own-adventure stories with AI-generated images and videos, where every choice leads to a new path; FireCut is a lightning-fast AI video editor; AskCodi helps you build your own coding models on top of any LLM.
AI Music - Warner strikes licensed AI deal with Suno, letting opted-in artists' voices be generated.
AI Tech - Nota AI powers Samsung's Exynos 2500 to run generative AI fully on-device.
ChatGPT Live Chat - Inline voice streams live replies: now see replies typed as you speak.
AI Thanksgiving Talk - AI joins politics and football at the Thanksgiving argument table. Here's what to know.
Gemini Interactive Images - Tap diagrams to unlock interactive explanations for deeper, visual learning.
Perplexity Shopping Assistant - Perplexity launches PayPal-powered AI shopping with memory-driven recommendations.
Gemini Image Reimagine - Turn any photo into new scenes: Google Messages' Remix uses Nano Banana Gemini.
ChatGPT Image Generator - ChatGPT may be rolling out a faster, style-driven ImageGenV2 for richer, consistent edits.
1. ChatGPT readies App Store-like platform for apps and workflows
OpenAI is preparing a timed rollout of developer features that would let creators publish Agent Builder workflows directly into ChatGPT, add image-generation as a tool inside workflow building, and launch an Apps dashboard to manage and integrate apps with ChatGPT.
The moves speed up how developers bring custom agents and visual tools into ChatGPT, reducing friction for discovery and distribution and hinting at a broader December product push. Key details remain undecided, such as whether workflows will be shareable and when features will reach all users, with early access now limited to select partners.
2. Nvidia pushes back as Google TPU momentum grows
Nvidia responded Tuesday to Wall Street concerns after its shares dipped following reports Meta may use Google's TPUs, saying its Blackwell GPUs remain a generation ahead and run every AI model everywhere computing happens. The company argued its GPUs offer greater flexibility and performance than Google's ASIC-style TPUs, while noting Google remains a customer and that Gemini can run on Nvidia hardware.
Google says demand is rising for both its custom TPUs and Nvidia GPUs as it supports customers via Google Cloud, and recent advances like Gemini 3 have boosted attention on TPU-based training.
3. OpenAI unveils first hardware prototypes, says Sam Altman
Sam Altman revealed at Sun Valley that OpenAI has completed its first hardware prototypes, calling the work "jaw dropping" and signaling a major step toward new AI devices following its $6.4 billion acquisition of Jony Ive's io.
He offered no product details but said the goal is a calmer, long-duration assistant that filters information and "knows everything you've ever thought about, read, said," positioning the device as a quieter alternative to today's attention-grabbing smartphones.
4. North Korea's AI advances pose new surveillance and cyber risks
A new South Korean analysis says North Korean research papers published in 2025 reveal rapid AI improvements that could bolster surveillance, voice impersonation and automated cybercrime, making the findings timely given Pyongyang's push to field AI-equipped unmanned systems.
The Institute for National Security Strategy warns facial recognition and multi-object tracking research could strengthen monitoring at military sites and borders, while lightweight speech synthesis enables near-real-time voice impersonation for psychological operations. The report also flags AI-driven automation of reconnaissance, phishing and money-laundering as a force multiplier for cryptocurrency theft and social engineering.
5. OpenAI says ChatGPT subscriptions could swell to 220 million by 2030
According to The Information, OpenAI now projects that roughly 8.5% of its future weekly user base, about 220 million people by 2030, will pay for ChatGPT. That is a sharp increase from about 35 million paid users in July, and it implies a weekly user base of roughly 2.6 billion (220 million divided by 0.085). The report also says OpenAI's annualized revenue run rate could hit $20 billion this year even as the company burns cash on R&D and operations.
Management plans to diversify revenue with shopping and ad-driven features, including a new personal shopping assistant that could unlock commissions and advertising. If these forecasts hold, ChatGPT would rank among the world's largest subscription businesses while still navigating significant costs and profit pressure.
Anthropic just claimed they built the "best model in the world" with their new Opus 4.5 and dropped the mic.
But when we actually put Opus 4.5 to the test live on air, the mic didn't just drop. It broke.
Yikes.
In our testing on today's show, the model hallucinated tools that don't exist, failed basic instruction following, and crashed its own context window repeatedly.
We walked through Anthropic's benchmarks, outside leaderboards, the company's kinda quiet pivot, and three sneaky-good features it rolled out this week.
Here's the exec-level version: how Anthropic's vertical pivot and price cut change your stack, where Opus 4.5 is actually dangerous for competitors, and when Gemini or GPT should still be your default. You'll get Monday-ready moves, not another hype recap.
Time to capitalize. Let's get it.
1. Kill the "Best Model" Myth
We just watched a familiar playbook: a vendor cherry-picks benchmarks, calls its model the top option, and hopes nobody reads the fine print or notices the difference. Anthropic framed Opus 4.5 as the leading choice for coding, agents, and computer use.
But when you step back to aggregated benchmarks, Gemini 3 Pro leads on overall coding and intelligence scores, and Opus 4.5 is effectively tied with GPT-5.1 High, with GPT-5.1 Pro not even in the race yet from an API testing perspective.
The real advantage goes to teams that treat models like interchangeable parts instead of religion. If your stack still assumes one model for everything, you're handing speed and margin to competitors who swap tools per workflow. Benchmarks still matter, but only as directional signals, not gospel. And those league tables are shifting weekly as new variants keep dropping.
Try This
Hook up your three main models (Gemini 3 Pro, Claude Opus 4.5, and whatever GPT-5.1 tier you have) into one internal playground.
Take five core workflows: weekly reporting, customer replies, code review, spreadsheet analysis, and slide outlines. Run the same prompt through each model and log accuracy, latency, and cleanup time.
Rank them per task and set a routing rule: "If X, use Y model." Lock this in as a quarterly calibration ritual, not a one-off bake-off. Store results in a dashboard.
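If you want to turn that bake-off log into an actual "If X, use Y" rule, here's a minimal Python sketch. Everything in it is our own assumption, not anything a vendor ships: the `Trial` fields, the scoring weights, and the `model_a` / `model_b` labels are placeholders, and the sample numbers are made up purely to show the mechanics. Swap in your own logged results.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    task: str           # workflow name, e.g. "code review"
    model: str          # free-form label, not an API identifier
    accuracy: float     # 0.0 to 1.0, scored by a human reviewer
    latency_s: float    # seconds until the response finished
    cleanup_min: float  # minutes a person spent fixing the output


def score(trial, latency_weight=0.01, cleanup_weight=0.02):
    """Hypothetical composite score: reward accuracy, penalize waiting and rework."""
    return (trial.accuracy
            - latency_weight * trial.latency_s
            - cleanup_weight * trial.cleanup_min)


def build_routing_table(trials):
    """Return {task: best model label} using the average score per task and model."""
    scores = defaultdict(lambda: defaultdict(list))
    for trial in trials:
        scores[trial.task][trial.model].append(score(trial))
    return {
        task: max(per_model, key=lambda name: mean(per_model[name]))
        for task, per_model in scores.items()
    }


# Made-up placeholder numbers, only to demonstrate the routing logic.
SAMPLE_TRIALS = [
    Trial("code review", "model_a", accuracy=0.90, latency_s=20, cleanup_min=5),
    Trial("code review", "model_b", accuracy=0.85, latency_s=15, cleanup_min=10),
    Trial("slide outlines", "model_a", accuracy=0.70, latency_s=30, cleanup_min=20),
    Trial("slide outlines", "model_b", accuracy=0.80, latency_s=25, cleanup_min=10),
]

routing = build_routing_table(SAMPLE_TRIALS)
for task, model in routing.items():
    print(f"If {task}, use {model}")
```

Tune the weights to match what hurts most on your team. If cleanup time is the real cost, raise `cleanup_weight` and re-run the table each quarter.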
2. Exploit Claude's Vertical Pivot
Under the noise, Anthropic is making a very specific bet: stop trying to be the everything-app and dominate a couple of high-value verticals.
The clues are everywhere.
Opus 4.5 is tuned for software engineering, agentic workflows, and heavy data/finance work. Claude for Excel is now generally available for higher-tier users, built to chew through spreadsheets with thousands of rows. File creation quietly turns prompts into polished decks and sheets.
Pair that with a two-thirds API price cut and cheaper compute from Amazon and Google, and you see the shape of the play: become the default "work AI" inside codebases and spreadsheets while others chase consumer dazzle. If your business lives in IDEs and Excel, ignoring this shift is leaving productivity on the table.
Try This
Pick one domain: engineering or spreadsheets.
For engineering, plug Claude into a sandbox repo and feed it a week of "code copilot" work: bug triage, tests, and small refactors. For spreadsheets, pilot Claude for Excel on one messy reporting workbook with thousands of rows.
In each case, document what it nails and where it fails. Then redraw one team's RACI: humans own design and judgment; Claude owns boilerplate, data cleanup, and first-pass analysis. End the week with a 30-minute go/no-go decision.
3. Make Reliability Your Real Benchmark
Brutal.
In our very limited live demos, Opus 4.5 still behaved like an overconfident intern. Long, complex tasks blew past its context window, and it dropped every plate it tried to spin.
It ignored a very specific URL instruction and grabbed the wrong content. A multi-step slide-deck refresh died mid-flow despite careful scoping. An artifacts-powered dashboard never rendered, while Gemini 3 Pro produced a slick, SaaS-level analytics app on the same brief.
That gap really matters for execution.
Again, take our lil demos with a huge grain of salt. Even though Opus 4.5 kinda bombed the limited tests we gave it live, the model still has an insanely high ceiling. (For when it works, that is.)
Know that the winning teams won't be the ones with the fanciest agentic marketing slide. They'll be the ones who ruthlessly test where models fall apart, read the traces, and design workflows around those failure modes.
Treat every "extended context" claim as a hypothesis to break, not a promise to trust. When in doubt, break work into smaller hops and orchestrate the agents instead of worshiping a single giant prompt.
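Here's one rough way to picture "smaller hops" in Python. This is a sketch under our own assumptions: it uses character count as a crude stand-in for tokens, and `call_model` is a hypothetical callable you would wire up to whichever model you're testing. It is not any vendor's API.

```python
def split_into_hops(items, max_chars):
    """Greedily pack items into batches whose combined length stays within max_chars."""
    batches, current, size = [], [], 0
    for item in items:
        if len(item) > max_chars:
            raise ValueError("a single item exceeds the per-hop budget; split it first")
        # Start a new batch when adding this item would blow the budget.
        if current and size + len(item) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(item)
        size += len(item)
    if current:
        batches.append(current)
    return batches


def run_in_hops(items, max_chars, call_model):
    """Send each batch as its own request instead of one giant prompt."""
    return [
        call_model("\n".join(batch))
        for batch in split_into_hops(items, max_chars)
    ]


# Stand-in "model" that just upper-cases its input, so the sketch runs offline.
results = run_in_hops(["ab", "cd", "ef"], max_chars=4, call_model=str.upper)
print(results)
```

Each hop can fail or be retried on its own, which makes it much easier to see exactly where a long task falls apart.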
Try This
Pick one ugly workflow that already stalls your team: updating decks, merging docs, or turning CSVs into dashboards.
Run it end-to-end in three variants: Gemini 3 Pro, Opus 4.5, and your current default. Don't just judge outputs; log every failure. Check for context blowups, ignored instructions, tool-use errors, and timeouts.
Turn that into a one-page playbook: which model to use, max input size, when to split tasks, and where humans must stay in the loop. Re-run the test whenever a "best model" headline drops.
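To get from raw failure logs to that one-pager, a small tally like the one below can help. It's a hedged sketch: the `RunLog` fields and failure labels are names we made up for illustration, and the "max clean input" figure is only the largest input that ran without a logged failure in your own tests, not a guaranteed limit.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical failure labels matching the checklist above.
FAILURE_TYPES = {"context_blowup", "ignored_instruction", "tool_error", "timeout"}


@dataclass
class RunLog:
    model: str                # free-form label for the model variant
    input_chars: int          # size of the input you sent
    failures: list = field(default_factory=list)  # labels from FAILURE_TYPES


def summarize(logs):
    """Tally runs, clean runs, failure counts, and largest clean input per model."""
    summary = {}
    for log in logs:
        unknown = set(log.failures) - FAILURE_TYPES
        if unknown:
            raise ValueError(f"unknown failure labels: {sorted(unknown)}")
        entry = summary.setdefault(log.model, {
            "runs": 0,
            "clean_runs": 0,
            "failures": Counter(),
            "max_clean_input": 0,
        })
        entry["runs"] += 1
        entry["failures"].update(log.failures)
        if not log.failures:
            entry["clean_runs"] += 1
            entry["max_clean_input"] = max(entry["max_clean_input"], log.input_chars)
    return summary


# Made-up placeholder logs, only to show the shape of the output.
SAMPLE_LOGS = [
    RunLog("model_a", 1000),
    RunLog("model_a", 5000, ["context_blowup", "timeout"]),
    RunLog("model_b", 5000),
]

playbook = summarize(SAMPLE_LOGS)
for model, stats in playbook.items():
    print(model, stats["clean_runs"], "of", stats["runs"], "clean;",
          "largest clean input:", stats["max_clean_input"], "chars")
```

Anything above a model's largest clean input is a good candidate for splitting into smaller tasks or keeping a human in the loop.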






