OpenAI o3 and o4 Unlocked: Inside the newest, most powerful AI models
Inside the o3 and o4-mini models, Google's new AI phase, Anthropic's prediction of AI employees, and more!
Outsmart The Future
Today in Everyday AI
6 minute read
🎙 Daily Podcast Episode: Explore OpenAI's groundbreaking o3 and o4-mini models, now dominating the AI landscape. How do they stack up to Gemini 2.5 Pro? We dive in. Give it a listen.
🕵️‍♂️ Fresh Finds: Google DeepMind CEO speaks on AI future, AI films can now win Oscars and Descript unveils agentic video editor. Read on for Fresh Finds.
🗞 Byte Sized Daily AI News: Google unveils “Era of Experience” AI phase, Trump administration weighs AI in K-12 education and Anthropic predicts AI virtual employees within the year. For that and more, read on for Byte Sized News.
🧠 Learn & Leverage AI: OpenAI’s new o3 and o4-mini models are powerful. But does that make them the best? Here’s everything you need to know. Keep reading for that!
↩️ Don’t miss out: Did you miss our last newsletter? We talked about OpenAI’s o3 benchmark discrepancies, Google DeepMind’s Astra AI demo and ChatGPT’s memory getting web search. Check it here!
OpenAI o3 and o4 Unlocked: Inside the newest, most powerful AI models 💡
OpenAI's newest models are already topping the charts.
It feels like every week brings another chart-topping AI release from a big player.
So what's different about OpenAI's new o3 (and o4-mini)?
And is it really better than Google's super impressive Gemini 2.5 Pro?
Also on the pod today:
• Agentic AI: o3's Tool Chaining 🔗
• o3 Versus o4 Model Comparison 🤔
• OpenAI's Image and Python Capabilities 🧑‍💻️
It’ll be worth your 49 minutes:
Subscribe and listen on your favorite podcast platform:
Spotify | Apple Podcasts | YouTube | Amazon Music
Here are our favorite AI finds from across the web:
New AI Tool Spotlight – Strawberry is an AI browser that boosts your daily workflow, Corgea is an AI security code reviewer and AgentR is an AI hiring assistant.
Google – Google DeepMind’s CEO believes AI could end disease and lead to “radical abundance.”
AI in Media – AI films can now win Oscars according to the Academy.
AI Video – Descript has unveiled its new agentic video editor.
Hello! At @DescriptApp we are bought in on AI agents - very pleased to announce something we've been working on for awhile: Our agentic video editor - i.e. cursor for video. Sign up to try it out at descript.com/agent
— Andrew Mason (@andrewmason)
3:39 PM • Apr 22, 2025
AI Tech – EV maker Rivian has elected Cohere’s CEO to its board as it pushes for more AI integration.
AI models – Experts are saying that crowdsourced AI benchmarks have major flaws.
AI Startups – Two undergrads have built an AI speech model to rival NotebookLM.
1. Google Announces New Phase in AI Development 🚀
Google has unveiled its vision for the next leap in AI with the “Era of Experience,” where AI agents create their own training data through real-world interactions. This approach aims to move beyond the current “Human Data Era” and the prior “Simulation Era,” tackling data limitations that have slowed progress toward artificial general intelligence (AGI).
By enabling AI to self-generate learning data, Google hopes to accelerate innovation and build smarter, more autonomous systems. It could mark a significant strategic shift that helps Google in its AI battle with OpenAI.
2. Trump Administration Eyes AI in K-12 Education 🧑‍🏫️
The Trump administration is reportedly considering an executive order to integrate AI training and tools into K-12 classrooms, aiming to prepare students for a future driven by artificial intelligence, according to a draft obtained by The Washington Post. The order would push federal agencies to collaborate with the private sector to develop AI education programs, signaling strong federal interest in maintaining U.S. leadership in AI innovation.
Experts emphasize that this shift could reshape how students learn, blending traditional subjects with computational skills and critical thinking about AI’s capabilities and limits.
3. Anthropic Predicts AI Virtual Employees Within a Year 🤖
Anthropic’s top security officer warns that AI-powered virtual employees, equipped with their own roles, memories, and corporate accounts, are expected to start operating within companies by next year. This shift will force organizations to rethink cybersecurity, as managing AI identities introduces new risks like rogue AI hacking and unclear accountability.
The company is already testing its Claude models for cyber resilience and monitoring potential abuse, highlighting how critical AI security will become.
4. ChatGPT Search Surges in Europe, Nearing Regulatory Threshold 🇪🇺
ChatGPT’s web-integrated search feature saw its European monthly active users jump from 11.2 million to 41.3 million in just six months, according to OpenAI Ireland Limited’s latest report. This rapid growth puts ChatGPT search on the cusp of EU Digital Services Act rules that demand transparency and user control for platforms exceeding 45 million active recipients.
While Google still dominates search by a massive margin, OpenAI’s tool is gaining traction—though studies warn it struggles with accuracy and reliability.
5. Google’s Gemini Deal with Samsung Surfaces Amid Antitrust Hearings ⚖️
New testimony in Google's ongoing antitrust trial reveals the tech giant paid Samsung a hefty monthly fee to pre-install its Gemini AI app on Galaxy devices, securing prime placement over Samsung’s own AI efforts. The contract spans at least two years, with Google also sharing ad revenue from user interactions within Gemini.
This deal highlights how Google leverages financial power to entrench its AI products, raising concerns about fair competition in the tech space. With a final ruling expected in September 2025, this case could reshape how dominant companies influence device ecosystems and innovation.
6. AWS Faces Heat Over AI Usage Caps on Anthropic Models 🤔
AWS is under scrutiny after reports from The Information revealed customers hitting frustrating rate limits when using Anthropic’s AI models via Amazon Bedrock, with some calling the caps “arbitrary.”
While some users complain of frequent error messages, AWS insists these limits are designed to ensure fair access, not due to server shortages. This controversy arrives amid Amazon’s massive $8 billion investment in Anthropic and its aggressive $26 billion AI spending plan for 2024-25.
🦾How You Can Leverage:
OpenAI just launched an AI that decides—completely on its own—when to research, code, analyze images, or change tactics mid-conversation.
No prompting required.
No hand-holding needed.
Whoa.
Autonomous LLMs are officially here with OpenAI’s new flagship thinking model, o3. So for today’s episode of Everyday AI, we put this reasoning model, and the arsenal of tools under its belt, through a full run-through.
Our conclusion?
Is it always the best? No.
Is it the most flexible? No.
But sheer, raw power to complete the most complex tasks with minimal human oversight? o3 is THAT model.
Make sure to dive into our full episode here, and join us tomorrow for Part 2 as we go over some common use cases live.
If you want the takeaways, let’s get after it shorties. 👇
1 – The Model Naming Will Make Your Brain Hurt 🧠
Along with the full o3, OpenAI also released smaller versions of its next thinking model, o4: o4-mini and o4-mini-high.

Ready for a brain teaser?
o3 full is BETTER than the o4 variants. Yep.
The lower number is the superior model. Even OpenAI admitted they need to fix this naming disaster.
We tested both models extensively over the past week. When it comes to raw reasoning power, o3 full makes every other model look like a toy.
On third-party benchmarks, o3 dominates with an 81.5 score on LiveBench while Gemini 2.5 Pro trails at 77.4.
But the benchmarks are just the appetizer. What really changes is how you can use AI in your business and daily workflow.
Try This
Ignore the model names completely. Focus on capabilities instead.
We recommend creating a simple spreadsheet tracking which models excel at different tasks based on your actual results. After just a week of testing, we discovered o3 crushes complex reasoning tasks while Gemini 2.5 Pro still wins for creative writing and quick responses.
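If you'd rather keep that tracker in a script than a spreadsheet, here's a minimal sketch. The task categories, model labels, and 1-to-5 ratings below are hypothetical placeholders, not our test results; swap in your own ratings as you test. It averages each model's ratings per task and reports the top-rated model for each task.

```python
from collections import defaultdict
from statistics import mean

# Each row: (task category, model, your 1-to-5 rating of the output).
# HYPOTHETICAL placeholder ratings; replace them with your own results.
results = [
    ("complex reasoning", "o3", 5),
    ("complex reasoning", "gemini-2.5-pro", 4),
    ("creative writing", "o3", 3),
    ("creative writing", "gemini-2.5-pro", 5),
    ("quick responses", "o4-mini", 4),
    ("quick responses", "gemini-2.5-pro", 5),
]


def best_model_per_task(rows):
    """Average each model's ratings per task and return {task: top-rated model}."""
    scores = defaultdict(lambda: defaultdict(list))
    for task, model, rating in rows:
        scores[task][model].append(rating)

    best = {}
    for task, by_model in scores.items():
        averages = {model: mean(ratings) for model, ratings in by_model.items()}
        # On a tie, max() keeps whichever model was logged first for that task.
        best[task] = max(averages, key=averages.get)
    return best


if __name__ == "__main__":
    for task, model in best_model_per_task(results).items():
        print(f"{task}: {model}")
```

Averaging (rather than keeping only your latest rating) keeps one lucky or unlucky run from flipping your pick.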
2 – It’s the First Truly Agentic Consumer LLM 🤖
This is the BIG switch-up in how the model works, and no one’s talking about it.
Most AIs follow your instructions in order, or maybe call a tool sequentially.
This bad boy actively changes its approach based on what it discovers, drawing on its full suite of tools.
In OpenAI’s example, o3 analyzed a blurry photo of ships, decided ON ITS OWN to zoom in for clarity, then searched multiple websites and vessel databases to identify the ships.
And in between all of that, it cropped and zoomed the photo with Python. On its own.
When it hit a dead end, it pivoted strategies. No prompting needed.
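To make "changes its approach based on what it discovers" concrete, here's a toy sketch of an adaptive tool loop. This is NOT how OpenAI implements o3: the tools (zoom_image, search_web, search_vessel_db) are hypothetical stubs with canned results, and the hand-written planner stands in for decisions the model makes itself. What it shows is the difference from a fixed pipeline: each step's outcome decides the next step, and a dead end triggers a pivot to a different tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class State:
    findings: list = field(default_factory=list)  # useful results gathered so far
    tried: list = field(default_factory=list)     # (tool name, returned something?) history
    answer: Optional[str] = None


# HYPOTHETICAL tools with canned results; each returns a finding or None (dead end).
def zoom_image(state):
    return "hull number: 1234"  # pretend zooming makes the hull number readable


def search_web(state):
    return None  # pretend a broad web search comes up empty


def search_vessel_db(state):
    # Pretend a vessel database can match the hull number found earlier.
    if "hull number: 1234" in state.findings:
        return "ship: Example Vessel"
    return None


TOOLS = {"zoom_image": zoom_image, "search_web": search_web, "search_vessel_db": search_vessel_db}


def choose_next_tool(state):
    """Toy planner: pick the next tool based on the outcomes of earlier steps."""
    outcomes = dict(state.tried)
    if "zoom_image" not in outcomes:
        return "zoom_image"        # blurry input: get a clearer look first
    if "search_web" not in outcomes:
        return "search_web"        # try a broad search with what we learned
    if outcomes["search_web"] is False and "search_vessel_db" not in outcomes:
        return "search_vessel_db"  # dead end: pivot to a specialist source
    return None                    # nothing left worth trying


def run_agent(max_steps=5):
    state = State()
    for _ in range(max_steps):
        if state.answer:
            break
        name = choose_next_tool(state)
        if name is None:
            break
        result = TOOLS[name](state)
        state.tried.append((name, result is not None))
        if result is not None:
            state.findings.append(result)
            if result.startswith("ship:"):
                state.answer = result
    return state


if __name__ == "__main__":
    final = run_agent()
    print(final.tried)
    print(final.answer)
```

If the stubbed web search had returned a ship name, the loop would stop there and never touch the vessel database; the path depends on what each step turns up.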
The accuracy jump is INSANE. On complex browsing tasks, accuracy went from a pathetic 1.9% for GPT-4o with web search to nearly 50% for o3 with its full tool belt.
Think about that.
That's roughly 25x better. But more than that, it's the difference between "completely useless" and "actually helpful" for tasks that require multiple specialized skills working together.
Try This
Give o3 an impossible-sounding task that would normally require 3+ different specialists.
Upload a screenshot of competitor pricing pages and simply ask: "How should we price our product compared to these competitors?"
Watch in awe as it zooms in on pricing details, searches for current market rates across multiple sites, analyzes positioning, and delivers strategic recommendations WITHOUT you having to break the task into steps.
3 – The Hidden “Thinking” Advantage 🤔
Right now, there are roughly three kinds of models:
• Traditional models (old-school GPT models) that respond instantly but can't reason deeply.
• Thinking models (the o-series) that use step-by-step reasoning under the hood.
• Hybrid models (like Gemini 2.5 Pro and, kinda, Claude 3.7 Sonnet) that blend both approaches.
We've tested o3 against incredibly complex tasks that would normally require multiple specialists and hours of work. Tasks that would make other models hallucinate wildly.
o3 handled them flawlessly.
But for quick, creative tasks?
We still reach for Gemini 2.5 Pro.
It's faster and often more flexible for everyday use. The key takeaway from our testing: o3 isn't meant to replace your entire AI stack. It's built for the complex problems that used to require multiple specialists and hours (or even days) of work.
Try This
Save your precious 50 weekly o3 messages for the HARD stuff.
Upload multiple quarterly reports from your industry and ask for a comprehensive analysis of emerging trends complete with visualizations.
This is exactly the type of multi-layered task where o3 shines brightest – it'll read the documents, extract meaningful patterns, search for supporting data online, create visualizations, and synthesize strategic recommendations that would take a human analyst team days to produce.
Tomorrow: We're putting o3 through real-world tests based on YOUR suggestions. Hit reply with what you want us to try!