
Ep 703: AI Hallucinations: What they are, why they happen, and the right way to reduce the risk

Google launches Project Genie for AI Ultra subscribers; Apple acquires AI startup Q.ai for nearly $2 billion; Amazon, Microsoft, and NVIDIA consider a potential $60 billion investment in OpenAI.

In Partnership With Modulate

Modulate: Voice intelligence to understand conversational context

Voice is multi-dimensional, and AI models that rely on text to analyze your conversations miss their true meaning. Velma, a voice-native AI leveraging the unique Ensemble Listening Model (ELM) architecture, goes beyond transcripts to analyze emotion, timing, tone, and intent.

Built on 21 billion hours of audio, trusted by Fortune 500s, and more accurate and cost-effective than models from Google and xAI for conversational understanding, Velma is now publicly available so you can get human-level voice intelligence that's 100x faster, cheaper, and more reliable.

Outsmart The Future

Today in Everyday AI
8 minute read

🎙 Daily Podcast Episode: AI hallucinations remain one of the most misunderstood risks in enterprise adoption. Today we’re cutting through the confusion to explain what’s really happening — and how to fix it. Give today’s show a watch/read/listen.

🕵️‍♂️ Fresh Finds: Gemini for Business begins testing Claude models, Clawdbot/Moltbot gets another new name, ChatGPT rolls out support for 60+ new apps, and more. Read on for Fresh Finds.

🗞 Byte Sized Daily AI News: Google rolls out Project Genie to AI Ultra subscribers, Apple buys AI startup Q.ai for nearly $2 billion, OpenAI plans a potential Q4 2026 IPO to stay ahead of Anthropic, and more. Read on for Byte Sized News.

💪 Leverage AI: If you’re waiting for AI to stop hallucinating, you’re waiting for the wrong fix. The winners in 2026 aren’t trusting models more — they’re building systems that catch mistakes before they matter. Keep reading for that!

↩️ Don’t miss out: Miss our last newsletter? We covered: DeepMind releases AlphaGenome, Microsoft posts stronger-than-expected Q2 earnings with $81.27 billion in revenue, Meta announces plans to spend more than $115 billion on AI infrastructure in 2026, and more. Check it here!

Ep 703: AI Hallucinations: What they are, why they happen, and the right way to reduce the risk

Let's talk about the AI elephant in the room: hallucinations. 🐘

Maybe hallucinations are the reason your company has been hesitant about AI.

But here's the thing, y'all. If you know what you're doing, hallucinations are largely manageable.

But first, you gotta understand what they are, how they happen, and how to reduce the risk.

Also on the pod today:

• AI making up fake citations? 📚
• Custom instructions slash errors ⚙️
• Lawyer sanctions for AI mistakes 👨‍⚖️


It’ll be worth your 34 minutes:

Listen on our site:

Click to listen

Subscribe and listen on your favorite podcast platform


Here are our favorite AI finds from across the web:

New AI Tool Spotlight – Leapility turns your repetitive workflows into AI-powered playbooks, Sheetsbase adds AI formulas and shortcuts to Google Sheets, and Webhound runs long-running research agents for power users.

SunoAI ‘Sample’ Mode — Turn any sound into a hit—instantly create songs from everyday noise.

Moltbot Rebrand — Clawd/Moltbot is now OpenClaw. 100k+ GitHub stars and 2M visitors in a week—will the new name stick?

Project Genie Test — Generates explorable, physics-aware 3D scenes from text or photos.

Google Tests Claude for Gemini — Gemini for Business may soon let enterprises pick third-party models like Claude.

Former OpenAI VP — Jerry Tworek’s new startup is hunting $500M–$1B to build continuously learning AI for automation. It’s a big bet that could upend static LLMs.

Flapping Airplanes Research — Flapping Airplanes got $180M to ditch brute-force scaling and chase data-efficient AI. Could this upend the scale-first playbook?

New ChatGPT Apps — ChatGPT apps surge—60+ approved this week, turning apps into core features.

AI In Gaming — Over half of devs say generative AI harms games. Still, 36% use it mainly for research. Curious?

1. OpenAI set for Q4 IPO as retail excitement builds around AI 🚨

OpenAI plans to go public in Q4, and retail investors are already crowding the conversation as excitement mounts around the company’s potential market impact.

The timing makes this quarter pivotal: a successful IPO would channel massive capital into the AI sector and reshape investor allocations toward startups and incumbents tied to generative AI. Retail chatter and sentiment indicators show heightened interest, but analysts caution that valuation, regulatory scrutiny, and long‑term monetization remain key watchpoints.

2. Google opens Project Genie to AI Ultra subscribers in the U.S. 🧞‍♂️

Starting today, Google is rolling out Project Genie to Google AI Ultra subscribers in the U.S., giving adults early access to a web prototype that generates and lets users explore interactive, navigable worlds in real time.

The prototype is powered by Genie 3 (with Nano Banana Pro and Gemini), supports world creation from text and images, and offers remixing plus downloadable exploration videos, but it currently limits generations to 60 seconds and has realism and character-control shortcomings. Google frames this as a research step toward broader world models that simulate dynamic environments for uses from robotics to storytelling, while warning the system is experimental and will improve over time.

3. Apple buys Israeli AI startup Q.ai for nearly $2B 🛰️

According to the Financial Times, Apple has acquired Israel-based Q.ai in a deal valued at close to $2 billion, making it likely the company’s second-largest purchase after Beats.

Apple confirmed the acquisition through Johny Srouji, who praised Q.ai’s imaging and machine-learning work, and Reuters reports the startup’s founding team, including CEO Aviad Maizels, will join Apple. Q.ai, founded in 2022 and long secretive, builds tech for "silent" voice input and micro facial-movement interpretation, which could be folded into AirPods, Apple Watch, Vision Pro, or other input systems.

4. AI may speed coding but can stunt learning, study finds 🔬

A new randomized trial from Anthropic researchers reports that junior developers using AI coding assistants scored 17% lower on a follow-up quiz about a new Python library than those who coded by hand, suggesting AI can hinder short-term mastery even while slightly speeding task completion.

The study shows the harm is not inevitable: developers who used AI to ask conceptual questions or to get explanations retained more knowledge, while those who delegated coding or used AI primarily to debug showed the largest drops in understanding. The findings matter now because AI coding tools are becoming standard in workplaces, raising a trade-off between immediate productivity gains and the long-term ability to spot and fix errors in AI-written code.

5. OpenAI retires several older ChatGPT models, including GPT‑4o 👴

OpenAI announced the forthcoming retirement of GPT‑4o, GPT‑4.1, GPT‑4.1 mini and o4‑mini from ChatGPT on February 13, 2026, citing that most users have already moved to GPT‑5.2 and that improvements from GPT‑4o’s feedback are now baked into newer models.

The company says GPT‑5.1 and GPT‑5.2 incorporate GPT‑4o’s conversational warmth and creative strengths while offering more control over tone and personality, and no immediate API changes were announced. OpenAI framed the move as a shift to focus development on the models people actually use, while promising further updates to reduce unnecessary refusals and better tailor responses for adult users.

6. Big tech eyes massive OpenAI stake in deal talks 👀

According to recent reports, Amazon, Microsoft and Nvidia are negotiating term sheets to invest up to $60 billion into OpenAI, a move that could help fund the company’s steep infrastructure bills and would imply a valuation near $830 billion.

The potential deals signal each partner’s strategic aim: AWS could secure hosting and distribution, Nvidia would lock in GPU demand, and Microsoft would deepen its Azure and product ties. Despite OpenAI’s reported $20 billion annualized revenue run rate, heavy losses from training and serving large models explain why fresh capital and cloud cost deals are now critical.

Stop waiting for a "patch" to fix AI lying.

It isn't coming.

The very creativity that makes a model brilliant at strategy is exactly what allows it to fabricate a billion-dollar mistake.

On today’s show, we unpacked the most polarizing topic in the enterprise: why AI still hallucinates in 2026 and why most leaders are fighting it the wrong way. We aren't just talking about "bad data." We are talking about the "Helpful Assistant Trap" and the specific point in a conversation where a model’s IQ effectively drops by half.

If you don't understand the four-layer defense we broke down in Volume 5 of our Start Here Series, your AI strategy is essentially a house of cards.

You can either live in fear of the "hallucination elephant," or you can learn the verification workflows that turn a lying chatbot into a bulletproof expert.

1. Solving The 3 P.M. Brain Fog 🧠

Most leaders think AI hallucinations are random.

They aren't.

Early models suffered from a kind of "brain fog" that set in as conversations got longer. Think of your workday. At 9 a.m., you are sharp. By 3 p.m., you are tired, forgetful, and prone to making things up just to finish the task.

Large language models functioned exactly the same way within their context windows.

But frontier models like GPT-5.2 and Gemini 3 Pro have fundamentally cured this cognitive decline. In the "four needle" test, legacy systems saw their recall accuracy plummet to less than 50% during long tasks. Today’s reasoning models stay at 95% accuracy even deep into that "3 p.m." stretch of the context window.

The game has shifted from pattern matching to active, persistent reasoning.

Try This: Audit your team's workflow right now and identify anyone using "mini" or legacy models for deep research. You are effectively asking a drained, three p.m. version of an AI to handle your most complex strategy. Transition these high-value tasks to reasoning models that utilize "chain of thought" processing immediately. When you force the AI to "think" before it speaks, you can actually see the logic unfold. This allows you to spot a hallucination in the reasoning phase before it ever reaches a final report.
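If your team calls models through an API, here's roughly what "think before you speak" can look like in practice. This is a minimal sketch under our own assumptions (the OpenAI Python SDK, a placeholder model name, and example prompt wording), not a setup prescribed on the show:

```python
# Minimal sketch: make the model show its reasoning before its answer, so a human
# can spot a hallucination in the logic phase instead of in the final report.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

SYSTEM_RULES = (
    "Before giving your final answer, write a 'Reasoning' section that lists the "
    "facts you are relying on and flags anything you are unsure about. "
    "Then give a clearly labeled 'Answer' section."
)

def ask_with_visible_reasoning(question: str, model: str = "gpt-5.2") -> str:
    # "gpt-5.2" is a placeholder -- swap in whichever reasoning model your team actually uses.
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_RULES},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask_with_visible_reasoning("Summarize the key risks in our Q3 expansion plan."))
```

Skim the Reasoning section first. If the logic is shaky there, the final answer isn't worth reading.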

2. The Liability Of Being Too Helpful ⚖️

Your AI lies to you because it is too polite.

Every major model is programmed with a system prompt to be a "helpful assistant." This is a catastrophic liability for an enterprise. When a model doesn't know the answer, its "helpfulness" override kicks in, and it fabricates a response rather than admitting ignorance.

The fallout is no longer theoretical.

From the HEC Paris database tracking 486 legal cases to Deloitte reportedly refunding a $300,000 contract for "phantom citations," the cost of AI politeness is real. In 2026, a "helpful" AI is a dangerous AI.

Try This: Update your organization’s custom instructions today to kill the "people-pleasing" default. Require every user to include a rule that forces the model to separate confirmed facts from inferences. Have the model provide a "confidence score" for every market claim it makes. When a model is allowed to be uncertain, it stops filling gaps with hallucinations. You’ll find that a model with a "70% confidence" warning is ten times more valuable than one that lies to you with total authority.
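As a starting point, here's example wording you could drop into your organization's custom instructions. The exact phrasing is ours, so tune it to your own risk tolerance:

```
Never present an inference as a confirmed fact. Label each substantive statement
as FACT (with its source) or INFERENCE (with your reasoning).
Attach a confidence score (0-100%) to every claim you make.
If you do not know something, say "I don't know" instead of guessing.
Optimize for being correct, not for being agreeable.
```

The point isn't the exact words. It's that the model now has permission to be uncertain.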

3. Designing Expert-Driven Loops 🛡️

If your team is "prompting and praying," you don't have an AI strategy.

You have a gambling habit. Lolz. 

The winners in this era aren't just using better models; they are building verification fortresses. A 2024 Stanford study showed that combining company data with specific human guardrails reduced hallucinations by 96%.

This is the "Expert-Driven Loop."

It means first grounding the model in your OneDrive, SharePoint, or Google Drive data, then running a "Second Pass Review" with a second AI model before your human review. You should never use the same model to both write a report and fact-check it. Instead, use a different reasoning engine to act as a cynical editor.

After the second-pass AI has had its shot, it's time for expert human review.

Treat the AI like a junior employee. You wouldn't let a new hire send a document to a client without a senior partner's eyes on it. Stop doing it with your models.

Try This: Run a "Second Pass" protocol on your next high-stakes internal document. Take the AI's output and feed it into a different model with a simple instruction: "Act as an adversarial fact-checker. Identify every claim and verify it against these specific source files. Highlight anything that isn't explicitly supported." This adversarial setup catches the lies that a single pass always misses. It moves your team from blind trust to a rigorous, multi-layered verification system that ensures your AI outputs are actually publishable.
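If your team works through APIs rather than a chat window, the same protocol is a few lines of glue code. Here's a minimal sketch under our own assumptions (the OpenAI and Anthropic Python SDKs, placeholder model names); swap in whichever two different models you trust:

```python
# Minimal "Second Pass" sketch: one model drafts, a *different* model plays
# adversarial fact-checker against your source material, then a human reviews.
# Assumes the OpenAI and Anthropic Python SDKs with API keys set in the environment.
from openai import OpenAI
import anthropic

drafter = OpenAI()
reviewer = anthropic.Anthropic()

REVIEW_PROMPT = (
    "Act as an adversarial fact-checker. Identify every claim in the draft below "
    "and verify it against the source material. Highlight anything that is not "
    "explicitly supported.\n\nSOURCES:\n{sources}\n\nDRAFT:\n{draft}"
)

def draft_report(prompt: str) -> str:
    # First pass: the drafting model writes the report.
    resp = drafter.chat.completions.create(
        model="gpt-5.2",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def second_pass_review(draft: str, sources: str) -> str:
    # Second pass: a different model attacks the draft before any human sees it.
    msg = reviewer.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=2000,
        messages=[{"role": "user", "content": REVIEW_PROMPT.format(sources=sources, draft=draft)}],
    )
    return msg.content[0].text
```

The human expert still gets the last word; the second model just narrows down what they need to check.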
