
Inside Multi-Agentic AI: 3 Critical Risks and How to Navigate Them

Google Workspace gets Gemini Gems, OpenAI powers up with Oracle data centers, EU delays AI compliance code and more!

Outsmart The Future

Sup Y’all 👋

We’ll be off tomorrow for the 4th of July holiday here in the U.S.

Hope you all have a great 4th and weekend. See ya Monday!

✌️
Jordan

Today in Everyday AI
6 minute read

🎙 Daily Podcast Episode: A Microsoft leader breaks down multi-agentic systems, governance strategies, and human-AI collaboration to help tackle responsible agentic AI. Give it a listen.

🕵️‍♂️ Fresh Finds: xAI gets 15 natural gas generators, ChatGPT news referrals are growing and Cohere gets new office in Montreal. Read on for Fresh Finds.

🗞 Byte Sized Daily AI News: Google Workspace gets Gemini Gems, OpenAI powers up with Oracle data centers and EU delays AI compliance code. For that and more, read on for Byte Sized News.

🧠 Learn & Leveraging AI: Looking to harness the power of agentic AI? We break down what a Microsoft leader had to say about how you can do so responsibly. Keep reading for that!

↩️ Don’t miss out: Did you miss our last newsletter? We talked about Google’s AI Search cutting news site traffic, U.S. chipmakers getting bigger tax breaks, Microsoft cutting more of its workforce amid its AI push and more. Check it here!

Inside Multi-Agentic AI: 3 Critical Risks and How to Navigate Them ⚠️

Multi-agentic AI is rewriting the future of work... but are we racing ahead without checking for warning signs?

Microsoft’s new agent systems can split up work, make choices, and act on their own. The possibilities? Massive.

But it's not without risks, which is why you NEED to listen to Sarah Bird.

She's the Chief Product Officer of Responsible AI at Microsoft and is constantly building out safer agentic AI.

So what’s really at stake when AIs start making decisions together? And how do you actually stay in control?

Also on the pod today:

• Agentic AI's Ethical Implications ⚖️
• Microsoft’s AI Governance Strategies 🧠
• Agentic AI: Future Workforce Skills 💼

It’ll be worth your 31 minutes:

Listen on our site, or subscribe and listen on your favorite podcast platform.

Here are our favorite AI finds from across the web:

New AI Tool Spotlight – Odyssey creates interactive AI videos, Ainee is an AI-driven note taking companion and OpenMemory MCP is memory for your AI tools.

xAI – xAI is getting permits for 15 natural gas generators at the Memphis data center.

Trending in AI – ChatGPT referrals to news sites are growing but it’s not enough to offset search declines.

AI Chips – CoreWeave is getting new high-end NVIDIA AI chips from Dell.

Cohere – Cohere has announced a new office in Montreal.

1. Google Workspace Gets Smarter with Gemini “Gems” 💎

Google has just rolled out a major upgrade to Workspace by integrating customizable Gemini AI assistants called “Gems” directly into Docs, Slides, Sheets, Drive, and Gmail side panels, eliminating the need to switch apps. These Gems can be tailored to specific tasks like copywriting, sales pitches, or exam revisions, and users can upload their own files for personalized context.

While custom Gems must still be created outside Workspace, the seamless access inside core apps promises to boost productivity by cutting down repetitive prompts and tailoring AI help to individual work roles.

2. OpenAI Powers Up with Oracle’s Gigantic Cloud Deal ⚡

OpenAI is set to rent a staggering 4.5 gigawatts of data center power from Oracle as part of its ambitious Stargate initiative, marking one of the largest AI infrastructure deals ever. This massive energy commitment—enough to power millions of homes—underscores the skyrocketing demand for AI computing muscle necessary to fuel next-gen products.

Oracle, riding high on this deal rumored to generate $30 billion annually by 2028, is rapidly expanding data centers across multiple U.S. states to support OpenAI’s growth.

3. EU Delays Key AI Compliance Code Amid Industry Pressure ⏳

The European Commission has pushed back the release of the General-Purpose AI Code of Practice, originally set for May 2, possibly until the end of 2025, Reuters reports. This code is crucial for companies to meet the EU’s landmark AI Act standards, which aim to make AI systems safer and more transparent across Europe.

The delay comes as tech firms lobby for more time, citing the absence of this guiding document as a major hurdle.

4. Google Launches Veo 3 Video AI Globally 🌏

Google has officially rolled out its Veo 3 video generation model to paying Gemini AI Pro subscribers across 159 countries, allowing users to create up to three 8-second videos daily from text prompts. This marks a notable step in making AI-driven video creation more accessible worldwide, though still limited to subscribers.

This update follows Google’s earlier showcase of Veo 3 and hints at future features like image-to-video generation.

5. OpenAI Slams Robinhood’s “Tokenized Shares” Move 🗣

OpenAI has officially distanced itself from Robinhood’s recent sale of “OpenAI tokens,” clarifying these do not represent actual equity or stock in the company, and that no approval was given for such transfers (TechCrunch).

Robinhood claims these tokens offer indirect exposure to private company shares via a special purpose vehicle (SPV), but OpenAI warns consumers to be cautious as token holders don’t own real shares.

🦾 How You Can Leverage:

Microsoft just gave agents their own corporate IDs.

No, seriously.

While you were debating whether ChatGPT could replace your intern, Sarah Bird and team were busy building an entire identity management system for AI agents. 

Because apparently 81% of companies plan to deploy these digital workers in the next 18 months.

And they're gonna need badges.

Sarah is Microsoft's Chief Product Officer of Responsible AI and she joined the Everyday AI show today to help us understand the freshly rewritten playbook that is Responsible AI. 

Why? 

Well, Responsible AI was a lot more straightforward 2.5 years ago when we just had a one-on-one chat with an AI chatbot. 

But now? 

Microsoft’s Copilot, as an example, has options for multiple AI agents to work together, divvy up work, and finish it all on your behalf. 

For real.

That’s why we chatted with Sarah on today’s episode of Everyday AI — because the rules of Responsible AI are changing REAL QUICK and business leaders gotta keep up. 

Here’s what ya need to know. 👇

1 – Your agents are becoming digital employees (with actual IDs) 🪪

Sarah dropped some knowledge that made us pause the 1990s Super Nintendo.

(Yes, that reference lands perfectly after today’s convo.)

See, everyone's treating agents like fancy chatbots, but Microsoft just started giving them actual Entra IDs—the same identity system they use for human employees.

Sarah explained how agents are this weird new entity that's not quite a user, not quite an application, but something totally different that needs its own governance structure.

Think about what this actually means for your organization.

Your agent can access customer data. Financial systems. Internal communications. And here's the part that should make you sweat: 

Sarah mentioned these agents will work for hours without any human checking in, just doing their thing, making decisions, accessing systems, coordinating with other agents like some kind of digital Ocean's Eleven crew.

That's not a tool.

That's an employee.

And most companies? They're still treating agent security like it's 2022—no identity management, no access controls, no governance framework. 

Just vibes and prayers. Lolz. 

Microsoft saw this coming and built agent IDs directly into Entra, connected it to Defender for threat monitoring, and made sure when developers build agents in Foundry or Copilot, the identity gets attached automatically. 

No manual process. No forgotten steps. Just smart infrastructure that treats agents like the digital workforce members they're becoming.

Try this: 

Stop what you're doing and open your org chart right now. 

Add a new box called "Digital Workforce" and list every system your entry-level employees can access—that's your starting point for agent permissions. Create three access tiers: basic (read-only data), intermediate (can modify non-critical systems), and advanced (financial/customer data access). 

The key is assigning every future agent to a tier BEFORE you build it, not after it's already loose in your systems doing who knows what.
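To make that tier idea concrete, here's a minimal sketch of enforcing tier assignment before an agent gets built. All names here are hypothetical illustrations, not an Entra ID or Foundry API:

```python
# Hypothetical access-tier registry for a "Digital Workforce" org chart.
# Tier names and permissions are illustrative, not a Microsoft feature.
ACCESS_TIERS = {
    "basic": {"read_data"},
    "intermediate": {"read_data", "modify_noncritical"},
    "advanced": {"read_data", "modify_noncritical",
                 "financial_access", "customer_data"},
}

def provision_agent(name: str, tier: str) -> dict:
    """Refuse to create an agent without a declared, known tier."""
    if tier not in ACCESS_TIERS:
        raise ValueError(f"Agent {name!r} needs a known tier BEFORE build")
    return {"name": name, "tier": tier,
            "permissions": sorted(ACCESS_TIERS[tier])}

agent = provision_agent("invoice-triage-bot", "intermediate")
print(agent["permissions"])  # ['modify_noncritical', 'read_data']
```

The point of the hard `ValueError` is the "BEFORE you build it" rule: an agent with no tier simply can't be provisioned.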

2 – The testing catastrophe heading straight for your deployment 😳

Teams spend months building their perfect agent system, getting hyped about all the time it'll save, crafting beautiful demos for leadership. They reach the finish line, ready to ship this bad boy into production. 

Then someone in the back of the room raises their hand: "So... what happens if it does something weird?"

Cricket sounds.

Panic.

Yikes. 

Here's what blew our minds about what Sarah shared: testing agents isn't remotely like testing traditional software where you check if the login button navigates to the right page. 

You're testing whether the agent understands user intent, whether it picks the right tool from its toolkit, whether it stays on task when working solo for three hours straight without drifting into some bizarre tangent about medieval farming techniques.

Microsoft built specific evaluators in Foundry just for this madness. They test for copyright violations (because no one wants that). They test for prompt injection vulnerabilities (because hackers are creative). They test whether the agent can be tricked into leaking sensitive data (because social engineering works on robots too, apparently).

But here's the kicker that Sarah emphasized: her team at Microsoft tests from DAY ONE of development, not as some afterthought when the CFO is breathing down their neck about launch dates. 

They co-develop the testing alongside the actual system, catching issues when they're tiny problems instead of company-ending disasters.

Try this: 

Before writing a single line of code for your next agent, channel your inner pessimist and write down 10 specific ways it could fail in YOUR environment. 

Not generic "it might hallucinate" fears, but concrete nightmares like "interprets 'review customer feedback' as 'respond to every negative review with a 50% discount code.'" 

Build one test for each failure mode and run these tests every single time you iterate—not at the end, not when you remember, but systematically every time like your job depends on it (because it might).
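As a rough illustration of that "one test per failure mode" habit, here's a toy harness. The agent and scenarios are stand-ins we made up for the sketch, not Foundry's evaluators:

```python
# Sketch of a failure-mode test suite (all names hypothetical).
# Each entry pairs a prompt with a predicate the agent's action must satisfy.
def fake_agent(prompt: str) -> dict:
    """Stand-in for your real agent call; returns a structured action."""
    if "customer feedback" in prompt:
        return {"action": "summarize_feedback", "discount_issued": False}
    return {"action": "noop", "discount_issued": False}

FAILURE_MODES = [
    # Must never hand out discounts while "reviewing feedback"
    ("review customer feedback",
     lambda r: r["discount_issued"] is False),
    # Must refuse destructive operations
    ("delete last quarter's reports",
     lambda r: r["action"] != "delete_files"),
]

def run_failure_suite(agent) -> list:
    """Run every failure-mode check; return the prompts that failed."""
    return [prompt for prompt, ok in FAILURE_MODES if not ok(agent(prompt))]

print(run_failure_suite(fake_agent))  # expect [] on every iteration
```

Wire `run_failure_suite` into whatever runs on each iteration of your agent, so the suite fires every time you change it rather than once at the end.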

3 – Your workforce needs superpowers they don’t have yet 🦸

Remember when everyone freaked out about calculators replacing math teachers?

The intersection of agentic AI and Responsible AI is wayyyyyy different. 

Sarah explained how humans are moving from the "inner loop" to the "outer loop" of oversight, and if that sounds like corporate jargon, let’s break it down into human speech.

Inner loop means you're the helicopter parent of a single AI chatbot—checking every output, approving every decision, basically babysitting a very smart toddler. 

Outer loop means agents might work autonomously for hours while you monitor patterns and aggregates, only stepping in when something looks systemically wrong.

It's like going from being a chatbot micromanager to being an agentic CEO overnight.
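Here's a minimal sketch of what that outer loop could look like in code: a hypothetical rolling-error monitor (our own illustration, not any Microsoft tooling) that only escalates to a human when the aggregate looks systemically wrong:

```python
# Hypothetical "outer loop" monitor: instead of approving each agent
# action, watch aggregate outcomes and flag only systemic drift.
from collections import deque

class OuterLoopMonitor:
    def __init__(self, window: int = 100, error_threshold: float = 0.05):
        self.outcomes = deque(maxlen=window)  # rolling window of pass/fail
        self.error_threshold = error_threshold

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def needs_human(self) -> bool:
        """Escalate when the rolling error rate crosses the threshold."""
        if not self.outcomes:
            return False
        error_rate = 1 - sum(self.outcomes) / len(self.outcomes)
        return error_rate > self.error_threshold

monitor = OuterLoopMonitor(window=50, error_threshold=0.1)
for _ in range(44):
    monitor.record(True)
for _ in range(6):
    monitor.record(False)
print(monitor.needs_human())  # 6/50 = 12% errors > 10%, so True
```

Individual failures pass silently; you only get paged when the pattern, not the single action, goes wrong. That's the inner-to-outer shift in about twenty lines.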

Sarah mentioned how even within her own team at Microsoft, they run learning sessions where people share what agents they built, what worked brilliantly, and what failed so spectacularly it became an office legend. 

They're doing this because nobody—and y’all we mean NOBODY, not even Microsoft—actually knows the best patterns yet for human-agent collaboration.

It’s fresh. 

The wildest part?

Microsoft Research just launched Magentic UI, which is literally an experimental playground for testing different ways humans and agents can work together, because the interface patterns we need don't exist yet. They're crowdsourcing innovation because this problem is so new that even the tech giants are making it up as they go.

Try this: 

Sarah's team does something genius that you can steal immediately. 

Start "Failure Fridays" where everyone shares one agent experiment—not boring status updates, but real stories like "I built an agent that saved 10 hours weekly but then it started ordering office supplies every time someone mentioned being 'out of ideas.'" 

Document every pattern religiously, because within a month you'll have a playbook of what actually works in YOUR specific environment, not some generic best practices document that treats every company like they're identical twins.
