EP 618: RSL vs. the AI Scrape: Can LLM licensing save the open web?
RSL fights AI scraping, OpenAI's $6.5B and $100M deals, Microsoft Copilot gets official with Claude, Elon goes after Microsoft with AI and more.
Sup y'all!
Today's podcast episode was an important one for the future of LLMs.
We talked with Doug Leeds, a longtime leader in media and publishing. I think he's got a good solution for protecting the open internet in an age of LLM companies freely gobbling up content, and it's worth a listen for all of us.
Jordan
Outsmart The Future
Today in Everyday AI
7 minute read
Daily Podcast Episode: AI companies scrape the web to train their models, with or without permission. Our guest today tells us how that could change. Give it a watch/read/listen.
Fresh Finds: AI (kinda) brings Trump and Elon back together, Microsoft brings agent reselling to the masses, Spotify axes AI music tracks and more. Read on for Fresh Finds.
Byte Sized Daily AI News: OpenAI's $6.5B and $100M deals, Microsoft Copilot gets official with Claude, Elon goes after Microsoft with AI and more. Read on for Byte Sized News.
AI News That Matters: Will AI search eat the web? Will your company be hurt? And what can be done? We break it down. Keep reading for that!
Don't miss out: Did you miss our last newsletter? We talked about Gemini in Chrome, OpenAI's five new U.S. data centers, Alibaba's $50 billion-plus investment in AI and more! Check it here!
RSL vs. the AI Scrape: Can LLM licensing save the open web?
AI scraping vs. the open web. Who wins?
Let's say the quiet part out loud: AI companies have trained their models for years on your company's website data, whether you want them to or not.
Fast forward to today: many publishers have lost up to 70% of website traffic (and huge chunks of revenue) because of this.
So what happens if some of these publications... die? Then the AI companies have less human-created source material to train future models on.
Rut roh.
Maybe the Really Simple Licensing protocol can save us all.
We talk with RSL Collective co-founder Doug Leeds about whether a new licensing structure can be a win-win-win for publishers, AI companies and end users.
You don't wanna miss this one.
Also on the pod today:
• Open web under AI threat?
• Blocking AI crawlers: does it work?
• Collective rights for content owners
It'll be worth your 25 minutes:
Listen on our site, or subscribe and listen on your favorite podcast platform.
Here are our favorite AI finds from across the web:
New AI Tool Spotlight: Doraverse is an all-in-one AI coworker, Oreate is an AI assistant that works for you, Ambient is an AI-powered daily briefing app.
AI Developments: Could this new reasoning engine solve the AI cost and compute crisis?
AI in Entertainment: Spotify cracked down on spammy AI music, cutting 75 million tracks. Here's why they did it.
AI and Politics: Trump and Elon Musk are getting back together: the government just inked a deal to deploy xAI's Grok across federal agencies through 2027.
Agents: Moonshot released OK Computer, an agent mode for their popular Kimi model.
LLM Updates: Claude users can FINALLY move chats into projects.
Move chats into new or current projects. One at a time, or all together.
- Claude (@claudeai)
8:15 PM • Sep 24, 2025
AI Talent: Meta strikes again, grabbing an OpenAI scientist as its AI lab lead.
AI in the Media: A Chinese distributor allegedly used AI to turn a same-sex wedding into a straight one in the film "Together," prompting backlash and a paused release.
Agentic AI: Microsoft's new unified Marketplace lists 3,000+ AI apps and agents, and lets partners sell private offers. Bottom line: easier AI shopping and faster rollout. See how it works.
1. Report: OpenAI testing personalized daily digest
According to newly surfaced app strings, ChatGPT appears to be preparing an opt-in, personalized daily digest, promising a "fresh new update every morning" and the ability to "create for tomorrow."
The hints suggest you can give targeted feedback, influence what shows up next, and review your past feedback, which points to a more personalized daily insight feed aimed at helping people prioritize work and decisions. This lines up with earlier clues about a "golden hour" web announcement and may connect to memory features or the recently spotted alpha agent models.
2. Claude lands in Microsoft 365 Copilot
Anthropic's Claude Sonnet 4 and Opus 4.1 are now available in Microsoft 365 Copilot, starting via the Frontier Program for customers who opt in.
Claude Opus 4.1 can power the Researcher agent for complex, multi-step work across web, third-party data, and your company's emails, chats, meetings, and files, while Copilot Studio adds model choice for building and orchestrating enterprise-grade agents. Admins can enable access in the Microsoft 365 admin center, with Anthropic models hosted outside Microsoft-managed environments under Anthropic's terms, which signals a pragmatic, multi-model strategy from Microsoft.
3. Musk's xAI starts hiring for AI-powered Microsoft competitor, "Macrohard"
Elon Musk's xAI is now actively hiring engineers to build Macrohard, a "purely AI software company," with cofounder Yuhuai Wu recruiting for a new team focused on computer control agents tied to Grok 5 later this year. The tongue-in-cheek name nods to Microsoft, but Musk's claim that software companies can be simulated entirely with AI underscores a push to automate core dev and operations.
The immediate hiring move signals xAI is shifting from concept to execution, prioritizing applied agent work that could reshape how software is produced and maintained.
4. OpenAI taps CoreWeave for $6.5B more compute, deepening a fast-moving partnership
OpenAI just expanded its agreement with CoreWeave by $6.5 billion, bringing their joint commitments to $22.4 billion, a timely escalation as demand for GPU-heavy training and inference surges, according to the Associated Press.
The deal stacks on OpenAI's Tuesday reveal of its Stargate buildout with Oracle and a new arrangement with Nvidia to reach at least 10 gigawatts of AI data centers, signaling an aggressive race to lock down capacity before year-end. Hedgeye notes the partnership could concentrate CoreWeave's business around OpenAI and Nvidia by 2027, underscoring how pivotal this trio has become in the AI compute supply chain.
5. 20,000 enterprises to tap GPT-5 inside Databricks
According to Databricks, a new $100 million, multi-year partnership with OpenAI will make GPT-5 and other frontier models natively available inside the Databricks Data Intelligence Platform and Agent Bricks for more than 20,000 customers.
The companies say this integration cuts out data duplication and tricky plumbing, enabling governed, production-grade AI agents with joint engineering focused on enterprise performance, observability, and compliance through Unity Catalog. Executives from both firms and early adopters like Mastercard frame the move as putting advanced AI directly where secure business data already lives, which accelerates evaluation and deployment.
How You Can Leverage:
Doug Leeds helped build media and publishing empires for decades.
At early internet darlings like Ask.com, Yahoo, Dictionary.com and Reddit, he helped bring information to the masses in print and online.
Here's the gist: AI companies have essentially scraped the internet for years to feed their LLMs, and now they serve that info up to users. The problem? Publishers aren't getting clicks, and some are legit on the brink of going out of business.
Doug has shifted his focus to protecting the open internet.
Enter: Really Simple Licensing, or RSL. It's a simple protocol where publishers can set terms on how AI companies can (or can't) scrape.
So can the open internet be saved?
Or will AI companies eventually eat the hand that feeds them and make LLMs dumber as a result?
Make sure to check today's jam-packed episode, but let's dive in for the 1-2-3 of what you need to know.
1 - Machine-Readable Standards Beat Blocking
Google and other big search giants have forced every publisher into an impossible choice. Accept AI scraping or disappear from search entirely.
You can't have it both ways anymore.
Yeah, you can try to block certain AI crawlers in your website's robots.txt file, which tells bots how they can or can't crawl your site. (And some AI companies still pretty much ignore it.)
RSL is the better way.
Really Simple Licensing creates machine-readable standards that go directly in your robots.txt file. Instead of yes-no decisions, you specify exactly what AI companies can do with your content and how much it costs.
Doug explained how current blocking strategies backfire completely. Publishers lose search visibility while AI companies scrape content anyway.
The new standard works across agentic web systems, RAG pipelines, and MCP frameworks. You set licensing terms per piece of content.
Educational use only? Commercial licensing at $0.001 per citation? Both work.
Try This
Audit your current robots.txt file and document what you're actually blocking versus allowing.
Check three of your biggest competitors' robots.txt files to see how they're handling AI crawlers differently than you.
Calculate the monthly traffic value of your most-cited content pieces. That's your baseline for understanding what licensing could be worth.
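If you want a head start on that robots.txt audit, here's a minimal sketch using Python's built-in `urllib.robotparser`. The crawler names in `AI_CRAWLERS` are a short, illustrative list (not exhaustive), and the `SAMPLE` file is made up purely to show the output shape. Swap in the text of your own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens commonly associated with AI companies.
# Illustrative only: check each vendor's docs for the current names.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]


def audit_robots(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {crawler: True if allowed to fetch `path`} for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_CRAWLERS}


# Hypothetical robots.txt, used only for demonstration.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

report = audit_robots(SAMPLE)
for bot, allowed in report.items():
    print(f"{bot:16} {'allowed' if allowed else 'blocked'}")
```

Run the same function against your competitors' robots.txt text and compare the reports side by side. Keep in mind this only tells you what the file asks for, not what crawlers actually do.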
2 - Collective Power Beats Individual Negotiations
Individual publisher negotiations with AI companies are basically impossible.
Doug's building the RSL Collective as the web content equivalent of ASCAP. One blanket license covers all member content, AI companies pay when they use specific pieces, and payments flow back automatically.
The supporter list reads like publishing royalty. Reddit alone powers 40% of all large language model citations according to recent studies.
That's massive collective bargaining power.
Doug emphasized the legal enforcement aspect. Instead of fighting billion-dollar legal teams solo, you get the entire industry backing your copyright claims.
ASCAP has used this exact model in music for over a century. Joining the RSL Collective costs nothing, membership is non-exclusive, and you can opt out anytime.
Try This
Deciding your relationship with AI scrapers and AI search isn't an easy choice. First, you need to find out if AI search is having a net-positive or net-negative impact on your bottom line.
Once you've figured that out, see if the RSL Collective might be the move for you.
3 - Licensed Content Cuts AI Costs While Improving Quality
Current AI development burns billions mashing up scraped content to avoid copyright claims.
Doug broke down the hidden economics here. AI companies spend massive compute resources disguising copied content, which creates hallucination-prone answers while delivering subpar results.
Legal risk stays sky-high anyway.
With proper licensing, AI companies can serve original expert content directly instead of synthetic mashups. A finance question gets the actual Investopedia article rather than a potentially hallucination-prone summary.
This approach cuts compute costs dramatically while improving answer quality. No more billion-dollar processing just to disguise content theft.
Sam Altman already admitted the industry needs a new protocol for sustainable content training. The $1.5 billion Anthropic settlement proves the current approach fails.
Doug compared it to the early music streaming transition. Napster was a better product than buying $20 CDs, but it wasn't sustainable without paying artists.
Apple went to ASCAP and created streaming licensing that actually worked for everyone.
Try This
Run your most valuable search terms or branded terms through ChatGPT, Gemini, Copilot and Claude to see how they currently reference or cite your material in their responses.
Calculate your annual content creation budget divided by total pieces published. That's your per-piece cost baseline for understanding licensing economics.
Monitor your referral traffic from AI tools over the next month via Google Analytics. Track whether citations actually drive meaningful traffic or just replace it entirely.
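The last two items are quick math once you have the data. Here's a rough sketch, assuming you can export a list of referrer URLs from your analytics tool. The hostnames in `AI_REFERRERS` are assumptions about how these assistants show up as referrers, so verify them against your own logs, and the sample numbers are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Referrer hostnames assumed to indicate AI assistants.
# Illustrative only: confirm against what your analytics actually records.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "claude.ai": "Claude",
}


def ai_referral_counts(referrer_urls):
    """Count visits whose referrer hostname matches a known AI assistant."""
    counts = Counter()
    for url in referrer_urls:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host in AI_REFERRERS:
            counts[AI_REFERRERS[host]] += 1
    return counts


def cost_per_piece(annual_budget: float, pieces_published: int) -> float:
    """Annual content budget divided by pieces published."""
    if pieces_published <= 0:
        raise ValueError("pieces_published must be positive")
    return annual_budget / pieces_published


# Made-up sample data.
referrers = [
    "https://chatgpt.com/",
    "https://www.google.com/search?q=example",
    "https://claude.ai/chat/abc",
    "https://chatgpt.com/c/123",
]
counts = ai_referral_counts(referrers)
baseline = cost_per_piece(120_000, 400)
print(dict(counts), baseline)
```

Compare the AI referral counts month over month against your overall traffic to see whether citations are sending visitors or replacing them.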