Anthropic Claude 3.5 Sonnet – How it compares to ChatGPT's GPT-4o

Amazon is coming after ChatGPT, why the Meta and Apple partnership didn't happen, a Claude vs. ChatGPT showdown and much more.

Outsmart The Future

Today in Everyday AI
8 minute read

🎙 Daily Podcast Episode: Anthropic released Claude 3.5 Sonnet and it claims to have better benchmarks than any other LLM. Is this finally a ChatGPT killer? We put it to the test live and give you our hot takes. Give it a listen.

🕵️‍♂️ Fresh Finds: Meta expands its AI startup program, Gmail rolls out Gemini features and a real photo wins in an AI competition. Awkward. Read on for Fresh Finds.

🗞 Byte Sized Daily AI News: Amazon is working on an AI chatbot to compete with big tech, why the Apple x Meta partnership didn’t happen and IBM integrates AI into quantum computing. For that and more, read on for Byte Sized News.

🚀 AI In 5: We’re breaking down how to build a custom GPT in ChatGPT and why you might want to build one. See it here

🧠 Learn & Leveraging AI: Wondering how Anthropic Claude 3.5 Sonnet stacks up to ChatGPT? Do its benchmark claims hold up? We put it through the wringer so you don’t have to. Keep reading for that!

↩️ Don’t miss out: Did you miss our last newsletter? We talked about Apple x Meta AI, Gemini added to student school accounts and TikTok parent company seeking AI chips. Check it here!

Anthropic Claude 3.5 Sonnet – How it compares to ChatGPT's GPT-4o 🥊

Is there finally a (real) ChatGPT killer?

Anthropic just released its newest model, Claude 3.5 Sonnet. And out of the gate, this thing is friggin powerful.

It's outbenchmarking every other model and the new Artifacts feature is a legit game-changer.

But does it actually hold its own head-to-head against ChatGPT's new GPT-4o?

Join the conversation and ask Jordan questions on Claude 3.5 Sonnet here.

Also on the pod today:

• Claude 3.5 Sonnet Artifacts Feature 🛠️
• Real World Examples and Use Cases 🤔
• Comparison of Claude 3.5 Sonnet to GPT-4o 🥊

It’ll be worth your 1 hour and 7 minutes:

Listen on our site, or subscribe and listen on your favorite podcast platform.

Upcoming Everyday AI Livestreams

Wednesday, June 26th at 7:30 am CST ⬇️

Here are our favorite AI finds from across the web:

New AI Tool Spotlight – Zebracat turns text into marketing videos, Relay.app gives you AI-powered automations and Scene is an all-in-one AI web design workspace.

Big Tech – Meta is expanding its AI startup program in Europe.

Google – Gmail is rolling out its Gemini AI features, including an AI sidebar and email summaries.

AI in Education – Google discusses how it created its Google AI Essentials course.

Read This – A photographer submitted his real photo into an AI contest and ended up winning.

1. Amazon Developing New AI Chatbot "Metis" to Rival ChatGPT ⚔️

Amazon is reportedly working on its own AI chatbot, codenamed "Metis." This new chatbot, set to compete with OpenAI's ChatGPT, will offer consumer-focused services accessible through web browsers. Analysts predict that Amazon's strategic move could generate an additional $600 million in sales from its existing Alexa user base, highlighting the company's commitment to AI innovation.

Reports also suggest that "Metis" shares infrastructure with the upcoming "Remarkable Alexa," hinting at a comprehensive AI strategy within Amazon.

2. Apple x Meta AI Never Happened Due to Privacy Concerns 😬

Recent reports from Bloomberg suggest that Apple and Meta were in discussions about integrating Meta's AI models into Apple devices. The talks, which began back in March, ran into Apple’s concerns about user privacy, and the plans to bring Meta's AI models to iPhones were paused.

3. IBM Quantum System Two and AI Integration 🧑‍💻

IBM is revolutionizing the tech game by merging AI technology with its quantum computing platform, paving the way for groundbreaking advancements. Through its watsonx platform and Qiskit software, IBM is set to enhance circuit optimization, resource management, and error correction in quantum computing.

The integration of AI is projected to lead to a 40% improvement in circuit size and a 2x to 5x enhancement in processing speed. This quantum leap in computing signifies a monumental step towards practical quantum applications in the near future.

4. Chinese AI Giants Fight Against OpenAI's API Restrictions 🇨🇳

In response to OpenAI's move to restrict API access in certain regions, Chinese tech leaders are making strategic moves. Baidu rolls out a generous "inclusive program" offering free migration to its Ernie platform and matching users' OpenAI usage with Ernie 3.5 tokens.

Alibaba Cloud joins the fray by providing free tokens and migration services through its AI platform, offering the cost-effective Qwen-plus model. Zhipu AI enters the ring with a "Special Migration Program" highlighting its GLM model as a competitor to OpenAI's ecosystem.

5. Central Banks Advised to Embrace AI 🏦

The Bank for International Settlements (BIS) emphasizes the importance of AI in enhancing inflation prediction abilities without fully replacing human judgment in setting interest rates. Cecilia Skingsley of BIS warns against letting AI models become sole "robo-ratesetters," given their untested nature and potential to "hallucinate."

Despite the benefits AI offers in real-time data monitoring, central banks are urged to maintain a balance between leveraging AI's power and preserving human accountability in crucial decision-making processes.

Custom GPTs in ChatGPT Explained!

Wondering what a custom GPT is and how it can revolutionize your workflow?

Whether you're a seasoned tech pro or a curious newbie, this guide is your first step towards optimizing your knowledge work using AI.

We break down the essentials of custom GPTs in ChatGPT and how you can tailor this powerful tool to fit your specific needs.

🦾 How You Can Leverage:

Leeeeeeeeeet’s get ready to LLM ruuuuuuuuuumble.

It’s been a while since we pitted top LLMs head to head in an Everyday AI episode. 

Y’all told us you wanted a throw-down, so we threw it down live. 

With Anthropic’s new Claude 3.5 Sonnet on the scene, it seems we’ve got a new king of the hill.

According to Anthropic’s benchmarks, the new Claude 3.5 Sonnet takes the cake in just about every test against the other most capable frontier models. 

But how does it really stack up in real world testing? And should you start grabbing for the Sonnet shaker the next time you’re sitting at the LLM table? 

To get some answers, we pitted Claude 3.5 Sonnet and GPT-4o against each other, head to head. 

Full disclosure, our real-time tests aren’t scientific or definitive. They’re just quick looks at how the top two models stack up. 

So, here’s what’s new in Claude 3.5 Sonnet, what we’re loving and hating, and how the two heavyweights stack up against each other. 

Ready? 

One, Two, Threeeeeeeeeeeeeeeeeee. 👇

1 – What’s new in Claude 3.5 Sonnet 🧑‍🎨

The Claude 3.5 Sonnet model itself is faster, cheaper and more capable in just about every way than the Claude 3 models before it. 

Anthropic also introduced the Artifacts feature, a new side-by-side interface that helps you not only create but also render code and visualizations in real time. (With other LLMs, you’ve gotta copy and paste code into third-party software to do this.) 

Also worth noting — it’s only been 3 months since Anthropic released Claude 3, introducing three different flavors: 

Haiku — The fastest but least capable model 

Sonnet — The middle model 

Opus — The largest and most capable model

The takeaway: 

Let’s cut it straight on the naming and (probable) strategy here. 

By upgrading their middle Sonnet tier and not the top Opus tier, Anthropic is clearly firing off warning shots, in the same way that Meta’s smaller 70B models grabbed headlines while it’s still sitting on its larger offering. 

In short, everyone’s keeping something up their sleeve for OpenAI’s next big offering.

Presumably, Anthropic will have Claude 3.5 Opus ready to roll out whenever OpenAI releases its next big thing. 

In the race for enterprise customers and outside funding, model makers gotta be nimble in their offerings and keep working on models a step ahead of what they’ve released.

That seems to be Anthropic’s approach here. 

2 – What we love and what we hate 🎯

There’s a lot of each, TBH.

Again, we’ve only spent a few hours in Claude 3.5 Sonnet, but here are our first impressions of the good and the bad.

What we loved:

If you’re not a GPT-4o power user, you were probably blown away by the speed of 3.5 Sonnet.

Even compared to Opus, it’s Usain Bolt fast. The wild part is Opus didn’t feel slow until OpenAI released GPT-4o in May.

Ever since, Anthropic’s offerings have felt sluggish.

Not anymore. 3.5 Sonnet is faaaaast.

Also, the new Artifacts feature is a gem, especially if you code.

A built-in interface that renders code, designs and more in real-time is a much more intuitive way to work with models.

The biggest disconnect in working with models is what happens AFTER a generation. Maybe it’s a lot of copying/pasting/modifying, or taking outputs into other programs and testing them.

With Artifacts, a lot of that is gone. Being able to render a front-end design, or execute code, in real time feels fresh and much-needed.

What we hate:

Sorry, we still can’t honestly recommend the front-end Anthropic Claude experience for most business use-cases.

Why?

Claude remains the only major LLM without some sort of real-time access to the internet.

We’re sure it’s easier said than done, but it’s a legit necessity in today’s LLM game. Meta, Google, Microsoft and OpenAI all have different real-time capabilities in their respective models.

And even though 3.5 Sonnet has an April 2024 knowledge cutoff date, a not-connected model is a dangerous game to play in terms of accuracy, hallucinations, etc.

And a little more of what we hate in the head-to-head below.

3 – The showdown: who won? 🧍

Let’s reiterate the obvious — our head to head was not a scientific test.

We ran a bunch of simple, zero-shot prompts and used our best judgement to gauge which model had better responses.

With proper prompt engineering, we would have gotten better results from each model, but that’s not the point here. 

Also, we only compared similar features and used tests that would draw on those similar features. 

  • So, we weren’t going to test the ability to render code, as Claude is the only one that can handle that.

  • Similarly, we weren’t going to test recent events and trends, as GPT-4o is the only one that can handle that.

Also, we started each chat off by encouraging each model to take it step-by-step and to think critically about each prompt before responding. 
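If you want to run your own version of this kind of throw-down, here's a rough sketch of the setup in Python. To be clear, this is not the code we used on the show (we did it by hand, live). The model functions are hypothetical stand-ins that return canned text rather than calling any real API, the preamble wording is assumed, and the "longest answer wins" judge is just a placeholder for human judgment. It also adds the step-by-step nudge to every prompt, where we only did it once at the start of each chat.

```python
from collections import Counter
from typing import Callable, Dict, List

# Assumed wording for the "take it step by step" nudge described above.
PREAMBLE = "Take this step by step and think critically before you respond.\n\n"


def run_head_to_head(
    prompts: List[str],
    models: Dict[str, Callable[[str], str]],
    judge: Callable[[str, Dict[str, str]], str],
) -> Counter:
    """Send every zero-shot prompt to every model and tally the judge's picks."""
    tally: Counter = Counter()
    for prompt in prompts:
        # Each model sees the exact same text: preamble plus the raw prompt.
        responses = {name: ask(PREAMBLE + prompt) for name, ask in models.items()}
        tally[judge(prompt, responses)] += 1
    return tally


# Hypothetical stand-ins for real model calls (no API is contacted).
def fake_claude(prompt: str) -> str:
    return "A sparkling, detailed answer."


def fake_gpt4o(prompt: str) -> str:
    return "A short answer."


def longest_wins(prompt: str, responses: Dict[str, str]) -> str:
    """Placeholder judge: the longest response wins, ties count as a push."""
    best = max(len(text) for text in responses.values())
    leaders = [name for name, text in responses.items() if len(text) == best]
    return leaders[0] if len(leaders) == 1 else "push"


results = run_head_to_head(
    ["Write a tagline for a coffee shop.", "Summarize why the sky is blue."],
    {"Claude 3.5 Sonnet": fake_claude, "GPT-4o": fake_gpt4o},
    longest_wins,
)
print(results.most_common())
```

In a real run you'd swap the stand-ins for actual model calls and replace the judge with your own side-by-side read of the responses, which is exactly the unscientific part.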

With that in mind, let’s break down where each model shined or fell flat. 

Claude 3.5 Sonnet

Speed – While the output speed is lightyears ahead of Claude 3, it was pretty on par with GPT-4o. So we’d call that a push.

Copywriting – TBH, the only area where Claude 3.5 Sonnet shined in our very limited matchup was in its ability to create written content. Claude’s copy sparkled, whereas GPT-4o’s out-of-the-box style sounded bland and uninspiring. 

Long docs – And for all the hullabaloo about Claude’s extended/better memory, it’s harder to reap those benefits when its file upload capabilities are soooooooo limited. 

GPT-4o 

Logic — In most of our logic and reasoning tests, GPT-4o had a VERY slight advantage. While neither shined, GPT-4o in our tests was consistently either slightly more right or slightly less wrong. Lolz. 

Data Analysis — Where we saw GPT-4o shine, though, was in its ability to handle our data analysis tests. Claude couldn’t even handle a large spreadsheet, let alone crunch the data.

In this head-to-head, Jordan’s jaw literally dropped when GPT-4o crunched through more than a half million cells of data and started visualizing the data accurately in near real-time. Sheesh. 

Copywriting — Out of the box, ChatGPT is a terrible copywriter. We saw this in our head-to-head. (With proper prompt engineering though, ChatGPT can crush this. In a simple zero-shot head to head, though, Claude crushed ChatGPT.) 

So, which model do you think is better? 

Numbers to watch

$120 Million

AI startup Etched raises $120 million to develop specialized chips.

Now This …

Let us know your thoughts!

What show type is your fave for the podcast? (vote to see results)
