[Claude 3.5 Sonnet] Could it be as good as or better than GPT4o!?

Introduction.
Did it outperform GPT4o in benchmark comparisons?
The programming policy was dictated by voice input.

Introduction.

The updates of ChatGPT-related competitors are also tremendous, and the enhancements between ChatGPT, Claude, and Gemini seem to be heating up more and more.

Anthropic, one of the rivals of ChatGPT, released Claude 3.5 Sonnet yesterday. By the way, I have used GPT and Gemini before, but this is my first time using Claude.

Did it outperform GPT4o in benchmark comparisons?

The URL below is the release article published by Claude. Click on “Try on Claude.ai” in the article to go to the chat screen with Claude immediately after creating an account.

Introducing Claude 3.5 Sonnet

Introducing Claude 3.5 Sonnet—our most intelligent model yet. Sonnet now outperforms competitor models and Claude 3 Opus...

Incidentally, in comparison to the competition (GPT-4o, Gemini1.5, Llama-400b) according to our own research, it outperforms GPT4o in most benchmark scores. We’ll have to take a closer look at the assumptions on which the benchmarks are based (don’t take our word for it).

It is twice as fast as the Claude 3 Opus and one-fifth the cost of the previous version.
3 per million input tokens and $15 per million output tokens, with a token context window of 200,000 tokens.

I had recognized Claude as a service but had not used it before, so I gave it a try.

The programming policy was dictated by voice input.

In my case, I often use GPT exclusively for programming.

I will try to give prompt instructions while using the voice input described in the last video here.

The “weather report + advice” function, which will be implemented in the “Mia,” a small talking robot currently under development, will be played back by voice at a time set by the user in the application.

The app’s screen already displays information and advice about the day’s weather in text, and the app provides push notifications, but we assumed that having voice playback would “make it harder to forget your folded umbrella in the rain when you go to work, for example.

The result is this.
There are a few typos in the transcription (e.g. ESP32 -> ESP32), but Claude understands the context and absorbs them, so I did not bother to correct the text after voice input.

The response time to enter responses was very fast, and the content was as good as GPT4o.

Basically, when I do programming, I first have AI describe a rough policy of functions to be implemented in a bulleted list, and then I ask the programmer to paste the code that has already been implemented and may be applicable to the policy, and to write specific code to add the new function to the existing action. The instructions are given in two steps.

There seemed to be no problem here either.