Categories
Artificial intelligence

I found a simple task where Generative AI fails!

I made ChatGPT-4o go into a loop while trying to come up with an answer to my question yesterday. Gemini Advanced failed too. It was surprising to me, to think that these very capable Generative AI applications can fail in a task that is relatively simple to us humans.

But today, it looks like ChatGPT learned from its mistakes 👏

If you also want to try this, here are the details 👇

Background (Baby names! 👶)

A friend wanted help with finding a good baby name. My search started simply in Google, as usual. Then, I came across an article about numerology and how each character in the alphabet is assigned a specific number, allowing us to calculate a total for a baby name that is compatible (the details are irrelevant for this blog post).

These were the numbers for characters:

Values for each character of the alphabet in Chaldean numerology

So, for example, if the name is “John”, it would be:

J = 1, O = 7, H = 5, N = 5.

So, 1 + 7 + 5 + 5 = 18

My idea 💡

I was curious to use AI, to come up with baby names whose total (according to the above chart) is a specific number, just to see how easy it is. After all, these Generative AI applications can code fully functional games before you can even finish your coffee.

If a human was given this task, we are going to:

  • Collect a few names we like
  • Calculate each name to see if they match the required total
  • If matches, add it to a ‘selected names’ list
  • If not, we would try to modify the name a bit to confirm to the total
  • If that fails, we will move to other names
  • Repeat until we have the required number of names

It’s as simple as that.

First step ✅

As a first step, I provided both ChatGPT and Gemini the characters & values in the chart above and explained the situation.

Then, to make sure they got it, I gave a name and asked for the total. Here’s the Gemini screenshot:

Gemini Advanced AI chat

As expected, that was easy-peasy. ChatGPT responded similarly too.

Step Two 👎

Then I pulled the UNO Reverse card on them and asked to come up with names to satisfy a particular total (Asked for 5 names from ChatGPT and 1 name from Gemini). They both started lying!

Here’s the Gemini screenshot:

Generative AI chat

When asked, they both responded with a “sorry”:

Gemini Advanced AI chat

I tried to make it “really” easy, expecting the word “Dog”, but it guessed “Kitty” (I love both, so that’s forgivable) but how can you forgive that addition?:

Gemini Advanced AI chat

They kept saying sorry…

Gemini Advanced AI chat

The scary ChatGPT loop ➿

Since I asked for only 1 name from Gemini, it just failed, said sorry and stopped – no hard feelings.

With ChatGPT, since I asked for 5 names, it was listing each name along with its characters and values. So I asked it to stop doing that and only list just the name and its total. Also, I said “strongly” that it should respond only when it has names with the correct totals, because it was getting annoying.

I guess I hurt ChatGPT’s feelings! It started going crazy 😕

ChatGPT started to list a name, its characters, values and the total – then it would realize that the total is not what I wanted, say sorry and then try again… and again… and again…

Here’s how the loop went:

ChatGPT-4o AI chat
ChatGPT-4o AI chat

When it was the 14th attempt or something, I got scared and stopped the generation!

Then, I tried one more time to clearly state what I want. The result? More lies:

ChatGPT-4o AI chat

I gave up and went to bed happily. Because, they are not going to take our developer jobs this way 😜

The comeback 👏

I wanted to recreate the situation in a new chat today, to capture a video of ChatGPT failing and going into a loop, for this blog post. But to my surprise, it looks like it has learned from its mistakes. This was the response today:

ChatGPT-4o AI chat

It says “out of this list” and “from the list” in this answer (I have not mentioned anything about a list). I guess it selected names into a list and did the calculation in-memory before responding today (like how we would do), as opposed to responding immediately before calculating the total properly yesterday.

Gemini still doesn’t care.

Have you experienced anything interesting like this with your Generative AI tools?

If you have any interesting experience with AI like this (or even a different result for the above experiment), feel free to share in the comments below – I’d love to hear about it.

Cheers 🍻