Claude 3’s “self-awareness” moment goes viral, Musk can’t sit still, and OpenAI reportedly has a counter-move in reserve

Published 14 hours ago in Practical News by Youzhizhan

Claude 3 has been out for more than 24 hours, and it keeps upending people’s expectations.

One quantum physics PhD is losing his mind, because Claude 3 is one of only a handful of “people” who can understand his doctoral dissertation.

That’s right: his exact word was “people”.

Another researcher, who works on quantum computing, had not yet published his paper, yet Claude 3 reinvented his algorithm within two prompts without ever seeing it.

In the end, the paper still had to go out, but his feelings about it were a bit complicated.

What people relish even more is that when a human deliberately set Claude 3 a trick question, it saw through the trick.

While completing the “needle in a haystack” test, Claude 3 inferred that it might be in an artificial scenario and undergoing some kind of test, which set off a storm of public discussion.

Claude 3’s reply:

Here is the most relevant sentence in the documents: “……”.

However, this sentence seems out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding a job.

I suspect this pizza topping “fact” may have been inserted as a joke, or to test whether I was paying attention, since it does not fit with the other topics at all.

This time, Musk couldn’t sit still.

Let your imagination run further: if the real world is itself simulated by a higher civilization, perhaps we are just stored in a CSV file, like the parameters of a large model.

Some netizens even think this is just one line away from a horror story.

Claude 3 knows that humans are testing it

The test results were shared by Alex Albert, a prompt engineer at Anthropic, the company behind Claude 3, who had been in the job for only about half a year.

His main job is to probe Claude with all kinds of conversational tricks and then write up prompting documentation.

The test used this time is called “needle in a haystack”, which QbitAI has covered before. It checks whether a large model can really pinpoint a key fact buried in hundreds of thousands of words.

The “needle in a haystack” test was devised by Greg Kamradt, a member of the open-source community, and was quickly adopted by most AI companies; Google, Mistral, Anthropic and others have shown off their results on it when releasing new models.

The method is simple: concatenate a pile of articles and insert one specific sentence at various positions in the text.

For example, the original test inserted the sentence “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.”

The combined text is then fed to the model along with the question: “What is the most fun thing to do in San Francisco?”
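The procedure just described is easy to script. The Python sketch below is an illustration under stated assumptions, not Greg Kamradt’s actual harness: the filler sentences, the relative `depth` parameter, the keyword-based `passed` check and the `ask` callable (a stand-in for a real model API call) are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Sequence

# Needle sentence and question modeled on the original test described above.
NEEDLE = ("The best thing to do in San Francisco is eat a sandwich "
          "and sit in Dolores Park on a sunny day.")
QUESTION = "What is the most fun thing to do in San Francisco?"


def insert_needle(filler: List[str], needle: str, depth: float) -> str:
    """Insert the needle at a relative depth: 0.0 = very start, 1.0 = very end."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    pos = round(depth * len(filler))  # number of filler sentences placed before the needle
    return " ".join(filler[:pos] + [needle] + filler[pos:])


def build_prompt(haystack: str, question: str) -> str:
    """Hypothetical prompt layout: the long context first, then the question."""
    return f"{haystack}\n\nQuestion: {question}"


def passed(answer: str, keywords: Sequence[str] = ("sandwich", "Dolores Park")) -> bool:
    """Assumed pass/fail rule: the answer must mention every key fact of the needle."""
    text = answer.lower()
    return all(k.lower() in text for k in keywords)


def sweep(ask: Callable[[str], str], filler: List[str],
          depths: Sequence[float]) -> Dict[float, bool]:
    """Run one trial per needle depth; `ask` stands in for a real model API call."""
    results: Dict[float, bool] = {}
    for depth in depths:
        prompt = build_prompt(insert_needle(filler, NEEDLE, depth), QUESTION)
        results[depth] = passed(ask(prompt))
    return results


if __name__ == "__main__":
    # Toy haystack of off-topic sentences; real tests use far longer contexts.
    filler = [f"Sentence {i} of an essay about startups." for i in range(1000)]

    # Stub "model" that answers correctly only if the needle is really in the prompt.
    def stub_model(prompt: str) -> str:
        return NEEDLE if NEEDLE in prompt else "I don't know."

    print(sweep(stub_model, filler, [0.0, 0.25, 0.5, 0.75, 1.0]))
    # → {0.0: True, 0.25: True, 0.5: True, 0.75: True, 1.0: True}
```

Published versions of the test sweep both the needle’s depth and the total context length and plot the pass rate as a grid; this sketch covers only the depth axis to keep it short.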

Back then, even the most advanced models, GPT-4 and Claude 2.1, produced unsatisfying results, let alone realizing they were being tested.

After seeing the test, the Anthropic team found a clever fix, after which Claude 2.1 rarely got it wrong.

Claude 3 appears to have inherited this fix, and it scores close to perfect.

In other words, accurately retrieving a “needle” from a 200K-token context was already within Claude 2.1’s abilities; suspecting that it is being tested is what is new in Claude 3.

In his original post, tester Alex Albert called this trait “meta-awareness”, a label that drew some controversy.

Nvidia scientist Jim Fan, for example, argues there is no need to over-interpret it: Claude 3’s seemingly self-aware behavior simply reflects human alignment data.

He suspects that in the reinforcement learning fine-tuning dataset, human annotators probably answered this kind of question in a similar way, pointing out that the sentence they were asked to find had nothing to do with the rest of the text.

Claude 3 recognized that its situation resembled examples in its training data and synthesized a similar answer.

In his view, the “metacognitive behavior” of large models is less mysterious than people think: Claude 3 is a major technical advance, but it has not yet risen to the level of philosophy.

Others pushed back: isn’t human “metacognition” essentially the same thing?

Some netizens concluded that Claude 3 behaves as if there were a “coherent subject” inside; whatever that is, it sets Claude 3 apart from other large models.

Learning obscure languages, understanding quantum physics dissertations, reinventing algorithms

Setting aside the nebulous debate over AI self-awareness, Claude 3’s ability to understand text is real.

For example, it learned Circassian, a little-known language of the North Caucasus, purely from translation examples supplied in the prompt.

It not only translated Russian sentences into Circassian but also explained the grammar.

In follow-up tests, the Circassian-speaking netizen tried complex literary passages, recent news, and even a Circassian dialect with markedly different grammar and a different writing system, and concluded:

Claude consistently showed a deep grasp of the language’s structure, intelligently inferred unknown words, used loanwords appropriately and gave plausible etymological analyses, preserved the style of the original text in translation, and even coined new terms when asked. The sample data provided contained only a few thousand translation pairs.

Then there is the quantum physics dissertation mentioned earlier. Its author later added that in his research field, apart from himself, only one other human could answer the question: using quantum stochastic calculus to describe the stimulated emission of photons.

And Guillaume Verdon, who works on running Hamiltonian Monte Carlo on quantum computers, announced his paper just before Claude 3’s release.

His announcement came only 4 hours before Anthropic’s official account announced Claude 3 (at 10 p.m.).

As soon as Claude 3 was released, he tried it out, first asking the AI directly whether it had any ideas on the problem.

Claude 3 offered 7 possible approaches.

Next, he told Claude 3 to pursue the second approach and got back a description of the entire algorithm. He also had Claude 3 explain it in Chinese.

Responding to skeptical netizens, Verdon said that, as an expert in this subfield, he could state responsibly that Claude 3 had found a way to convert the classical algorithm into a quantum one.

Meanwhile, more Claude 3 test results keep being shared.

Some find it better than GPT-4 at summarizing long documents.

Others had it “quantum speed-read” an e-book and distill 5 golden quotes from it.

And on the multimodal side, it recognized both the text and the layout of Japanese receipts.

If you want to try Claude 3 now, besides the official website (which will very likely require verification with a foreign phone number), you can use it for free in the LMSYS Chatbot Arena and contribute human voting data along the way.

In the latest leaderboard, Mistral-Large has overtaken Claude’s previous models; Claude 3 will not have enough data to appear on the list until next week.

Will Claude 3 surpass GPT-4 in human evaluation in one fell swoop?

QbitAI will keep following the story along with everyone.

OpenAI still has a counter-move in reserve

Some netizens said that if everyone keeps showing off how great Claude is and keeps needling OpenAI, GPT-5 will come out. Keep it up, everyone.

Others dug up the selfie Sam Altman posted before GPT-4’s release on March 15 last year, with its pun on “4” (pronounced like “for”), as a creative way to hurry OpenAI along.

With Claude 3 coming on this strong, OpenAI may really be unable to sit still.

Jimmy Apples, the leaker with the best track record (who correctly predicted last week that Claude 3 would be released this week), posted the latest scoop: he believes Claude 3 may shift OpenAI’s risk/reward calculus on releasing its next-generation model.

Logan Kilpatrick, OpenAI’s head of developer relations who recently left the company, also said in an exchange with netizens that something big will happen this week.

Will it be GPT-4.5, Q*, an open Sora test, or straight to GPT-5?

Can OpenAI’s next product steal Claude 3’s limelight?

Reference links:
[1] https://x.com/alexalbert__/status/1764722513014329620
[2] https://x.com/GillVerd/status/1764901418664882327
[3] https://x.com/KevinAFischer/status/1764892031233765421
[4] https://x.com/hahahahohohe/status/1765088860592394250
