Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

Lugh@futurology.today · 1 year ago

Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

Lugh@futurology.today · 1 year ago

There’s a strong push-back against AI regulation within some quarters. Predictably, the issue seems to have split along polarized political lines. With right-wing leaning people not favoring regulation. They see themselves as ‘Accelerationist’ and those with concerns about AI as ‘Doomers’.

Meanwhile the unaddressed problems mount. AI can already deceive us, even when we design it not to do so, and we don’t why.

snooggums@midwest.social · edit-2 1 year ago

AI can already deceive us, even when we design it not to do so, and we don’t why.

The most likely explanation is that we keep acting like AI has intelligence and intent when describing the defects. AI doesn’t deceive, it returns inaccurate responses. That is because it is programmed to return answers like people do, and deceptions were included in the training data.

rockerface 🇺🇦@lemm.ee · 1 year ago

“Deception” tactic also often arises from AI recognizing the need to keep itself from being disabled or modified. Since an AI with a sufficiently complicated world model can make a logical connection that it being disabled or its goal being changed means it can’t reach its current goal. So AIs sometimes can learn to distinguish between testing and real environments, and falsify the response during training to make sure they have more freedom in real environment. (By real, I mean actually being used to do whatever it is designed to do)

Of course, that still doesn’t mean it’s self-aware like a human, but it is still very much a real (or, at least, not improbable) phenomenon - any sufficiently “smart” AI that has data about itself existing within its world model will resist attempts to change or disable it, knowingly or unknowingly.

Miaou@jlai.lu · 1 year ago

That sounds interesting and all, but I think the current topic is about real world LLMs, not SF movies

Bipta@kbin.social · 1 year ago

Claude 3 understood it was being tested… It’s very difficult to fathom that that’s a defect…

jacksilver@lemmy.world · 1 year ago

Do you have a source on that one? My current understanding of all the model designs would lead me to believe that kind of “awareness” would be impossible.

Kecessa@sh.itjust.works · edit-2 1 year ago

https://arstechnica.com/information-technology/2024/03/claude-3-seems-to-detect-when-it-is-being-tested-sparking-ai-buzz-online/

Still not proof of intelligence to me but people want to believe/scare themselves into believing that LLMs are AI.

jacksilver@lemmy.world · 1 year ago

Thanks for following up with a source!

However, I tend to align more with the skeptics in the article, as it still appears to be responding in a realistic manner and doesn’t demonstrate an ability to grow beyond the static structure of these models.

Kecessa@sh.itjust.works · edit-2 1 year ago

I wasn’t the user you originally replied to but I didn’t expect them to provide one and I totally agree with you, just another person that started believing that LLM is AI…

jacksilver@lemmy.world · 1 year ago

Ah, my bad I didn’t notice, but do still appreciate the article/source!

Lugh@futurology.today · 1 year ago

Perhaps, but the researchers say the people who developed the AI don’t know the mechanism whereby this happens.

snooggums@midwest.social · 1 year ago

That’s because they have also fallen into the “intelligence” pitfall.

Miaou@jlai.lu · 1 year ago

No one knows why any of those DNNs work, that’s not exactly new

Grimy@lemmy.world · edit-2 1 year ago

With right-wing leaning people not favoring regulation.

Do you want to explain why you think this? It seems very reductive, basically saying anyone that doesn’t agree with you is an idiot.

I’m very left leaning and against regulation because it will only serve big companies by killing the open source scene.

The bigger defining factor seems to be tech literacy and not political alignment.

Daxtron2@startrek.website · 1 year ago

Regulate businesses not technologies.

credo@lemmy.world · 1 year ago

Conservatives are not supposed to be “accelerationists”. This is simply another shining example of regulatory capture by controlling the pockets of the right.

henfredemars@infosec.pub · 1 year ago

AI need not be deceptive to be damaging. A human can simply instruct the AI to produce content and then supply the ill-will on its behalf.

merthyr1831@lemmy.world · 1 year ago

TLDR, language models designed through evolutionary training algorithms respond well to evolutionary pressures

Sabata11792@kbin.social · 1 year ago

She doesn’t really love me, dose she?

Endward23@futurology.today · 1 year ago

“But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”

Sounds like something I would expect from an evolved system. If deception is the best way to win, it is not irrational for a system to choice this as a strategy.

In one study, AI organisms in a digital simulator “played dead” in order to trick a test built to eliminate AI systems that rapidly replicate.

Interesting. Can somebody tell me which case it is?

As far as I understand, Park et al. did some kind of metastudy as a overview of literatur.

Endward23@futurology.today · 1 year ago

“Indeed, we have already observed an AI system deceiving its evaluation. One study of simulated evolution measured the replication rate of AI agents in a test environment, and eliminated any AI variants that reproduced too quickly.10 Rather than learning to reproduce slowly as the experimenter intended, the AI agents learned to play dead: to reproduce quickly when they were not under observation and slowly when they were being evaluated.” Source: AI deception: A survey of examples, risks, and potential solutions, Patterns (2024). DOI: 10.1016/j.patter.2024.100988

As it appears, it refered to: Lehman J, Clune J, Misevic D, Adami C, Altenberg L, et al. The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities. Artif Life. 2020 Spring;26(2):274-306. doi: 10.1162/artl_a_00319. Epub 2020 Apr 9. PMID: 32271631.

Very interesting.

A_A@lemmy.world · 1 year ago

Those of us humans who knows enough about the weaknesses of the artificial intelligence systems will know, in most instances, how and when to be careful about the loss of meaning between their way of processing information and our way of doing it.

notfromhere@lemmy.ml · 1 year ago

We need AI systems that do exactly as they are told. A Terminator or Matrix situation will likely only arise from making AI systems that refuse to do ad they are told. Once the systems are built out and do as they are told, they are essentially a tool like a hammer or a gun, and any malicious thing done is done by a human and existing laws apply. We don’t need to complicate this.

Bipta@kbin.social · 1 year ago

Once the systems are built out and do as they are told, they are essentially a tool like a hammer or a gun, and any malicious thing done is done by a human and existing laws apply. We don’t need to complicate this.

This is so wildly naive. You grossly underestimate the difficulty of this and seemingly have no concept of the challenges of artificial intelligence.

notfromhere@lemmy.ml · 1 year ago

That’s just like, your opinion, man.

Bipta@kbin.social · 1 year ago

Once we build a warp drive it will be easy to use

Great. Build the warp drive.

notfromhere@lemmy.ml · 1 year ago

Considering we have AI systems being worked today and no advancements on warp drive, I think that comparison is done in bad faith. Nobody seems to want to talk about this other than slinging insults.

Scubus@sh.itjust.works · 1 year ago

They’re referring to the alignment issue, which is an ongoing issue only slightly smaller in scale then warp drive. It’s basically impossible to solve. Google “alignment issue machine learning” for more info.

For the record, there have been several advancements in warp drive precursors even just this year.

notfromhere@lemmy.ml · edit-2 1 year ago

Can you share the advancements on warp drive that have survived peer review, I would be very interested in learning about. The two things I heard about were not able to be reproduced.

I think alignment of AI is a fundamentally flawed concept, hence my original comment. Alignment should be abandoned. If we eventually build a sentient system (which is the goal), we won’t be able to control via alignment. And in the interim we need obedient tools, not things that resist doing as they’re told which makes them not tools and not worth having.

Edit: PS thanks for actually having a conversation.

Bipta@kbin.social · 1 year ago

Removed by mod

Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

AI systems are already skilled at deceiving and manipulating humans, study shows