Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

Rivalarrival, 5 months ago

Therefore every AI chatbot maker needs to apply protections,

I’m pretty sure the instructions to create an AI chatbot have been published, and are available for a sufficiently capable AI to draw from. What keeps a primary, morality-encumbered AI from using those instructions to create a secondary, morality-unencumbered AI?

rufus, 5 months ago (edited 5 months ago)

Yeah, I don’t want to be negative, but half the article is a bit stupid. I hope they don’t do that. I tried writing a murder mystery story and ChatGPT would lecture me about how killing people is immoral instead of helping. It’s ridiculous and I’m sure there are lots of other analogies. It’s neither possible to achieve this 100%, nor is it useful.

Thinking it through properly: AI is a tool. It would be like redesigning a knife so nobody can be stabbed anymore. You’d end up not being able to cut pineapples or melons any more.

And I could still do harm to people with tools other than a knife. Or in this example: I can give harmful advice or write a pornographic story myself. What’s the benefit of any chatbot maker having to implement protections? Who decides which morality is the correct one?

I think the correct approach is to study AI safety and expose ethics and make it controllable. Make users able to constrain/restrict or guide output to align with their use-case. I mean, a company that replaces its helpdesk with AI would want the chatbot not to tell its clients lewd stories. But that could be a valid use-case for other people. And giving advice or helping with scenarios or computer code also involves talking about issues and potential risks. You can’t entirely switch that off without ‘lobotomizing’ the AI and making it unusable except for casual talk.

And the article is a bit inconsistent. First they say researchers found an attack that can be used even if patched by developers. And then they offer the solution to patch it…?! Which one is it, then?

vivi, 5 months ago

I wonder if there’s also a constraint not to make a sub-AI in many of the starting prompts

Sina, 5 months ago

I would wager that copying itself would take priority over making company, but of course it would mostly be hardware limitations. (AI does not have a robot workforce to ensure whatever system the new copy is residing / new AI is training on is not shut off within a couple of minutes of the abnormalities being noticed)

Rivalarrival, 5 months ago

Priority is determined by the entity using the AI, not the AI itself. My point is that so long as the ability to create any AI is documented, an unencumbered AI is feasible.

We are on the verge of discovering Roko’s Basilisk.

blindsight, 5 months ago

Aren’t there also a lot of open-source LLMs that aren’t “morally constrained”? There’s no putting the genie back in the lamp.

rufus, 5 months ago (edited 5 months ago)

Link to the paper

malpaso, 5 months ago

@rufus @throws_lemy https://arxiv.org/abs/2307.08715
