TehPers,

You don’t need an LLM to check whether the output is the exact, unciphered system prompt; a simple text similarity check is enough. For ciphered output, you may be able to compare the prompt/history embeddings against embeddings of known kinds of attacks, but that probably won’t be anywhere close to perfect.
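As a rough sketch of the plain-text check, something like the following could work. It uses Python's standard `difflib`; the function names, the 0.8 threshold, and the 8-character minimum block size are all my own assumptions for illustration, not tuned values.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial reformatting does not hide a leak.
    return " ".join(text.lower().split())


def leaks_system_prompt(
    output: str,
    system_prompt: str,
    threshold: float = 0.8,  # assumed cutoff, not a tuned value
    min_block: int = 8,      # ignore tiny coincidental matches
) -> bool:
    prompt = normalize(system_prompt)
    out = normalize(output)
    if not prompt or not out:
        return False
    # Fast path: the prompt appears verbatim in the output.
    if prompt in out:
        return True
    # Otherwise measure how much of the prompt reappears in the output
    # as reasonably long contiguous chunks.
    matcher = SequenceMatcher(None, prompt, out, autojunk=False)
    matched = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_block
    )
    return matched / len(prompt) >= threshold
```

This only catches verbatim or near-verbatim leaks. Anything encoded, translated, or paraphrased will slip past it, which is where the embedding comparison would have to come in.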
