aiccount (@aiccount@monyet.cc)

Yeah, it’s a trajectory thing. Most people see the one-shot responses from something like ChatGPT’s current web interface on OpenAI’s website and think that’s where we are at. It isn’t, though. The cutting edge of what is openly available to people right now is things like CrewAI or Autogen running agents powered by models like Claude Opus or Llama 3, and maybe the latest GPT-4 update.

When you use agents you don’t have to baby every response: the agents can run code, test code, check the latest information on the internet, and more. This way you can give a complex instruction, let it run, and come back to a finished product.
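
To make the "run code, test code" part concrete, here is a rough sketch of the loop an agent goes through. This is not the CrewAI or Autogen API: `run_agent_loop`, `fake_model` and `check_add` are made-up names for illustration, and `fake_model` is a stand-in for a real LLM call.

```python
from typing import Callable, Optional


def run_agent_loop(
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[dict], None],
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for code, run it, test it, and feed any failure back to the model."""
    feedback = None
    for _ in range(max_attempts):
        source = propose(task, feedback)  # hypothetical model call
        namespace: dict = {}
        try:
            exec(source, namespace)  # "run code" (no sandbox here, sketch only)
            check(namespace)         # "test code"
            return source            # tests passed, hand back the result
        except Exception as err:
            # The error message becomes the feedback for the next attempt.
            feedback = f"{type(err).__name__}: {err}"
    return None


# Stand-in for an LLM: the first answer is buggy, the retry is fixed.
def fake_model(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def check_add(ns: dict) -> None:
    assert ns["add"](2, 3) == 5, "add(2, 3) should be 5"


result = run_agent_loop(fake_model, check_add, "write add(a, b)")
```

The point is that the failed test gets handed back automatically, so nobody has to sit there and correct the first bad answer by hand. A real setup would run the generated code in a sandbox rather than a bare `exec`.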

I say it is a trajectory thing because when you compare what was cutting-edge just one year ago, basically one-shot GPT-3.5, to an agent network running today’s latest models, the difference is stark, and when you go a couple of years further back to GPT-2, it is way beyond stark. Go a step further and realise that lots of custom hardware is being built (basically LLM ASICs, and ASICs have traditionally given a ~10,000x speedup over general-purpose GPUs), and you can see that instant agent-based responses will soon be the norm.

All this compounds when you consider that we have not hit a plateau: better datasets and more compute are still producing better models. Not to mention that other architectures, like the state-space-based Mamba, are making remarkable achievements with very little compute so far. We have no idea how powerful things like Mamba would be if they were given the datasets and training that the current popular models are getting.
