Huh, thanks to this kick-ass comment, I’m now running some different llama models locally on my machine.
For those wondering, all of the ones I’ve tried (llama2, llama2-uncensored, and mistral) respond really quickly and the text comes faster than I can read. Any quicker wouldn’t be of much use to me, so I’m happy.
Specs:
AMD Ryzen 5 3600
16GB DDR4
GeForce GTX 1060 6GB
SSD
Works fine on Windows through WSL 2 with Ubuntu 22.04.
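In case anyone wants to poke at it from code instead of the terminal, here’s a minimal sketch using the Ollama Python client. This assumes the models are being served through Ollama (the model names suggest it, but adjust to whatever tool you’re actually running) and that the server is already up with the model pulled; the prompt is just a placeholder.

```python
# Minimal sketch: chat with a locally running model from Python.
# Assumes the Ollama server is running and the model has already been
# pulled (e.g. `ollama pull llama2`). Install the client with `pip install ollama`.
import ollama

# Stream the reply so tokens print as they arrive, like the interactive CLI does.
stream = ollama.chat(
    model="llama2",  # or "llama2-uncensored", "mistral", etc.
    messages=[{"role": "user", "content": "Explain WSL 2 in one paragraph."}],
    stream=True,
)

for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```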