Let’s talk about our experiences working with different models, whether well-known or lesser-known.
Which locally run language models have you tried out? Share your insights, challenges, or anything you found interesting during your encounters with those models.
With a quantized GGML version you can just run it on CPU if you have 64GB RAM. It is fairly slow though: I get about 800ms/token on a 5900X. Basically you start it generating something and come back in 30 minutes or so. Can’t really carry on a conversation.
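For a rough sense of what 800ms/token means in practice, here is a back-of-the-envelope sketch. The function names are made up for illustration, and it assumes the per-token latency stays constant, which is only approximately true as the context grows.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput."""
    return 1000.0 / ms_per_token


def tokens_generated(ms_per_token: float, minutes: float) -> int:
    """Whole tokens produced in `minutes`, assuming constant latency."""
    total_ms = minutes * 60_000
    return int(total_ms // ms_per_token)


if __name__ == "__main__":
    # Figures from the comment above: about 800ms/token, 30 minutes of waiting.
    print(tokens_per_second(800))     # 1.25
    print(tokens_generated(800, 30))  # 2250
```

So a 30 minute run works out to roughly 2,250 tokens at that speed, which is fine for a batch of fiction but too slow for back-and-forth chat.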
Is it smart enough that it can get the thread of what you are looking for without as much rerolling or handholding, so this comes out better?
That’s the impression I got from playing with it. I don’t really use LLMs for anything practical, so I haven’t done anything too serious with it. Here are a couple of examples of having it write fiction: https://gist.github.com/KerfuffleV2/4ead8be7204c4b0911c3f3183e8a320c
I also tried with plain old llama-65B: https://gist.github.com/KerfuffleV2/46689e097d8b8a6b3a5d6ffc39ce7acd
You can see it makes some weird mistakes, although the writing style itself is quite good.
If you want to give me a prompt, I can feed it to guanaco-65B and show you the result.