Vicuna-33B-1-3-SuperHOT-8K-GPTQ (huggingface.co)

posted 1 year ago

notfromhere@lemmy.one

localllama@sh.itjust.works

simple@lemmy.mywire.xyz

1 point

1 year ago

Yeah, llama.cpp with SuperHOT support would be great, and yes, I’m using exllama with the oobabooga UI. I found out why I’m getting garbage output at 2K: it seems that SuperHOT 8K models, when run with a 2K context, show a massive increase in perplexity.

(The higher the perplexity, the worse the output quality.)
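
For anyone unfamiliar with the metric, here is a minimal sketch of how perplexity is computed: the exponential of the average negative log-probability the model assigns to each token. The function name and the probability values are made up for illustration and are not measurements from any model.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical natural-log probabilities, chosen only for illustration.
confident = [math.log(0.5)] * 4   # model gives each true token p = 0.5
uncertain = [math.log(0.1)] * 4   # model gives each true token p = 0.1

print(perplexity(confident))
print(perplexity(uncertain))
```

The less probability the model puts on the tokens that actually occur, the higher the number, which is why a jump in perplexity usually shows up as worse text.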

So I’ll need to figure out if I can get at least 4K running without running out of VRAM.

Also, there is a new PR for exllama that uses a different method of getting higher context (not SuperHOT) and reportedly causes a smaller perplexity increase. So that might be a better alternative.

notfromhere@lemmy.one (OP)

1 point

1 year ago

I read the author’s blog post on SuperHOT, and it sounded like the method keeps perplexity very low at large contexts rather than increasing it. I could have read it wrong, but I thought it wasn’t supposed to increase perplexity.

simple@lemmy.mywire.xyz

2 points

1 year ago

The increase in perplexity is very small, but there is still some with 8K context. With 2K, though, it seems to be much larger. I could be misunderstanding something myself, but my little test with 2K context does suggest there’s something going on with 2K contexts on SuperHOT models.
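
For context: SuperHOT's 8K variants are described as being fine-tuned with position indices compressed by a factor of 4 (2048/8192), with the loader expected to apply the same compression at inference. A rough sketch of that scaling is below. The function and parameter names are hypothetical and not taken from exllama or any other library, and whether a mismatch in this setting explains the 2K result is only a guess.

```python
def interpolated_positions(seq_len, trained_ctx=2048, target_ctx=8192):
    """Return position indices scaled so target_ctx tokens fit the trained range.

    Hypothetical helper for illustration; not part of any real loader.
    """
    scale = trained_ctx / target_ctx  # 0.25 for a 2K -> 8K extension
    return [i * scale for i in range(seq_len)]

print(interpolated_positions(5))
# The last of 8192 tokens still maps below the original 2048 limit.
print(interpolated_positions(8192)[-1])
```

If a model was tuned on compressed positions like these but is then run with uncompressed ones, the positions it sees would no longer match what it was trained on, which might be worth ruling out.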
