Yesterday Mistral AI released a new language model called Mistral 7B. @justnasty@lemmy.kya.moe already posted the Sliding attention part here in LocalLLaMA yesterday. But I think the model and the company behind it are even more noteworthy, and the release of the model is worth its own post.
Mistral 7B is not based on Llama. And they claim it outperforms Llama 2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it’s released under the Apache 2.0 license. So truly an ‘open’ model, usable without restrictions. [Edit: Unfortunately I couldn’t find the dataset or a paper. They call it ‘open-weight’. So my conclusion regarding the open-ness might be a bit premature. We’ll see.]
(It uses Grouped-query attention and Sliding Window Attention.)
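For anyone wondering what those two terms mean in practice, here is a minimal toy sketch. It is not Mistral's code; the function names and the tiny sizes are made up for illustration. With sliding window attention, each token only attends to itself and a fixed number of previous tokens. With grouped-query attention, several query heads share one key/value head.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True if query position i may attend to key position j.

    Causal (j <= i) and limited to the last `window` positions (i - j < window).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query(q_head, n_q_heads, n_kv_heads):
    """Map a query head index to the key/value head it shares (grouped-query attention)."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Toy sizes, chosen only for readability (not the model's real configuration).
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

The point of both tricks is cost: the window caps how much context each layer looks at per token, and sharing key/value heads shrinks the KV cache.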
Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and raised $113 million in funding in June.
- Details are on Mistral AI’s Announcement
- TechCrunch news article including information about the company
- They released a base/foundation model and an instruction-tuned one on HuggingFace
- And llama.cpp is already compatible and GGUF versions are out there.
I’ve tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and of detail regarding the training could be a downside, though. These were not included in this initial release of the model.)
EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I’d say no new information in it, they mostly copied their announcement)
As of now, it is clear they don’t want to publish any details about the training.
Guess we’re going to see what happens. Judging by their careful wording “driving the AI revolution by developing OPEN-WEIGHT models that are on par with proprietary solutions” I’m afraid they did that on purpose and really mean open-weight and not open-source. Seems the ‘open-source’ reading is just the careless interpretation of journalists/reporters and people like me, who should learn not to mix facts with their own conclusions. I’m going to follow the progress. Hope they will answer the questions.
Edit: And judging by what I read on their discord, opening their tuning process is not gonna happen. :-(
IMO the availability of the dataset is less important than the model, especially if the model is under a license that allows fairly unrestricted use.
Datasets aren’t useful to most people and carry more risk of a lawsuit or being ripped off by a competitor than the model. Publishing a dataset with copyrighted content is legally grey at best, while the jury is still out regarding a model trained on that dataset. The model also carries with it some short-term plausible deniability.
Depends a bit on what we’re talking about. And ‘old’ concepts of what ‘open-source’ means for software don’t apply 1:1 to ML models.
Sure, you’re right. Letting people use it without restrictions is great. But thinking about it, it smells more like a well-made marketing stunt. They’re giving away one free 7B model that’s making headlines to advertise their capabilities. It’s a freebie. Really useful to us, no doubt. But it’s made to get us hooked, and we’re probably not getting whatever follows without restrictions.
And that’s my main criticism. We, the people, get the breadcrumbs of an industry worth hundreds of millions of dollars at minimum. We’re never going to emancipate ourselves, because they’re keeping the datasets to themselves and the hardware is prohibitively expensive for everyone without commercial interest.
But you’re completely right. Even if they wanted to share the dataset with the world (which Mistral AI doesn’t) they couldn’t do it. Because currently there’s just no way to do it legally. (except for in Japan ;-)
I hope the whole picture isn’t as pessimistic as I’m painting it here. We’re probably getting more stuff and there is competition and other factors at play. Also I’m sure we’re eventually getting legislation that works better. But still. I’m always a bit uneasy about being at the mercy of generous multi-million-dollar companies.
And there are a few practical limitations. I don’t know how many trillion tokens they used to train it or which languages it speaks, and we won’t be able to learn things from the training process for science and the next paper. We’re limited to benchmarks to learn things.
To be honest, the same could be said of LLaMa/Facebook (which doesn’t particularly claim to be “open”, but I don’t see many people criticising Facebook for doing a potential future marketing “bait and switch” with their LLMs).
They’re only giving these away for free because they aren’t commercially viable. If anyone actually develops a leading-edge LLM, I doubt they will be giving it away for free regardless of their prior “ethics”.
And the chance of a leading-edge LLM being developed by someone other than a company with prior plans to market it commercially is quite small, as they wouldn’t attract the same funding to cover the development costs.