The problem is most of these models need like a terabyte of VRAM… And consumers have about 8-24GB.
Old news pal! 😄
12GB of VRAM is still an upgrade away for most people, and a 4-bit quantized 13B model is barely going to be a tech demo. When open-source AI is proclaimed to be near, on par with, or better than GPT-4, they are talking about nothing other than their biggest models in a prime environment.
Sure, but not for standard cloud instances that are very affordable for companies wanting to get away from OpenAI.
I usually don’t think much about companies and cloud instances when it comes to Fossai, but fair enough.
For me it’s all about locally run consumer models. If we cannot achieve that, it means we will always need to rely on the whims and decisions of others to access the most transformative technology ever invented.
Holy shit a terabyte?
This specific one says it’ll run on 24GB actually. But some are just crazy big.
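For a rough sense of the numbers in this thread, here is a back-of-the-envelope sketch in Python. It assumes memory is dominated by the weights (parameter count times bits per weight) and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function names and the list of model sizes are just illustrative, not taken from any library.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights only, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Real quantization formats carry some extra metadata on top of this.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def max_params_billions(vram_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count (in billions) whose weights alone fit in `vram_gb`."""
    return vram_gb * 8 / bits_per_weight


if __name__ == "__main__":
    # Illustrative model sizes and precisions mentioned in the thread.
    for params, bits in [(3, 4), (7, 4), (13, 4), (13, 16)]:
        gb = weight_memory_gb(params, bits)
        print(f"{params}B at {bits}-bit: ~{gb:.1f} GB of weights")

    # Consumer cards discussed above: 8, 12 and 24 GB.
    for vram in (8, 12, 24):
        limit = max_params_billions(vram, 4)
        print(f"{vram} GB VRAM: weights of up to ~{limit:.0f}B params at 4-bit")
```

By this estimate a 4-bit 13B model needs about 6.5 GB for weights alone, and a full terabyte at 16-bit would correspond to roughly 500B parameters.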
There are smaller models that can run on most laptops.
https://www.maginative.com/article/stability-ai-releases-stable-lm-3b-a-small-high-performance-language-model-for-smart-devices/
In benchmarks this looks like it is not far off ChatGPT 3.5.
It’s not even close: less than half of 3.5’s 85.5% on ARC. Some larger open models are competitive on HellaSwag, TruthfulQA and MMLU, but ARC is still a major struggle for small models.
3Bs are kind of pointless right now because the machines with processors capable of running them at a usable speed probably have enough memory to run a 7B anyway.