Hacker News

Training anything resembling a current LLM from scratch is far beyond the ability and capital of anyone using a site like this.

Facebook/Meta used 8,000 A100s to train LLaMA (for example).

If you're doing anything with an LLM, it's fine-tuning, and there's a new approach nearly every week to do it better, faster, cheaper, and easier on 24GB cards that can be had for less than $1,000.

Same for inference.
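To put rough numbers on why 24GB goes a long way (a back-of-the-envelope sketch of my own, not anything from the thread - it only counts weights, and real fine-tuning needs extra headroom for activations, gradients, and optimizer state):

```python
# Back-of-the-envelope VRAM estimate for holding a model's weights.
# Illustrative assumptions (mine): weights dominate memory use; the
# overhead of activations/optimizer state is ignored here.

def weight_vram_gb(n_params_billions: float, bits_per_weight: int) -> float:
    """Approximate VRAM (in GiB) needed just to store the weights."""
    bytes_total = n_params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

# A 7B-parameter model in fp16 vs. 4-bit quantization:
fp16 = weight_vram_gb(7, 16)   # roughly 13 GiB of weights alone
q4 = weight_vram_gb(7, 4)      # roughly 3.3 GiB of weights

print(f"7B fp16 weights: {fp16:.1f} GiB")
print(f"7B 4-bit weights: {q4:.1f} GiB")
```

This is why quantized fine-tuning methods (QLoRA and friends) fit a 7B model, adapters, and optimizer state on a single 24GB consumer card, while full-precision training of the same model would not.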



Is the earlier point that people should see what they can do with "common household ingredients", before they assume they need to pay cloud providers for bigger/more iron?

I agree, and I have a 3090 for that purpose, and once wrote a tutorial for others wanting to do ML stuff on a GPU at home rather than rent from a cloud provider.

But a consumer GPU (or eBay older Tesla card) can't do everything that a rental pool of H100 and A100 can do, and I and other readers here will sometimes want to do those other things.

I didn't want to leave people with the impression that all they'd need was to buy up a bunch of retired Ethereum-mining GPUs, no matter what they wanted to do with ML.


Don't get me wrong - sites like this still have a lot of value. My overall point is that the assumption you need an A100/H100 to make productive use of LLMs isn't accurate. You can go a very long way with a single 3090 in a workstation for $1,000 (as frequently noted on HN, including in this thread, on r/LocalLLaMA, etc.). Or you can rent nearly anything you want on various platforms (most of the consumer RTX hardware is usually only available on platforms like Vast.ai). Whatever works for you in your situation, but the fairly common belief (especially in the mainstream press) that you need at least 40-80GB of VRAM to do anything LLM-related is flat-out wrong.

The other benefit I'd add to buying your own GPUs is availability. They are yours and always yours; in a commercial application with deadlines, it's a real risk to depend on getting the necessary on-demand GPU compute from a cloud platform at any given point in time. There is nothing worse than logging into a cloud provider console and seeing "no availability" when you really need to get something done. For me personally, this is what pushed me to buying vs. cloud: I ended up in scenarios where Vast.ai was the only option left, and I haven't had the best experiences with Vast.ai in terms of reliability and performance (I'm pretty sure many of the benchmarks are gamed, although I'm not sure how).

Speaking of performance, I've also seen very real issues with virtualized CPUs, what I assume is network-attached storage, etc. failing to feed data to high-end GPUs fast enough (again, noted elsewhere in this thread). In benchmarking I've done across various cloud providers, unless you go for the much more expensive options on GCP and elsewhere with directly attached NVMe storage, a single NVMe drive and a decent CPU in a workstation will run circles around many of these cloud providers.
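If you suspect storage is what's starving your GPUs, a crude sequential-read probe is an easy first check (a minimal sketch of my own, not the benchmark methodology referred to above; note that the OS page cache can flatter the result):

```python
# Minimal sequential-read throughput probe for a data directory.
# Hypothetical usage: point it at a local NVMe path and at a network
# mount, and compare the two numbers.
import os
import tempfile
import time

def read_throughput_mb_s(path: str, size_mb: int = 256) -> float:
    """Write a scratch file of size_mb MiB, then time a sequential read."""
    blob = os.urandom(1024 * 1024)
    fname = os.path.join(path, "throughput_probe.bin")
    with open(fname, "wb") as f:
        for _ in range(size_mb):
            f.write(blob)
    # Caveat: the file may still be in the page cache, so this measures a
    # best case. On Linux you'd drop caches or use O_DIRECT to be fairer.
    start = time.perf_counter()
    with open(fname, "rb") as f:
        while f.read(1024 * 1024):
            pass
    elapsed = time.perf_counter() - start
    os.remove(fname)
    return size_mb / elapsed

print(f"{read_throughput_mb_s(tempfile.gettempdir()):.0f} MB/s")
```

Even this rough number makes the gap obvious: a single consumer NVMe drive can sustain multiple GB/s of sequential reads, which a lot of network-attached cloud storage tiers won't come close to.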


Perhaps we need a new term... "Armchair LLMer"?


Pretty sure it was 2,048 A100s, not 8,000.


A friend of mine is on the LLaMA team at FAIR. They had 8,000 Nvidia A100s at the time. The only reference to your 2048 number was a report where someone at Facebook "estimated" they used 2,048 GPUs for five months. My understanding is that they used that number as an average over time/power to estimate carbon usage, and the real number varied quite a bit - my friend had an interesting anecdote about detecting uncorrected, otherwise undetectable (even with ECC) GPU VRAM errors at that scale.

In any case I think when you're talking A100s in the thousands my point remains. No one is just showing up cold to a cloud provider from a website link and spending at least tens of millions of dollars.


Oh, interesting. I got that number from the LLaMA paper, but thanks for the insider clarification.


We just don't understand.


It's only four words, and I'm not understanding what you're saying here, as all of your other comments (including the one I replied to) seem to be very much in agreement with my position.



