What are the minimum specs for CPU? RAM? Can I jam a 3060ti into an ancient DDR3 i7 computer and get good results?

  • drhead [he/him]
    ·
    edit-2
    2 years ago

    First off, here's a spreadsheet for this. If you want to skip the nerd shit just look for something with a high CUDAbench score and at least 8gb of VRAM and you'll be fine. https://docs.google.com/spreadsheets/d/1Zlv4UFiciSgmJZncCujuXKHwc4BcxbjbSBg71-SdeNk/edit#gid=0

    In general, the most important thing would be to have an NVIDIA GPU with at least 8GB of VRAM for inference (generating images) -- so a 3060 Ti would work just fine. It would be great if the prices on the sheet were accurate (they'd have a 2080 Ti going for less than a 3060 Ti); in reality a 2080 Ti is probably a bit more expensive, but better, though. Your CPU and regular RAM barely matter at all, but VRAM determines what you can do and the CUDAbench score is how fast you can do it. That should allow you to do most things in a decent amount of time.
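
    For reference, plain inference with the diffusers library looks roughly like this (the model ID and prompt are just placeholder examples, not anything specific to your setup):

    ```python
    # Minimal Stable Diffusion inference sketch using Hugging Face diffusers.
    # Assumes an NVIDIA GPU with CUDA; fp16 keeps this comfortably under 8GB of VRAM.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # example model ID
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")

    image = pipe("a watercolor painting of a lighthouse", num_inference_steps=30).images[0]
    image.save("out.png")
    ```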

    If you want to get into more advanced things like training your own models, I think the bare minimum for that is 16GB if you are using 8bit Adam. 24GB is recommended, but your cheapest general option for that is a used 3090 (around $800, I think, though good luck finding a cheap one on Ebay now). I don't know the price of what you're looking at, but if you can afford to splurge on a used 3090, you most certainly won't have to buy another GPU for quite a while. Edit: with the cheap ones gone, I can no longer recommend the 3090 at this time.
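
    In case it's unclear what "8bit Adam" involves: it's just swapping the optimizer in your training script for the bitsandbytes 8-bit variant, which stores the Adam moment estimates in 8 bits instead of 32 and saves a lot of VRAM. A rough sketch (the tiny `model` here is just a stand-in for the UNet you'd actually be fine-tuning):

    ```python
    # Sketch: swapping a standard AdamW optimizer for bitsandbytes' 8-bit variant
    # to shrink the optimizer state, which is one of the biggest VRAM costs in training.
    import torch
    import bitsandbytes as bnb

    model = torch.nn.Linear(8, 8).cuda()  # placeholder for the actual UNet

    # optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # 32-bit states
    optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-6)  # 8-bit states

    # The rest of the training loop is unchanged:
    loss = model(torch.randn(4, 8, device="cuda")).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    ```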

    If you want to train models more cheaply, I have also heard of people using a Tesla M40 24GB (a datacenter GPU, i.e. you can't really use it for gaming), which costs a bit under $150 used on Ebay, and there's a bunch of them. The good thing is that you can use it for model training, and it is probably the cheapest way to do so. The bad news: 1) it's much slower than even the 3060 Ti (1/4th the CUDAbench score), 2) you can't really use workstation cards for gaming very well, and 3) it's designed to sit in a server rack with blower fans forcing air through it, so it only has a passive heatsink and you'll need to figure out your own cooling solution. That's not to say it's worthless -- you can still make good use of it with Dreambooth and build custom models that reproduce a specific style or object with decent accuracy from a few dozen images; those don't take long to train, and even on an M40 it would only be a few hours. But for the fastest general use you probably just want a good 8GB GPU.

      • drhead [he/him]
        ·
        2 years ago

        NVIDIA is essentially an absolute requirement because of CUDA -- NVIDIA might as well have a monopoly on machine learning computation. I think there's some stuff you can do to get certain things working on non-NVIDIA cards, which I haven't looked into, but for practical purposes... you really just can't.
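
        A quick way to sanity-check whether PyTorch can actually see your card (and how much VRAM it has) is something like:

        ```python
        # Quick check that PyTorch sees a CUDA-capable GPU and how much VRAM it has.
        import torch

        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
        else:
            print("No CUDA device visible -- Stable Diffusion will be painfully slow on CPU.")
        ```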

    • EmmaGoldman [she/her, comrade/them]M
      ·
      2 years ago

      Where are you finding a 2080 Ti for less than a 3060? I don't even see used 2080s for less than a brand new 3060, let alone a Ti

      • drhead [he/him]
        ·
        2 years ago

        I am assuming these are used ones on Ebay. But I know this spreadsheet is out of date on multiple counts.

        On Ebay I see a few 2080 Tis going for ~$400 (buy-it-now price), but it does appear the 3060 Ti is cheaper, with a bunch under $300. I'd still go for one of the 2080 Tis if possible.

        And... it also looks like all of the 3090s got snagged. Looks like I have to update the post.

    • AppelTrad [she/her]
      ·
      2 years ago

      To what extent is an older PCI-E version (2 or 3, presumably, for an "ancient DDR3 i7") going to act as a bottleneck?

      • drhead [he/him]
        ·
        2 years ago

        As far as I know, each version of PCI-E from 2 to 4 doubles the bandwidth of its predecessor. I have no idea how this would translate into performance directly, and I would be a bit surprised if it was a linear relationship. The worst that could happen is it might take longer to generate results.
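
        Some rough back-of-the-envelope numbers (assuming an x16 slot and a roughly 2GB fp16 SD 1.x checkpoint -- these are approximations, not benchmarks):

        ```python
        # Rough PCIe x16 bandwidth per version vs. the one-time cost of copying a
        # ~2 GB fp16 Stable Diffusion checkpoint into VRAM. Once the model is loaded,
        # per-step traffic over the bus is tiny, so this is mostly a load-time cost.
        bandwidth_gb_s = {"PCIe 2.0 x16": 8, "PCIe 3.0 x16": 16, "PCIe 4.0 x16": 32}
        model_gb = 2.0  # approximate fp16 SD 1.x checkpoint size

        for version, bw in bandwidth_gb_s.items():
            print(f"{version}: ~{bw} GB/s, model upload ~{model_gb / bw:.2f}s")
        ```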

      • Simferopol [none/use name]
        ·
        2 years ago

        the calculation is done inside the gpu right? i don't think there is much transfer on the pci-e bus with stable diffusion.

        • Owl [he/him]
          ·
          2 years ago

          If you're using one of the low-VRAM workarounds, a slow bus is going to hurt. But you're already hurting in that situation.
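
          For reference, one common low-VRAM workaround (I'm assuming this is the kind of thing meant here) is diffusers' sequential CPU offload, which shuttles weights over the bus every step -- roughly:

          ```python
          # Sketch of a low-VRAM setup in diffusers: sequential CPU offload keeps most
          # weights in system RAM and moves submodules to the GPU only when needed,
          # which is exactly the kind of traffic a slow PCIe link makes painful.
          # Requires the accelerate package alongside diffusers.
          import torch
          from diffusers import StableDiffusionPipeline

          pipe = StableDiffusionPipeline.from_pretrained(
              "runwayml/stable-diffusion-v1-5",  # example model ID
              torch_dtype=torch.float16,
          )
          pipe.enable_sequential_cpu_offload()  # trades speed (and bus traffic) for VRAM

          image = pipe("a test prompt", num_inference_steps=20).images[0]
          ```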

  • AlyxMS [he/him]
    ·
    edit-2
    2 years ago

    RTX2060 12GB

    Performance doesn't matter much (it only affects speed, and what's the difference between 10 seconds per image and 5 seconds per image?); you want as much VRAM as possible, which allows for more complex models and higher output resolution. The 2060 12GB has the highest VRAM for the lowest price as of now.

    If you can find a 3060 12GB for a slightly higher price, it would be better value, as it offers higher performance.

    AMD has pretty cheap models with 16GB of VRAM, but unfortunately, since most AI stuff runs on CUDA, getting them to work can be troublesome.

    If you really know your stuff, perhaps you could get your hands on some second-hand Teslas (like the P40? I didn't look too deeply into them), which have low performance but a lot of VRAM. But you'll need to put together some sort of custom cooling solution, as they are server cards designed to have air forced through them. Also, getting them to work can be tricky, like AMD cards.

    If you don't care about the resolution, just want to mess around, and don't want to use online solutions (there are plenty of them on Google Colab), you can get it to work with as little as 6GB of VRAM.
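
    For what it's worth, the usual tricks for squeezing under ~6GB are running the weights in fp16 and enabling attention slicing, e.g. in diffusers (model ID is just an example):

    ```python
    # Sketch: fp16 weights plus attention slicing, the usual way to fit Stable
    # Diffusion inference into roughly 4-6 GB of VRAM at 512x512.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # example model ID
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.enable_attention_slicing()  # compute attention in chunks to save VRAM

    image = pipe("a cozy cabin in the woods", height=512, width=512).images[0]
    ```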

  • WoofWoof91 [comrade/them]
    ·
    edit-2
    2 years ago

    i'm running it on an aftermarket 1060 6gb
    it's the worst card that still gives acceptable generation times; it is pretty slow though
    rest of the rig is an old i5-2500K and 16gb of ddr3
    takes ~2 minutes to generate a batch of 3 512x512 images at ~40-50 steps, depending on which sampling method i use
    then another ~30 seconds to run the best one through the upscaler

  • RION [she/her]
    ·
    2 years ago

    As AlyxMS said, a 2060 is probably going to be your best bet for price/performance, but you could even go as low as 6-8GB of VRAM and get decent results if you're using one of the super-efficient versions.

    Just don't get a 1600 series card. For whatever reason they don't work with half-precision optimization, so you're severely limited in what you can do.

  • opsecisgay [they/them]
    hexagon
    ·
    2 years ago

    Thank you so much for your responses, everyone! I read all your comments and I'm going for the cheapest 12GB card I can find, and I'll see how well it can do over an ancient PCIe 2.0 interface lol

    Apparently I'm going to lose half the performance of the card in gaming because of this bottleneck lmao but hopefully it won't affect stable diffusion too much!