This hardware is fascinating - I've seen other examples of these farms deployed in Ukraine, but does anyone have more info on how these "servers" are orchestrated?
I've found just opening a window does the most to make a difference - in most European / US homes (even with propane / gas service) the "exhaust" just blows into the same room or a cabinet!
That said, I personally use the Breathe Airmonitor Plus [1] - I kept having calibration issues with the Temtop unit. I mostly decided on this one since it uses an NDIR sensor similar to my Aranet, which I carry with me all the time.
It's actually just wild - my last apt in Manhattan was built in 2018 and had a gas range. The "exhaust fan" just vented back into the kitchen and the only "exhaust" per se was a vent in the bathroom that "sucked" all the way to the roof with negative pressure.
If I ran my stove for more than 15 min my carbon monoxide / fire alarm would go off.
I'm a huge fan of OpenRouter and their interface for solid LLMs, but I recently jumped into fine-tuning / modifying my own vision models for FPV drone detection (just for fun), and my daily workstation and its 2080 just weren't good enough.
Even in 2025 it's cool how solid a setup dual 3090s still are. NVLink is an absolute must, but it's incredibly powerful. I'm able to run the latest Mistral thinking models and relatively powerful YOLO-based VLMs like the ones Roboflow is based on.
Curious if anyone else is still using 3090s or has feedback on scaling up to 4-6 of them.
A used 3090 is around $900 on eBay.
A used RTX 6000 Ada is around $5k.
Four 3090s are slower at inference and worse at training than one RTX 6000.
4x 3090s would consume 1400W at load.
An RTX 6000 would consume 300W at load.
If you, god forbid, live in California and your power averages 45 cents per kWh, 4x 3090s would be $1500+ more per year to operate than a single RTX 6000 [0] (rough sketch of the math below).
[0] Back-of-the-napkin / ChatGPT calculation of running the GPUs at load for 8 hours per day.
Note: I own a PC with a 3090, but if I had to build an AI training workstation, I would seriously consider cost to operate and resale value (per component).
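For anyone redoing the napkin math, a rough sketch in Python (the 1400W / 300W draws, $0.45/kWh, and 8 hours/day at full load are the assumptions from above):

  hours_per_year = 8 * 365                     # 8 h/day at load, per the footnote
  rate = 0.45                                  # $/kWh, California-ish
  cost_4x3090 = 1.4 * hours_per_year * rate    # ~1400 W draw, in kW
  cost_rtx6000 = 0.3 * hours_per_year * rate   # ~300 W draw, in kW
  print(cost_4x3090, cost_rtx6000, cost_4x3090 - cost_rtx6000)
  # ~$1840/yr vs ~$394/yr, so roughly a $1450/yr difference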
To make matters worse, the RTX 3090 was released during the crypto craze, so a decent amount of the second-hand market could contain overused GPUs that won't last long. Even though the 3xxx-to-4xxx performance difference is not that high, I would avoid the 3xxx series entirely for resale value.
I bought 2 ex-mining 3090s ~3 years ago. They're in an always-on PC that I remote into. Haven't had a problem. If there were mass failures of GPUs due to mining, I would expect to have heard more about it.
I have a rig of 7 3090s that I bought from crypto bros. They have been chugging along fine for the last 2 years. GPUs are electronic devices, not mechanical devices; they rarely blow up.
You get a motherboard designed for the purpose (many PCIe slots) and a case (usually an open frame) that holds that many cards. Riser cables are used so every card doesn't plug directly into the motherboard.
I've noticed on eBay there are a lot of 3090s for sale that seem to have rusted or corroded heatsinks. I actually can't recall seeing this with used GPUs before, but maybe I just haven't been paying attention. Does this have to do with running them flat out in a basement or something?
I have an A6000 and the main advantage over a 3090 cluster is the build simplicity and relative silence of the machine (it is also used as my main dev workstation).
Since you're exploring options just for fun, out of curiosity, would you rent it out whenever you're not using it yourself, so it's not just sitting idle? (It could be noisy.) You'd be able to use your computer for other work at the same time and stop whenever you wanted to use it yourself.
(You should also be compensated for the noise and inconvenience, not only the electricity.) It sounds like you might rent it out if the rental price were high enough.
... and this is why the napkin calculation is terrible. Even running a GPU at load doesn't mean you're going to use the full wattage. Four 3090s running inference on a large model barely use 350 watts combined.
Inference is often like 200-250W without the card clocked down, and the other cards sit at like 20-50W. With 4 cards, only 1 card is active at once. To get the full 350W you need to run parallel inference on the card with multiple users, so if I were using it as a server card with 10 active users/processes, then I might max out the active card. For example, I have a rig with 10 MI50 cards, which I believe are 250W each. Yet I rarely see the active card pass 200W, and they idle at about 20W, so that's 180W + 200W = around 380-400W at full load (rough numbers in the sketch below).
Think of the max wattage like a car's max horsepower: a car might make 350HP, but it doesn't stay at 350HP all day long; there's a curve to it. At the low end it might be making 170HP, and you need to floor the gas pedal to get to that 350HP. Same with these GPUs. Most people calculate the gas mileage by finding how much gas a car consumes at its peak and say, oh, 6mpg when it's making 350HP, so with your 20-gallon tank you have a range of 120 miles. Which obviously isn't true.
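To put rough numbers on the single-active-card pattern above (the figures from my rig, not a general rule):

  idle_w, active_w, n_cards = 20, 200, 10       # per-card idle/active draw and card count
  total_w = (n_cards - 1) * idle_w + active_w   # one card active, the rest idling
  print(total_w)                                # ~380 W, far below the 10 * 250 W nameplate total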
I've built a rig with 14 of them. NVLink is not 'an absolute must'; it can be useful depending on the model, the application software you use, and whether you're training or inferring.
The most important figure is the power consumed per token generated. You can optimize for that and get to a reasonably efficient system, or you can maximize token generation speed and end up with twice the power consumption for very little gain. You will also likely need a way to get rid of excess heat, and all those fans get loud. I stuck the system in my garage, which made the noise much more manageable.
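The metric itself is just average draw divided by generation speed; a quick sketch with placeholder numbers (measure your own):

  watts = 350.0            # average board power during generation (placeholder)
  tokens_per_sec = 25.0    # measured generation speed (placeholder)
  joules_per_token = watts / tokens_per_sec
  wh_per_1k_tokens = joules_per_token * 1000 / 3600
  print(joules_per_token, wh_per_1k_tokens)   # 14 J/token, ~3.9 Wh per 1k tokens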
I am curious about the setup of 14 GPUs - what kind of platform (motherboard) do you use to support so many PCIe lanes? And do you even have a chassis? Is it rack-mounted? Thanks!
I used a large Supermicro server chassis, a dual-Xeon motherboard with seven 8-lane PCI Express slots, all the RAM it would take (bought second-hand), splitters, and four massive power supplies. I extended the server chassis with aluminum angle riveted onto the base. It could be rack-mounted, but I'd hate to be the person lifting it in. The 3090s were a mix: 10 of the same type (small, with blower-style fans) and 4 much larger ones that were kind of hard to accommodate (much wider and longer). I've linked to the splitter board manufacturer in another comment in this thread. That's the 'hard to get' component, but once you have those and good cables to go with them, the remaining setup problems are mostly power and heat management.
The 3090 is a sweet spot for training. It's the first generation with seriously fast VRAM, and it's the last generation before Nvidia dropped NVLink. If you need to copy parameters between GPUs during training, the 3090 can be up to 70% faster than the 4090 or 5090, because the latter two are limited by PCI Express bandwidth.
To be fair though, the 4090 and 5090 are much more capable of saturating PCI Express than the 3090 is. Even at 4 lanes per card the 3090 rarely manages to saturate the link, so it still handsomely pays off to split down to 4 lanes and add more cards.
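If you want to see where your own cards land, a quick-and-dirty PyTorch micro-benchmark like this (my sketch, not a rigorous test) reports whether peer access is available and times a device-to-device copy:

  import time
  import torch

  assert torch.cuda.device_count() >= 2
  print("peer access 0->1:", torch.cuda.can_device_access_peer(0, 1))

  x = torch.empty(256 * 1024 * 1024, dtype=torch.float32, device="cuda:0")  # 1 GiB tensor
  torch.cuda.synchronize()
  t0 = time.time()
  for _ in range(10):
      y = x.to("cuda:1", non_blocking=True)
  torch.cuda.synchronize()
  print(f"~{10 * x.numel() * 4 / (time.time() - t0) / 1e9:.1f} GB/s device-to-device")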
I bought a second 3090 two years ago for around €800, still a good price even today, I think.
It's in my main workstation, and my idea was to always have Ollama running locally. The problem is that once I have a (large-ish) model running, my VRAM is almost full and the GPU struggles to do things like play back a YouTube video.
Lately I haven't used local AI much, partly because I stopped using coding AIs (they wasted more time than they saved), I stopped doing local image generation (the AI image generation hype is dying down), and for quick questions I just ask ChatGPT, mostly because I also often use web search and other tools, which are quicker on their platform.
Unfortunately, my CPU (5900X) doesn't have an iGPU.
Over the last 5 years iGPUs fell a bit out of fashion. Now maybe they actually make a lot of sense, as there is a clear use case that keeps the dedicated GPU always in use and isn't gaming (and gaming is different, because you don't often multi-task while gaming).
I do expect to see a surge in iGPU popularity, or maybe a software improvement to allow having a model always available without constantly hogging the VRAM.
PS: I thought Ollama had a way to use RAM instead of VRAM (?) to keep the model active when not in use, but in my experience that didn't solve the problem.
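The only knob I know of is keep_alive on Ollama's API (or the OLLAMA_KEEP_ALIVE environment variable), which controls how long the model stays resident after a request; it frees the VRAM when idle rather than parking the model in system RAM, so it only helps with the hogging part. A minimal sketch, assuming a default local install and a hypothetical model name:

  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={
          "model": "llama3",      # hypothetical model name
          "prompt": "Hello",
          "stream": False,
          "keep_alive": "1m",     # unload from VRAM after 1 minute of idle
      },
  )
  print(resp.json()["response"])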
If it's just for detection, would audio not be cheaper to process?
I'm imagining a cluster of directional microphones, and then I don't know if it's better to perform some sort of band-pass filtering first, since it's so computationally cheap (rough sketch below), or whether it's better to just feed everything into the model directly. No idea.
I guess my first thought was just that sound from a drone is likely detectable reliably at a greater distance than visuals; drones are so small, and a 180-degree-by-180-degree hemisphere of pixels is a lot to process.
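For the band-pass idea, something like scipy is essentially free computationally; a rough sketch (the 100 Hz - 4 kHz band is my guess at where prop noise sits, not a tested value):

  import numpy as np
  from scipy import signal

  fs = 16_000  # mic sample rate in Hz
  sos = signal.butter(4, [100, 4000], btype="bandpass", fs=fs, output="sos")

  audio = np.random.randn(fs)            # stand-in for one second of mic input
  filtered = signal.sosfilt(sos, audio)  # feed this (or the raw audio) to the model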
What's interesting is that solar in cold environments on sunny days can actually OVER-produce.
I've had a few friends run into this issue, with systems almost frying inverters because they didn't allow enough headroom for the open-circuit voltage coming from the panels.
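The effect is easy to see from a datasheet: Voc rises as the cells get colder, so a string sized to the inverter limit at 25°C can exceed it on a cold, sunny morning. A back-of-the-envelope sketch with typical (assumed) module numbers:

  voc_stc = 49.5        # module Voc at 25 C, volts (assumed datasheet value)
  temp_coeff = -0.0028  # Voc temperature coefficient per degree C (assumed)
  t_cell = -20          # cold-morning cell temperature, C
  modules = 12          # modules in series on the string

  voc_cold = voc_stc * (1 + temp_coeff * (t_cell - 25))
  print(round(modules * voc_cold))   # ~669 V vs 594 V at 25 C, over a 600 V inverter limit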