RTX3090 TensorFlow, NAMD and HPCG Performance on Linux (pugetsystems.com)
70 points by polymorph1sm on Oct 2, 2020 | 36 comments


Enroot sounds interesting:

https://github.com/NVIDIA/enroot

It basically returns containers to their chroot origins, promising "no performance overhead." I'm looking forward to more posts on that.


My team at NVIDIA uses enroot extensively. It's been really nice. We build containers using Docker but then run them with enroot.


What performance overhead does this avoid compared to other container runtimes?

Another aspect: the "unprivileged" part sounds like an advantage over Docker, putting it on par with podman, LXD, etc.


What's the advantage of enroot over charliecloud, which is unprivileged in the sense of being installable in your home directory (given user namespaces)? https://hpc.github.io/charliecloud/


It is the same idea; we actually considered it at first. There are some differences in the implementation, though, and we built enroot with the idea of being more extensible. We also have a plugin for SLURM (https://github.com/NVIDIA/pyxis).


There are several things that can impact performance on "traditional" container runtimes. For example, cgroups, LSMs, seccomp (especially with spectre mitigations), network NS/bridges, etc. There are also more subtle things like being able to do CMA, or deal with shared memory. Most runtimes let you opt out but this becomes difficult to manage and secure with multiple users.
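
To make the seccomp/cgroup point concrete, here's a minimal Python sketch (my own illustration, not part of enroot or any runtime) that inspects /proc from inside a container to see which of these mechanisms are actually active; the field names assume a reasonably recent Linux kernel:

    import pathlib

    def container_overhead_report():
        """Report which isolation mechanisms are active for this process."""
        status = pathlib.Path("/proc/self/status").read_text()
        fields = dict(
            line.split(":", 1) for line in status.splitlines() if ":" in line
        )
        # Seccomp: 0 = disabled, 1 = strict, 2 = filter (BPF). Filter mode is
        # the one that adds per-syscall overhead, especially with Spectre mitigations.
        print("Seccomp mode:", fields.get("Seccomp", "?").strip())
        print("NoNewPrivs:  ", fields.get("NoNewPrivs", "?").strip())
        # Non-trivial cgroup paths mean resource controllers are attached.
        print("Cgroup membership:")
        print(pathlib.Path("/proc/self/cgroup").read_text().strip())

    if __name__ == "__main__":
        container_overhead_report()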


I was originally appalled at the software limiting. But according to Tim Dettmers who has a solid record of predicting and comparing NVIDIA cards for deep learning performance, it's not really a big deal.

You can read his analysis here: https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...

and his tweet about this here: https://twitter.com/Tim_Dettmers/status/1311354118514982912

Essentially, from my understanding, memory bandwidth is the real bottleneck for performance in most cases. The previous generation of Turing cards had more compute than was necessary, so that compute was an underutilized resource.

Also, this Puget benchmark is using an older version of the CUDA drivers. I believe performance is much better in CUDA 11.1.

This new benchmark which is running on the latest CUDA seems to confirm Tim's numbers: https://www.evolution.ai/post/benchmarking-deep-learning-wor...
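
To make the bandwidth argument concrete, here's a rough roofline-style check in Python. It's my own sketch, and the peak-throughput numbers are approximate published specs assumed for illustration, not figures from the article or from Tim's post:

    # Rough roofline check: is a kernel compute-bound or bandwidth-bound?
    PEAK_TFLOPS_FP16 = 71.0   # ~RTX 3090 FP16 tensor throughput w/ FP32 accumulate (assumed spec)
    PEAK_BW_GBS = 936.0       # ~RTX 3090 memory bandwidth (assumed spec)

    def bound_by(flops, bytes_moved):
        """Compare a kernel's arithmetic intensity to the machine balance point."""
        intensity = flops / bytes_moved                             # FLOP per byte
        machine_balance = (PEAK_TFLOPS_FP16 * 1e12) / (PEAK_BW_GBS * 1e9)
        return "compute-bound" if intensity > machine_balance else "memory-bound"

    # Example: a 4096^3 FP16 GEMM vs. an elementwise op over the same data.
    n = 4096
    print("GEMM:", bound_by(2 * n**3, 3 * n * n * 2))               # compute-bound
    print("ReLU:", bound_by(n * n, 2 * n * n * 2))                  # memory-bound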


When I first read the review, I couldn't understand why the author was saying that FP16 will surely be improved by new drivers, apparently without realizing that FP16 TFLOPS are exactly the same as FP32 on this card: the tensor cores were nerfed, with FP32 accumulate set to 0.5x speed instead of the Titan RTX's 1x. I'd say the results are as good as it gets; if you want better performance, wait for an Ampere Titan or Quadro.
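
If you want to verify this on your own card, a quick and dirty matmul microbenchmark does the job. This is a hedged sketch assuming PyTorch with CUDA is installed; it is not the benchmark Puget ran:

    # Sanity check of FP32 vs FP16 matmul throughput on whatever GPU is present.
    import time
    import torch

    def tflops(dtype, n=8192, iters=20):
        a = torch.randn(n, n, device="cuda", dtype=dtype)
        b = torch.randn(n, n, device="cuda", dtype=dtype)
        torch.matmul(a, b)                      # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            torch.matmul(a, b)
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        return 2 * n**3 * iters / elapsed / 1e12

    print("FP32:", round(tflops(torch.float32), 1), "TFLOPS")
    print("FP16:", round(tflops(torch.float16), 1), "TFLOPS")  # hits tensor cores where eligible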


I was under the impression the 3090 is the Titan of this generation.


I was as well, until I saw the Linus Tech Tips review of it: the drivers are (were?) missing support for some Titan optimizations -- the Titan RTX significantly beat the 3090 for a few benchmarks.

If the card doesn't have these optimizations, I would expect that an actual 30 series Titan is coming at some point... But the marketing has been really confusing, so who the hell knows.


Not only that, they gimped the tensor cores. While it has way more shading units, it only has 60% of the tensor cores that the Titan RTX has. I'm not sure how much of a difference this makes in practice, but it leads me to believe this is not a Titan-level card.


It’s typically problematic to compare cores across generations. They are pretty different in 30xx vs 20xx. Half as many but roughly twice as fast in most tasks.


The 20xx line had the 2060 (low end), 2070 (mid), and 2080 (high end). The 3000 series has the 3070, 3080, and 3090. It looks to me like the 3090 is the equivalent of the 2080 (Ti or Super or what have you), not a step above, name-wise.


Rumor is that there will be a 48GB Ampere Titan. Nvidia has never been one to shy away from making money!


It's the "Titan class" card of this generation, which is the shady way Nvidia explained it wasn't a Titan. "Titan class" != Titan


well ... it was wishful or hopeful thinking :-) I do hope to see better performance but I'm not as optimistic now as I was at first (got some pretty enlightening comments on the post)


I'm looking forward to multi-gpu tests!

Would be good to see if it is worth upgrading x4 and x8 setups.

A single-GPU upgrade being worth it is a no-brainer. The launch price of the 30xx cards is lower than the purchasable price of the two comparison cards!

If only you could buy them though. The only microcenter in PA got 15 of each 30xx on respective launch days.

If anyone knows how many of these cards are being produced please do share.


It's without XLA, and without cuDNN and CUDA 11.1 for GA102. Let's wait for proper drivers to see the full results :P


Is there any reason NVIDIA aren't selling A100s as individual cards?


PCI Express variants are $8300 individually.


They just have the wrong ptxas; it needs CUDA 11.1 and a properly set $PATH.
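
A quick way to confirm which ptxas you're actually picking up (a small sketch of my own, not an official tool):

    # Check which ptxas is resolved from $PATH and which CUDA release it belongs to.
    import shutil
    import subprocess

    path = shutil.which("ptxas")
    if path is None:
        print("ptxas not found on $PATH")
    else:
        version = subprocess.run([path, "--version"], capture_output=True, text=True)
        print("Using:", path)
        print(version.stdout.strip())   # look for "release 11.1" in the output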


I guess that is the maximum many of us can afford, so the features that are missing from the A100 are a bit moot. Still, we'll wait for the update. Based on what we saw, the 3090 really isn't worth the money. Then again, 24GB is 24GB.


If you run it 24/7, the cost of electricity will be much more anyway in a year or so. If you run it sparingly, cloud-based 30-series instances will surely be launched before long.


What is the price of electricity where you live? For me to pay as much as the card is worth in electricity, even running it continuously, it would take 11 years.

Granted, electricity is exceedingly cheap here, but still, 11 years is a long time.


About $0.20/kWh here. Works out to $600 a year I think


Hmm, for me it can be as low as $0.05/kWh. Maybe I should start a compute business!


0.35 kW * 24 hours * 365 days is about 3,066 kWh; at $0.10/kWh that's roughly $300 a year.
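
The arithmetic behind these estimates, as a trivial sketch (the ~350 W continuous draw is an assumption, roughly the 3090's rated board power):

    # Annual electricity cost for a card drawing ~350 W around the clock.
    def annual_cost(watts=350, price_per_kwh=0.10, hours=24 * 365):
        kwh = watts / 1000 * hours
        return kwh * price_per_kwh

    print(annual_cost(price_per_kwh=0.10))   # ~$307/yr at $0.10/kWh
    print(annual_cost(price_per_kwh=0.20))   # ~$613/yr at $0.20/kWh
    print(annual_cost(price_per_kwh=0.05))   # ~$153/yr at $0.05/kWh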


Google Colab seems like a nice deal if you can't afford it.


What is the VRAM they offer now? I forget if they give the full 24GB of the K80 or not.


They give half of that. But most of the time you will be allotted a 16GB T4, which is much nicer.
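
If you want to check what a given Colab session actually handed you, something like this works (assuming PyTorch is available in the runtime, as it usually is):

    # Print the name and VRAM of the GPU attached to the current runtime.
    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(props.name, round(props.total_memory / 1024**3, 1), "GiB")
    else:
        print("No GPU attached to this runtime")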


What does google get for providing this service for free?


Full access to your data and algorithms.


Any source?


A way of creating a pipeline of future customers; goodwill; increased clue and interest in machine learning, which may help them sell more platform services in the future.


You are the product.


Neat, would love to try it but can’t buy the card anywhere. Another shitty Nvidia launch.



