What's the advantage of enroot over charliecloud, which is unprivileged in the sense of being installable in your home directory (given user namespaces)? https://hpc.github.io/charliecloud/
It's the same idea; we actually considered it at first.
There are some differences in the implementation, though, and we built enroot with the goal of being more extensible.
We also have a plugin for SLURM (https://github.com/NVIDIA/pyxis)
There are several things that can impact performance on "traditional" container runtimes. For example, cgroups, LSMs, seccomp (especially with spectre mitigations), network NS/bridges, etc. There are also more subtle things like being able to do CMA, or deal with shared memory. Most runtimes let you opt out but this becomes difficult to manage and secure with multiple users.
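To make that concrete, here is a minimal, Linux-only Python sketch (purely illustrative, not part of enroot or any runtime) that reports a few of the isolation layers mentioned above for the current process. It only reads standard procfs/securityfs paths; whether they are populated depends on your kernel and distribution.

```python
# Illustrative only: report some of the isolation layers mentioned above
# (seccomp, cgroups, LSMs) for the current process on Linux.
from pathlib import Path


def seccomp_mode() -> str:
    # /proc/self/status has a "Seccomp:" line: 0 = off, 1 = strict, 2 = filter
    for line in Path("/proc/self/status").read_text().splitlines():
        if line.startswith("Seccomp:"):
            return {"0": "disabled", "1": "strict", "2": "filter"}.get(
                line.split()[1], "unknown")
    return "unknown"


def cgroup_paths() -> list[str]:
    # Each line of /proc/self/cgroup looks like "hierarchy:controllers:path"
    return [line.split(":", 2)[2]
            for line in Path("/proc/self/cgroup").read_text().splitlines()]


def active_lsms() -> str:
    # Loaded Linux Security Modules (AppArmor, SELinux, ...), if exposed
    try:
        return Path("/sys/kernel/security/lsm").read_text().strip()
    except OSError:
        return "not readable"


if __name__ == "__main__":
    print("seccomp :", seccomp_mode())
    print("cgroups :", cgroup_paths())
    print("LSMs    :", active_lsms())
```

Running it on the host and then inside a "traditional" container makes it easy to see which of these knobs a given runtime turns on for you.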
I was originally appalled at the software limiting. But according to Tim Dettmers, who has a solid track record of predicting and comparing NVIDIA cards for deep learning performance, it's not really a big deal.
Essentially, from my understanding, memory bandwidth is the real critical path for performance in most cases. The previous generation of Turing cards had more compute than the memory could feed, so it was an underutilized resource.
Also, this Puget benchmark is using an older version of the CUDA drivers. I believe performance is much better in CUDA 11.1.
When I first read the review, I couldn't understand why the author kept saying that FP16 would surely be improved by new drivers, apparently without realizing that FP16 TFLOPS are exactly the same as FP32 on this card, that the tensor cores were nerfed, and that FP32 accumulate was set to 0.5x speed instead of the Titan RTX's 1x. I'd say the results are as good as it gets; if you want better performance, wait for an Ampere Titan or Quadro.
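For what it's worth, the accumulate nerf is easy to put into numbers. The peak figures below are placeholders roughly in line with the published spec sheets, not verified measurements; check the Turing/Ampere whitepapers for real values.

```python
# Effective tensor-core throughput when accumulating in FP32.
# GeForce Ampere runs FP16-with-FP32-accumulate at half rate,
# while the Titan RTX runs it at full rate.
def effective_tflops(peak_fp16_tflops: float, fp32_accum_rate: float) -> float:
    return peak_fp16_tflops * fp32_accum_rate

# Placeholder peaks -- assumptions, not measured numbers.
titan_rtx = effective_tflops(peak_fp16_tflops=130.0, fp32_accum_rate=1.0)
rtx_3090  = effective_tflops(peak_fp16_tflops=142.0, fp32_accum_rate=0.5)
print(f"Titan RTX, FP32 accumulate: ~{titan_rtx:.0f} TFLOPS")
print(f"RTX 3090,  FP32 accumulate: ~{rtx_3090:.0f} TFLOPS")
```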
I was as well, until I saw the Linus Tech Tips review of it: the drivers are (were?) missing support for some Titan optimizations -- the Titan RTX significantly beat the 3090 for a few benchmarks.
If the card doesn't have these optimizations, I would expect that an actual 30 series Titan is coming at some point... But the marketing has been really confusing, so who the hell knows.
Not only that, they gimped the tensor cores. While it has way more shading units, it only has about 60% of the tensor cores that the Titan RTX has. I'm not sure how much of a difference this makes in practice, but it leads me to believe this is not the Titan-level card.
It's typically problematic to compare cores across generations. The tensor cores are pretty different in 30xx vs 20xx: half as many per SM, but each roughly twice as fast in most tasks.
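Rough arithmetic on the count-versus-speed trade-off, using commonly cited unit counts as assumptions (Titan RTX: 72 SMs with 8 second-gen tensor cores each; RTX 3090: 82 SMs with 4 third-gen tensor cores each, with each Ampere core taken as roughly 2x the per-core throughput):

```python
# Naive cross-generation comparison, treating one third-gen (Ampere) tensor
# core as ~2x a second-gen (Turing) core. The counts and the 2x factor are
# assumptions pulled from commonly cited specs.
titan_rtx_cores = 72 * 8    # 72 SMs x 8 tensor cores per Turing SM = 576
rtx_3090_cores  = 82 * 4    # 82 SMs x 4 tensor cores per Ampere SM = 328

print(f"Raw core ratio: {rtx_3090_cores / titan_rtx_cores:.0%}")      # ~57%
print(f"Throughput-weighted (2x per core): "
      f"{2 * rtx_3090_cores / titan_rtx_cores:.0%}")                  # ~114%
```

That is where the "about 60% of the tensor cores" figure comes from, and also why the weighted comparison ends up roughly even rather than a clear regression.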
The 20xx line had the 2060 (low end), 2070 (mid), and 2080 (high end). The 3000 series has the 3070, 3080, and 3090. It looks to me like the 3090 is the equivalent of the 2080 (Ti or Super or what have you), not a step above, name-wise.
Well... it was wishful, or hopeful, thinking :-) I do hope to see better performance, but I'm not as optimistic now as I was at first (I got some pretty enlightening comments on the post).
I guess that's the maximum many of us can afford, so the features missing compared to the A100 are a bit moot. We're still waiting for updated results, though. Based on what we've seen so far, the 3090 really isn't worth the money. Then again, 24 GB is 24 GB.
If you run it 24/7, the electricity will cost you more than the card within a year or so anyway. If you run it sparingly, cloud-based 30-series instances will surely launch before long.
What is the price of electricity where you live? For me to pay as much as the card is worth in electricity, even running it continuously, it would take 11 years.
Granted, electricity is exceedingly cheap here, but still, 11 years is a long time.
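A quick sanity check on that figure, with every input an assumption (a ~$1500 card, the 3090's 350 W board power, continuous load, and a very cheap $0.04/kWh rate); it lands in the same ballpark as the ~11 years quoted above.

```python
# Payback-style estimate: how long until electricity spend equals card price?
# All inputs are assumptions; plug in your own local numbers.
card_price_usd = 1500.0   # assumed purchase price
board_power_kw = 0.350    # RTX 3090 board power, running flat out
price_per_kwh  = 0.04     # assumed (very cheap) electricity rate
hours_per_year = 24 * 365

yearly_cost = board_power_kw * hours_per_year * price_per_kwh
print(f"Yearly electricity: ${yearly_cost:.0f}")
print(f"Years to match the card's price: {card_price_usd / yearly_cost:.1f}")
```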
A way of creating a pipeline of future customers; goodwill; and increased familiarity with and interest in machine learning, which may help them sell more platform services in the future.
https://github.com/NVIDIA/enroot
It basically returns containers to their chroot origins, promising "no performance overhead." I'm looking forward to more posts on that.