Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Congrats on the launch!

I contributed to the NVIDIA Spark RAPIDS project for ~4 years and for the past year have been contributing to DataFusion Comet, so I have some experience in Spark acceleration and I have some questions!

1. Given the momentum behind the existing OSS Spark accelerators (Spark RAPIDS, Gluten + Velox, DataFusion Comet), have you considered collaborating with and/or extending these projects? All of them are multi-year efforts with dedicated teams. Both Spark RAPIDS and Gluten + Velox are leveraging GPUs already.

2. You mentioned that "We're fully compatible with Spark SQL (and Spark)." and that is very impressive if true. None of the existing accelerators claim this. Spark compatibility is notoriously difficult with Spark accelerators built with non-JVM languages and alternate hardware architectures. You have to deal with different floating-point implementations and regex engines, for example.

Also, Spark has some pretty quirky behavior. Do you match Spark when casting the string "T2" to a timestamp, for example? Spark compatibility has been pretty much the bulk of the work in my experience so far.

Providing acceleration at the same time as guaranteeing the same behavior as Spark is difficult and the existing accelerators provide many configuration options to allow users to choose between performance and compatibility. I'm curious to hear your take on this topic and where your focus is on performance vs compatibility.



1. Yes! Would love to contribute back to these projects, since I am already using RAPIDS under the hood. My general goal is to bring GPU acceleration to more workloads. Though, as solo founder, I am finding it difficult to have any time for this at the moment, haha.

2. Hmm, maybe I should mention that we're not "accelerating all operations" -- merely compatible. Spark-RAPIDS has the goal of being byte-for-byte compatible unless incompatible ops are specifically allowed. But... you might be right about that kind of quirk. Would not be surprising, and reminds me of checking behavior between compilers.

I'd say the default should be a focus on compatibility, and work through any extra perf stuff with our customers. Maybe a good quick way to contribute back to open source is to first upstream some tests?

Thanks for your great questions :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: