Hacker News

Yes. The Ballista crate (part of the arrow-datafusion repo) provides distributed query execution and the scheduler has a gRPC service. Flight is used internally as well but not directly exposed to users. There is also work in progress to add Python bindings for Ballista (they already exist for DataFusion).


Thank you. I went through its GitHub repo looking for docs, and it seems I need to dig a bit deeper. How to get started with my Parquet files isn’t immediately obvious.

I assume the Python bindings would talk to the scheduler through gRPC. Could I perhaps use gRPC directly?


The best "Getting Started" documentation right now is the one on docs.rs - https://docs.rs/ballista/0.5.0/ballista/

This demonstrates using the Rust client (BallistaContext + DataFrame). There are already Python bindings for DataFrame but not BallistaContext yet.

Documentation for Ballista is severely lacking right now and this will be an area of focus for the next release.


Thanks. I’m experimenting with Rust currently, so this might fit the bill. I am curious, though, why the client needs to use async Rust. I haven’t gotten that far in my learning yet. I would have guessed that a synchronous approach would work as well.
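
On the async question: an async API can generally still be called from synchronous code by blocking the current thread until the future completes. Below is a minimal, std-only sketch of that idea. Note that `fetch_row_count` is a hypothetical stand-in for an async client call and is not part of Ballista's API, and `block_on` here is a toy executor written for illustration, not what any real runtime ships.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread that is blocked waiting on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy executor: poll a future on the current thread until it is ready,
/// parking the thread whenever the future reports `Pending`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async call (an assumption for illustration only).
/// It just returns the length of the path it is given.
async fn fetch_row_count(path: &str) -> usize {
    path.len()
}

/// Synchronous wrapper: callers never see `async` or `.await`.
fn run_query_sync(path: &str) -> usize {
    block_on(fetch_row_count(path))
}
```

With a Tokio-based client, the usual equivalent would be to create a `tokio::runtime::Runtime` and call its `block_on` method on the client's future, so a synchronous wrapper is possible even when the library itself only exposes async functions.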



