
What is "DataFusion"?

- It's not in the FAQ ( https://arrow.apache.org/faq/ )

- It's not on the release page either.



OK, I found it: https://github.com/apache/arrow-datafusion

"DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format. DataFusion supports both an SQL and a DataFrame API for building logical query plans as well as a query optimizer and execution engine capable of parallel execution against partitioned data sources (CSV and Parquet) using threads. DataFusion also supports distributed query execution via the Ballista crate."

"Use Cases: DataFusion is used to create modern, fast and efficient data pipelines, ETL processes, and database systems, which need the performance of Rust and Apache Arrow and want to provide their users the convenience of an SQL interface or a DataFrame API."
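
For anyone wondering what "parallel execution against partitioned data sources using threads" looks like in principle, here is a toy sketch in plain Rust using only the standard library. It is not DataFusion code and does not use its API; `Partition` and `parallel_sum` are hypothetical names made up for illustration, and each in-memory vector stands in for one partition (say, one CSV or Parquet file). The idea is simply: one worker thread per partition computes a partial result, and the partial results are merged at the end.

```rust
use std::thread;

/// Toy stand-in for one partition of a data source (e.g. one file).
/// Hypothetical type for illustration only; not part of DataFusion.
type Partition = Vec<i64>;

/// Sum each partition on its own thread, then merge the partial sums.
fn parallel_sum(partitions: Vec<Partition>) -> i64 {
    // Spawn one worker per partition; each worker takes ownership of its data.
    let handles: Vec<thread::JoinHandle<i64>> = partitions
        .into_iter()
        .map(|part| thread::spawn(move || part.iter().sum::<i64>()))
        .collect();

    // Wait for every worker and combine the partial results.
    handles
        .into_iter()
        .map(|h| h.join().expect("partition worker panicked"))
        .sum()
}
```

Calling `parallel_sum(vec![vec![1, 2, 3], vec![10, 20], vec![100]])` returns 136. A real engine would presumably use a thread pool and columnar batches rather than one OS thread per partition, but the split, compute, merge shape is the same.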


You beat me to it, I was about to post the GitHub link :) The README is a good starting place to learn more about the project.



