friendlyDE's comments

friendlyDE · on Nov 4, 2024

Location: Ethiopia

Remote: Yes

Willing to relocate: Yes

Technologies:

  - Python, SQL, Bash, AWS, Athena, Glue, Lake Formation, PySpark, Airflow, Kafka, Docker, Git, PostgreSQL, Tableau, Power BI

  - Big Data Pipeline, ETL, Warehousing, Data Modeling, Shell Scripting

  - Pandas, Scrapy, Beautiful Soup, FastAPI, Django, Flask, SQLAlchemy, boto3, HTMX

Résumé/CV: https://nabilseid.github.io/resume/resume_Nabil_Seid.pdf

Email: nabeelseid@gmail.com

friendlyDE · on Nov 4, 2024

Hi Everyone,

I work in adtech, where we handle massive log-level data. To cut costs and improve performance for ML and optimization, my team and I chose a lakehouse approach on AWS (S3 + open table formats (OTFs) / partitioned Parquet + Athena + Glue).

One challenge we faced with this data stack was managing Athena queries in our ETL jobs. Since Athena handles much of our data-heavy processing, we ended up storing hundreds of lines of query code as strings in Python scripts, which quickly became a nightmare to maintain.

We needed something similar to PySpark SQL that could output SQL strings compatible with Athena. So we built athenaSQL. It mimics the PySpark SQL API, providing a familiar interface and outputting SQL queries directly.
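
To illustrate the general idea only (this is not athenaSQL's actual API; see the docs linked below for that), here is a minimal sketch of a chainable builder that renders a SQL string instead of keeping the query as a hand-written string. The `Query` class, its methods, and the `ad_logs` table and column names are all hypothetical, made up for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical fluent builder; NOT the athenaSQL API."""

    table: str
    columns: Tuple[str, ...] = ("*",)
    conditions: Tuple[str, ...] = ()
    row_limit: Optional[int] = None

    def select(self, *cols: str) -> "Query":
        # Return a new Query with the projected columns replaced.
        return replace(self, columns=cols)

    def filter(self, condition: str) -> "Query":
        # Each call adds one more condition; they are AND-ed in to_sql().
        return replace(self, conditions=self.conditions + (condition,))

    def limit(self, n: int) -> "Query":
        return replace(self, row_limit=n)

    def to_sql(self) -> str:
        # Render the accumulated clauses as a single SQL string.
        parts = [f"SELECT {', '.join(self.columns)}", f"FROM {self.table}"]
        if self.conditions:
            parts.append("WHERE " + " AND ".join(f"({c})" for c in self.conditions))
        if self.row_limit is not None:
            parts.append(f"LIMIT {self.row_limit}")
        return "\n".join(parts)


# Hypothetical table and columns, for illustration only.
query = (
    Query("ad_logs")
    .select("campaign_id", "event_time")
    .filter("event_type = 'impression'")
    .filter("dt = '2024-11-01'")
    .limit(10)
)
print(query.to_sql())
```

Because each method returns a new immutable `Query`, a base query can be reused across ETL jobs and extended per job without one job's filters leaking into another.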

It is far from complete at the moment, but it covers most of the basic query statements. I would love it if you could test it out and share any feedback! I hope someone finds it useful. If it lacks the functionality you are looking for, let's build it together! And feel free to critique it as much as you like. :)

github: https://github.com/nabilseid/athenaSQL

docs: https://nabilseid.github.io/athenaSQL/