A few observations related to data engineering in the context of a data warehouse:
1. Protocols and IRs (Intermediate Representations) have laid the groundwork for, and continue to enable, interoperability and composability of data tools (see Apache Arrow, Substrait, catalogs). There is a great introduction here: https://voltrondata.com/codex
2. Current OSS data tooling is really good (except for the user interface).
3. Agentic workflows are working incredibly well for data-engineering tasks.
4. LLMs are pushing toward declarative tools and docs that live close to the code.
That's why I am working on an (early) project called Orca [1]. Orca is a template and a set of patterns for building a production-ready and agentic-enabled data warehouse using entirely free and open-source tools. Go check out the README for more info. I would be interested in getting feedback on it!
I started building an agentic-ready data warehouse (GitHub.com/mathisdrn/orca) and was thinking that my skills could be optimized by benchmarking them. Turns out there is a better way of building and optimizing them: using language models themselves as evaluator and skill builder. See DSPy and GEPA.
I am wondering whether the Anthropic and OpenAI skill-creator skills are themselves optimized to improve skill efficiency on various tasks.
The author doesn't mention it, but he should try BetterDisplay.
macOS interface scaling works well for screens around 200 PPI (2K 13-inch, 4K 24-inch, 5K 27-inch). A 4K 32-inch is 138 PPI, so he is probably not using the default interface scaling, and non-default scaling causes some distortion and off-grid pixel rendering.
BetterDisplay fixes this by rendering at an integer multiple of the intended GUI scaling resolution before projecting it onto the panel (3X -> 1.5X).
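For anyone who wants to check the PPI figures: PPI is just the diagonal pixel count divided by the diagonal size in inches. A quick sketch (the function name `ppi` is my own, and I am assuming the standard 3840x2160 and 5120x2880 resolutions for "4K" and "5K"):

```python
import math


def ppi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch: diagonal length in pixels over diagonal length in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


# Assumed panel resolutions: 4K = 3840x2160, 5K = 5120x2880.
print(round(ppi(3840, 2160, 32)))  # 4K 32-inch -> 138
print(round(ppi(3840, 2160, 24)))  # 4K 24-inch -> 184
print(round(ppi(5120, 2880, 27)))  # 5K 27-inch -> 218
```

The 24-inch and 27-inch panels land close to the ~200 PPI range, while the 32-inch one sits well below it.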
I actually am using it, but I didn't want to go down the rabbit hole of an all-encompassing article on displays, PPI, scaling, etc. Using it to scale the display really helps. For the size of things, I like 3008x1692 (on a native 3840x2160 panel), and this looks fine on an LCD. It is also better than native res on the OLED, but still not great. It still bugged my eyes.
I just went with native res for demoing things because it's a worst case, but the fringing problem remains either way, because it affects all strong-contrast edges, not just text. It was also really noticeable on thin lines, such as when doing CAD or between cells in spreadsheets.
I believe the author missed another approach to the semantic layer: the one used by Power BI semantic models or, perhaps most interesting, Malloy.
In these tools, the semantic layer is a thin layer that only defines the following:
- metric definition (mostly as aggregation function)
- dimensions of analysis (product category, country, etc.)
This blog post makes a much better case than I could for why Malloy is a really interesting and welcome innovation in the data analytics space: https://carlineng.com/?postid=malloy-intro#blog
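To make the "thin layer" idea concrete, here is a minimal sketch of metrics (as aggregations) and dimensions compiled into a query. This is not Malloy's or Power BI's actual API; `Metric`, `Dimension`, `build_query`, and the `sales` table are all hypothetical names for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # a SQL aggregation, e.g. "SUM(amount)"


@dataclass(frozen=True)
class Dimension:
    name: str
    column: str  # the column to group by


def build_query(table: str, metrics: list[Metric], dimensions: list[Dimension]) -> str:
    """Compile the requested metrics and dimensions into one GROUP BY query."""
    dim_cols = [f"{d.column} AS {d.name}" for d in dimensions]
    metric_cols = [f"{m.expression} AS {m.name}" for m in metrics]
    sql = f"SELECT {', '.join(dim_cols + metric_cols)} FROM {table}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(d.column for d in dimensions)
    return sql


# Hypothetical model: one metric and one dimension on a made-up "sales" table.
revenue = Metric("revenue", "SUM(amount)")
country = Dimension("country", "country")
print(build_query("sales", [revenue], [country]))
```

The point is that the metric is defined once and can then be sliced by any dimension without rewriting the aggregation.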
The visualisations could be improved by binning the number of maintainers (1 / 2-10 / 11-n) or by plotting the cumulative distribution (i.e. x% of projects have fewer than y contributors).
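Both ideas are only a few lines of code. A rough sketch, where the `maintainers` list is made-up data for illustration and not taken from the article (I use "at most y" for the cumulative share):

```python
from collections import Counter

# Hypothetical maintainer counts per project, invented for illustration.
maintainers = [1, 1, 1, 2, 3, 1, 12, 5, 1, 40]


def bucket(n: int) -> str:
    """Assign a maintainer count to one of the bins 1 / 2-10 / 11+."""
    if n == 1:
        return "1"
    if n <= 10:
        return "2-10"
    return "11+"


def share_at_most(counts: list[int], y: int) -> float:
    """Fraction of projects with at most y maintainers (one point of the CDF)."""
    return sum(1 for n in counts if n <= y) / len(counts)


binned = Counter(bucket(n) for n in maintainers)
print(dict(binned))
print(share_at_most(maintainers, 1), share_at_most(maintainers, 10))
```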
Even that would be misleading... I know of many packages with contributions from hundreds of people, but judging by commits, the bulk of the work was still done by 1 or 2 primary maintainers.
I had the same issue with Bolt recently at Lyon airport. I had to wait 45 minutes for a driver who wouldn't answer messages or calls and was waiting on the other side of the airport. Bolt support was awful to reach during those 45 minutes. Drivers should be held accountable for those actions by the platform too.
Positron IDE is a VS Code fork intended for the R language. It feels more modern than RStudio, and I was under the impression that it would replace it at some point.
That raises two questions:
Does GitHub Copilot or your extension work in Positron IDE?
[1] Orca : https://github.com/mathisdrn/orca