mathisd · 2026-03-20T20:35:58 1774038958

A few observations on data engineering in the context of a data warehouse:
1. Protocols and IRs (intermediate representations) have laid the groundwork for, and continue to enable, interoperability and composability of data tools (see Apache Arrow, Substrait, Catalog). There is a great introduction here: https://voltrondata.com/codex
2. Current OSS data tooling is really good (except for user interfaces).
3. Agentic workflows are working incredibly well for data-engineering tasks.
4. LLMs are pushing towards declarative tools and docs that live close to the code.

That's why I am working on an (early) project called Orca [1]. Orca is a template and a set of patterns for building a production-ready, agent-enabled data warehouse using entirely free and open-source tools. Go check out the README for more info. I would be interested in getting feedback on it!

[1] Orca : https://github.com/mathisdrn/orca

mathisd · 2026-02-17T17:21:57 1771348917

I started building an agentic-ready data warehouse (GitHub.com/mathisdrn/orca) and was thinking that my skills could be optimized by benchmarking them. It turns out there is a better way of building and optimizing them: using language models themselves as evaluator and skill builder. See DSPy and GEPA. I wonder whether Anthropic's and OpenAI's skill-creator skills are themselves optimized to improve skill efficiency on various tasks.

mathisd · 2026-01-10T11:59:44 1768046384

The author doesn't mention it, but he should try BetterDisplay. macOS interface scaling works well for screens around 200 PPI (2K 13-inch, 4K 24-inch, 5K 27-inch). A 4K 32-inch screen is 138 PPI, which likely means he is not using the default interface scaling, and that causes some distortion and off-grid pixel rendering. BetterDisplay fixes this by rendering at an integer multiple of the intended GUI scaling resolution before projecting it onto the panel (3X -> 1.5X).
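To check the PPI figures quoted above, here is a minimal Python sketch. It assumes the usual definition of PPI (diagonal length in pixels divided by diagonal length in inches); the function name `ppi` and the choice of panels are only illustrative.

```python
import math


def ppi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch: diagonal pixel count divided by diagonal size in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


# Panels mentioned in the comment (resolutions assumed to be the standard ones).
panels = {
    "4K 24-inch": (3840, 2160, 24),
    "5K 27-inch": (5120, 2880, 27),
    "4K 32-inch": (3840, 2160, 32),
}

for name, (w, h, d) in panels.items():
    print(f"{name}: {ppi(w, h, d):.0f} PPI")
```

The 4K 32-inch panel comes out at roughly 138 PPI, well below the ~200 PPI range of the other two.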

c0nsumer · 2026-01-10T13:09:59 1768050599

(Author here.)

I actually am using it, but I didn't want to go down the rabbit hole of an all-encompassing article on displays, PPI, scaling, etc. Using it to scale the display really helps, and for the size of things I like, I find 3008x1692 (on a native 3840x2160 panel) works, and this looks fine on an LCD. It is better than native res on the OLED, but still not great. It still bugged my eyes.

I just went with native res for demoing things because it's a worst case, but the fringing is still a problem, because it affects all strong-contrast edges, not just text. It was also really noticeable on thin/narrow lines, such as when doing CAD or between cells in spreadsheets.
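For the 3008x1692 setting on a 3840x2160 panel mentioned above, a rough Python sketch of the resulting scale factor follows. It assumes the factor is simply native pixels divided by logical ("looks like") pixels and does not model how macOS actually renders its intermediate buffer; `scale_factor` is a hypothetical helper.

```python
from fractions import Fraction


def scale_factor(native_px: int, logical_px: int) -> Fraction:
    """Physical pixels per logical point along one axis, as an exact ratio."""
    return Fraction(native_px, logical_px)


native = (3840, 2160)      # panel resolution
looks_like = (3008, 1692)  # scaled resolution chosen in the comment

sx = scale_factor(native[0], looks_like[0])
sy = scale_factor(native[1], looks_like[1])

print(f"horizontal: {sx} (~{float(sx):.3f}), vertical: {sy} (~{float(sy):.3f})")
# A denominator other than 1 means one logical point does not cover
# a whole number of physical pixels, so edges cannot all land on the grid.
print("integer scale:", sx.denominator == 1)
```

Both axes give the same non-integer ratio (about 1.28), which is consistent with the scaled mode looking softer than an integer-scaled one.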

joeig · 2026-01-10T13:26:34 1768051594

Thank you. BetterDisplay is exactly what I need for my 4K 32 inch screen.

mathisd · 2025-10-04T10:58:43 1759575523

I believe the author missed another approach to the semantic layer: the one used by the Power BI semantic model or, perhaps the most interesting one, Malloy. In these tools, the semantic layer is a thin layer that only defines the following:
- metric definitions (mostly as aggregation functions)
- dimensions of analysis (product category, country, etc.)

This blog post makes a much better argument than I would for why Malloy is a really interesting and welcome innovation in the data analytics space: https://carlineng.com/?postid=malloy-intro#blog
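To make the "thin layer" idea concrete, here is a minimal sketch in plain Python. It is not Malloy or Power BI syntax; `Metric`, `SemanticModel`, and the sample rows are hypothetical names and data, used only to show metrics as aggregation functions and dimensions as the columns you are allowed to group by.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record of the underlying table


@dataclass
class Metric:
    name: str
    aggregate: Callable[[list], float]  # aggregation over a group of rows


@dataclass
class SemanticModel:
    dimensions: tuple  # columns that may be used for grouping
    metrics: dict      # metric name -> Metric

    def query(self, rows: list, metric: str, by: str) -> dict:
        """Group rows by one dimension and apply one metric to each group."""
        if by not in self.dimensions:
            raise ValueError(f"unknown dimension: {by}")
        groups = defaultdict(list)
        for row in rows:
            groups[row[by]].append(row)
        agg = self.metrics[metric].aggregate
        return {key: agg(group) for key, group in groups.items()}


# Illustrative data only.
sales = [
    {"country": "FR", "category": "books", "amount": 10.0},
    {"country": "FR", "category": "games", "amount": 30.0},
    {"country": "DE", "category": "books", "amount": 20.0},
]

model = SemanticModel(
    dimensions=("country", "category"),
    metrics={
        "revenue": Metric("revenue", lambda rows: sum(r["amount"] for r in rows)),
        "orders": Metric("orders", len),
    },
)

print(model.query(sales, "revenue", by="country"))
print(model.query(sales, "orders", by="category"))
```

The point of the design is that the model holds only definitions; any combination of metric and dimension can be asked for without writing a new query for each.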

gompertz · 2025-10-04T14:07:05 1759586825

Thanks for this. Can you suggest any books that go into these topics with examples?

mathisd · 2025-08-28T21:38:03 1756417083

The visualisations could be improved by binning the number of maintainers (1 / 2-10 / 11+) or by plotting the cumulative distribution (i.e. x% of projects have fewer than y contributors).
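A small Python sketch of both suggestions, assuming the input is simply a list with one maintainer count per project. The sample numbers are made up for illustration and are not taken from the article.

```python
from collections import Counter

BIN_ORDER = ("1", "2-10", "11+")


def bin_label(n: int) -> str:
    """Map a maintainer count to one of the three proposed bins."""
    if n < 1:
        raise ValueError("a project needs at least one maintainer")
    if n == 1:
        return "1"
    if n <= 10:
        return "2-10"
    return "11+"


def binned_counts(maintainers_per_project: list) -> dict:
    """Number of projects falling in each bin, in a fixed order."""
    counts = Counter(bin_label(n) for n in maintainers_per_project)
    return {label: counts[label] for label in BIN_ORDER}


def share_with_fewer_than(maintainers_per_project: list, y: int) -> float:
    """Fraction of projects with fewer than y maintainers (one CDF point)."""
    total = len(maintainers_per_project)
    return sum(1 for n in maintainers_per_project if n < y) / total


# Illustrative data only.
sample = [1, 1, 1, 2, 3, 5, 12, 40]

print(binned_counts(sample))
for y in (2, 11, 50):
    print(f"{share_with_fewer_than(sample, y):.0%} of projects have fewer than {y} maintainers")
```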

tracker1 · 2025-08-28T22:52:34 1756421554

Even that would be mis-representative... I know of many packages with contributions from hundreds of people, but the bulk of the work was still 1 or 2 primary maintainers based on commits.
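One way to capture this concentration is the share of commits made by the top k contributors. A minimal Python sketch, with a hypothetical `top_k_share` helper and made-up commit counts:

```python
def top_k_share(commit_counts: list, k: int) -> float:
    """Fraction of all commits authored by the k most active contributors."""
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("no commits to analyse")
    top = sorted(commit_counts, reverse=True)[:k]
    return sum(top) / total


# Illustrative data only: one entry per contributor.
commits = [900, 50, 30, 20]

print(f"top 1: {top_k_share(commits, 1):.0%}")
print(f"top 2: {top_k_share(commits, 2):.0%}")
```

In this made-up project there are four contributors, yet a single person accounts for 90% of the commits.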

mathisd · 2025-08-17T11:57:47 1755431867

I really like DuckDB, but I can't see this pattern being used for either prototyping or production.

mathisd · 2025-08-15T09:57:40 1755251860

Really cool

mathisd · 2025-08-05T15:46:58 1754408818

Time lost waiting due to unresponsive driver: 35 minutes
Location: Lyon Airport
App: Bolt
Resolution method: Harassing Bolt AI support

mathisd · 2025-07-22T19:47:34 1753213654

I had the same issue with Bolt recently at Lyon airport. I had to wait 45 minutes for a driver who wouldn't answer messages or calls and was waiting on the other side of the airport. Bolt support was awful to reach during those 45 minutes. Drivers should be held accountable for those actions by the platform too.

mathisd · 2025-07-22T00:01:35 1753142495

Positron IDE is a VS Code fork intended for the R language. It feels more modern than RStudio and I was under the impression that it would replace it at some point. That raises two questions: does GitHub Copilot work in Positron IDE, and does your extension?

jorgeoguerra · 2025-07-22T00:08:12 1753142892

Right now our assistant is only available in RStudio. We do plan to develop an assistant for Positron-like IDEs in the future though.

ecshafer · 2025-07-22T00:55:19 1753145719

Positron is made by Posit, which was formerly the RStudio company. So I would say it's basically the new RStudio.