Hacker News | billylo's comments

We learned computer languages so we could ask computers to do work for us. It was a necessity because there was no other way.

If we can instruct computers in natural language 50% of the time, that's 50% less translation work for our human brains. I have no problem with not needing to write instructions in computer languages (no regex, no sed/awk, not even Python) for day-to-day stuff.

Critical thinking and reasoning are another story. We can't let those skills atrophy.


It's a strange new world to get used to.

Time it takes to go from 100M users to 5 billion: the Internet (25 years), smartphones (13), AI? (tracking to get there in ~6 years!)

https://evergreen-labs.org/assets/Acceleration.jpg


We should ask how traders manage this. World markets are essentially 24/7. For them, the FOMO effect is even stronger... an actual money-earning opportunity.

That's why benchmarks are useful. We all suffer from the shortcomings of human perception.


Benchmarks' shortcomings are no worse... they inevitably measure something that is only close to the thing you actually care about, not the thing itself. It's entirely plausible that the decreased benchmark score is because Anthropic's initial prompting of the model was overtuned to the benchmark, and as they gain more experience with real-world use they are changing the prompt to do better there and consequently worse on the benchmark.


I wonder how best we can measure the usefulness of models going forward.

Thumbs up or down? (could be useful for trends) Usage growth from the same user over time? (as an approximation) Tone of user responses? ("Don't do this"... "this is the wrong path"... etc.)


Benchmarks measure what they measure. But your subjective experience also matters.


Xcode has been getting better bit by bit. No major regressions.


Windows and macOS do come with a small model for generating text completions. You can write a wrapper for your own TUI to access them platform-agnostically.

For consistent LLM behaviour, you can use the Ollama API with your model of choice to generate: https://docs.ollama.com/api/generate
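
For illustration, a minimal Python sketch against that endpoint (assumes Ollama is serving on its default port 11434; the model name and prompt are placeholders):

    # Minimal sketch: call a local Ollama server's /api/generate endpoint.
    # Assumes Ollama is running on localhost:11434 and that the model
    # named below has already been pulled (e.g. `ollama pull llama3.2`).
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",  # placeholder: any locally pulled model
            "prompt": "Complete this sentence: The quick brown fox",
            "stream": False,      # return one JSON object instead of a stream
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["response"])  # the generated completion text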

Chrome has a built-in Gemini Nano too, but there isn't an official way to use it outside Chrome yet.


Do you know what it’s called, at least on Windows? I’m struggling to find API docs.

When I asked an AI, it said no such built-in model exists (possibly a knowledge-cutoff issue).



Thank you!


Yes. I am not aware of a model shipping with Windows, nor of announced plans to do so. Microsoft has been focused on cloud-based LLM services.


This thread is full of hallucinations ;)


These are the on-device model APIs for Apple: https://developer.apple.com/documentation/foundationmodels


Is there a Linux-y standard brewing?


Each distro is doing its own thing. If you are mainly targeting Linux, I would suggest building it on top of Ollama or LiteLLM.
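
For example, a minimal sketch of the LiteLLM route (assumes `pip install litellm` and a local Ollama model; the model name and prompt are placeholders):

    # Minimal sketch: LiteLLM as a provider-agnostic layer over a local model.
    # Assumes LiteLLM is installed and Ollama is serving "llama3.2" locally;
    # changing the model string re-routes to another provider, nothing else changes.
    from litellm import completion

    response = completion(
        model="ollama/llama3.2",            # provider prefix + model name
        messages=[{"role": "user", "content": "Say hello in one short sentence."}],
        api_base="http://localhost:11434",  # default local Ollama endpoint
    )
    print(response.choices[0].message.content)  # OpenAI-style response object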


Windows doesn't?


Please let us know when, and which, LLMs change their "minds". This is a cool experiment. I wish there were more time-bound datasets we could experiment with to get a better sense of how LLMs are influenced.


I did a fact check on the article and it seems to check out. I am happy to help: billy@evergreen-labs.org.

I'd also suggest reaching out to the author of the NYT article for ideas. They took the time to study the subject and will likely have some insights on what could work (tech or non-tech approaches).


Same here. Agents have allowed me to take on more experiments because the cost of testing ideas is now much, much lower.


Compared to moving from assembly coding to high-level languages, which change do you think is more dramatic? I'm curious.


If you are curious about doing something similar with TPUs, Google has an article: https://developers.googleblog.com/train-gpt2-model-with-jax-...

