fzysingularity's comments

> It's like going to the grocery store and buying tabloids, pretending they're scientific journals.

This is pure gold. I've always found this approach of running evals on a moving target via consensus to be broken.


I'd love to see Claude Code remove more lines than it adds, TBH.

There's a ton of cruft in code that humans are less inclined to remove because it just works, but imagine having an LLM do the cleanup work instead of the generation work.


Here's a short cookbook exploring an agentic approach to vision–language tasks: detection, segmentation, OCR, generation, and combining classical CV tools with VLM reasoning.

Happy to run examples if you leave a comment.

[1] IPython notebook: https://github.com/vlm-run/vlmrun-cookbook/blob/main/noteboo...

[2] Colab: https://colab.research.google.com/github/vlm-run/vlmrun-cook...
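For anyone skimming, here's roughly the pattern the notebook walks through: classical CV proposes a region, then a VLM reasons over it. This is just an illustrative sketch (the model name and prompt are placeholders, not the notebook's exact code):

    import base64
    import cv2
    from openai import OpenAI

    client = OpenAI()

    def describe_largest_region(path: str) -> str:
        # Classical CV: find the biggest contour and crop it.
        img = cv2.imread(path)
        edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 100, 200)
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return "no region found"
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        _, buf = cv2.imencode(".jpg", img[y:y + h, x:x + w])

        # VLM: reason over just that crop.
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model
            messages=[{"role": "user", "content": [
                {"type": "text", "text": "What object is in this crop?"},
                {"type": "image_url", "image_url": {
                    "url": "data:image/jpeg;base64," + base64.b64encode(buf).decode()}},
            ]}],
        )
        return resp.choices[0].message.content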


What is Photopea built on?


The author does yearly AMAs on Reddit; you should look them up.


This is why arenas are generally a bad idea for assessing correctness in visual tasks.


FYI, one of the models in the battle was pretty slow to load. Are these also being rated on latency, or just quality?


Ultimately, there’s some intersection of accuracy x cost x speed that’s ideal, which can be different per use case. We’ll surface all of those metrics shortly so that you can pick the best model for the job along those axes.
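To make that concrete, picking along those axes is basically a weighted trade-off. A toy sketch (the weights and metric values are made up for illustration, not our actual scoring):

    def pick_model(metrics, w_acc=0.6, w_cost=0.2, w_speed=0.2):
        # Higher accuracy is better; cost and latency count as penalties.
        def score(m):
            return (w_acc * m["accuracy"]
                    - w_cost * m["usd_per_1k_requests"]
                    - w_speed * m["p50_latency_s"])
        return max(metrics, key=lambda name: score(metrics[name]))

    models = {
        "model-a": {"accuracy": 0.82, "usd_per_1k_requests": 0.40, "p50_latency_s": 1.2},
        "model-b": {"accuracy": 0.78, "usd_per_1k_requests": 0.05, "p50_latency_s": 0.4},
    }
    print(pick_model(models))  # different weights per use case pick different winners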


Ideally we want people to rate based on quality, but I imagine some of the results are biased right now by loading time.


That's an easy fix if you wait for the slowest one and pop them both in at the same time, no?
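Something like this, assuming each entrant exposes an async generate() call (hypothetical interface): kick both off concurrently, but only reveal once the slower one has finished, so loading time can't bias the vote.

    import asyncio

    async def battle(prompt, model_a, model_b):
        # Start both generations at once, but wait for the slower one
        # before rendering anything, so latency can't influence the rating.
        out_a, out_b = await asyncio.gather(
            model_a.generate(prompt),
            model_b.generate(prompt),
        )
        return out_a, out_b  # show side by side at the same instant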


I definitely see the value and versatility of Claude Skills (over what MCP is today), but I find the sandboxed execution to be painfully inefficient.

Even if we expect the LLMs to fully resolve the task, they'll heavily rely on I/O and print statements sprinkled across the execution trace to get the job done.


> but I find the sandboxed execution to be painfully inefficient

The sandbox is not mandatory here. You can execute the skills on your host machine too (with some fiddling), but it's good practice, and probably for the better, to get into the habit of executing code in an isolated environment for security purposes.
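If you don't want the full Skills sandbox, even a throwaway container gets you most of the isolation habit. Rough sketch (the path and image are illustrative, not anything Claude-specific):

    import subprocess

    skill_dir = "/path/to/skill"  # hypothetical: directory containing the skill's run.py
    subprocess.run(
        ["docker", "run", "--rm", "--network", "none",   # no network access
         "-v", f"{skill_dir}:/skill:ro",                 # mount the skill read-only
         "python:3.12-slim", "python", "/skill/run.py"],
        check=True,
    )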


The better practice, if it isn't a one-off, is being introduced to the tool (perhaps by an LLM) and then just running it yourself with structured inputs when appropriate. I think the 2015-era novice coding habit of copying a blob of twenty shell scripts off Stack Overflow and blindly running them in your terminal (while also not good, for obvious reasons) was still better than the same thing happening without you being able to watch and potentially learn what those commands were.


I do think that if the agents can successfully resolve these tasks in a code-execution environment, they can likely come up with better parametrized solutions with structured I/O, assuming these are workflows we want to run over and over again.
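Roughly what I mean: once the agent has solved the task once, freeze it into a typed function with structured inputs and outputs instead of re-deriving it from print statements every run. The names below are illustrative, not anything from Claude Skills:

    from dataclasses import dataclass, field

    @dataclass
    class InvoiceRequest:            # structured input
        pdf_path: str
        currency: str = "USD"

    @dataclass
    class InvoiceResult:             # structured output
        total: float
        line_items: list[dict] = field(default_factory=list)

    def extract_invoice(req: InvoiceRequest) -> InvoiceResult:
        # Body distilled from the agent's one-off execution trace; stubbed here.
        return InvoiceResult(total=0.0)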


Claude does image generation in surprising ways - we did a small evaluation [1] of different frontier models for image generation and understanding, and Claude's results are by far the most surprising.

[1] https://chat.vlm.run/showdown

[2] https://news.ycombinator.com/item?id=45996392


We ran a small visual benchmark [1] of GPT, Gemini, Claude, and our new visual agent Orion [2] on a handful of visual tasks: object detection, segmentation, OCR, image/video generation, and multi-step visual reasoning.

The surprising part: models that ace benchmarks often fail on seemingly trivial visual tasks, while others succeed in unexpected places. We show concrete examples, side-by-side outputs, and how each model breaks when chaining multiple visual steps.

We go into more detail in our technical whitepaper [3]. You can play around with Orion for free here [4].

[1] Showdown: https://chat.vlm.run/showdown

[2] Learn about Orion: https://vlm.run/orion

[3] Technical whitepaper: https://vlm.run/orion/whitepaper

[4] Chat with Orion: https://chat.vlm.run/

Happy to answer questions or dig into specific cases in the comments.
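For a sense of what "chaining multiple visual steps" means in practice, a typical chain looks like the sketch below: each step's output feeds the next, so an early miss compounds. The detect/read helpers are placeholders, not our API.

    def read_license_plate(image, detect, read_text):
        # Step 1: detection (placeholder callable).
        boxes = detect(image, label="license plate")
        if not boxes:
            return None                # many models break the chain right here
        x0, y0, x1, y1 = boxes[0]
        # Step 2: crop to the detection.
        crop = image[y0:y1, x0:x1]
        # Step 3: OCR on the crop (placeholder callable).
        return read_text(crop)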


SAM3 is cool - you can already do this more interactively on chat.vlm.run [1], and do much more. It's built on our new Orion [2] model; we've been able to integrate with SAM and several other computer-vision models in a truly composable manner. Video segmentation and tracking are also coming soon!

[1] https://chat.vlm.run

[2] https://vlm.run/orion


Wow, this is actually pretty cool. I was able to segment out the people and the dog in the same chat. https://chat.vlm.run/chat/cba92d77-36cf-4f7e-b5ea-b703e612ea...



Nice, that's pretty neat.

