Which is by nature transient. There are many more, and far riskier, strings attached to doing this online: you never know whether all parties involved in the verification are trustworthy.

Stable-diffusion.cpp is where it's at if you don't care for complex installations and node-based workflows.

I personally find nothing about ComfyUI to live up to that name. Node-based workflows are unruly, and you have to know in advance what you need to do for anything to work. Just poking around and figuring stuff out is nearly impossible even for technically literate but AI-inexperienced folks.

You could argue that's what pre-made workflows are for, but those don't work so well for users off the blessed path, i.e. without the Nvidia hardware everyone assumes. I personally find using stable-diffusion.cpp on the command line considerably easier to figure out. Last I saw, it even ships a usable demo web UI for those who really want one (my workflow benefits from heavier scripting, so point-and-click is far too slow and clunky).
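
To give a flavor of the scripting I mean, here's a minimal sketch that drives the sd binary from Python. The model path and prompts are placeholders, and the flag spellings are what I believe current builds accept; check `sd --help` for your build, since options shift between versions.

    import subprocess
    from pathlib import Path

    # Batch a handful of prompts through the stable-diffusion.cpp CLI.
    # Model path and prompts are placeholders; verify flag names locally.
    model = Path("models/sd-v1-5.safetensors")
    prompts = [
        "a watercolor lighthouse at dusk",
        "an isometric pixel-art workshop",
    ]

    for i, prompt in enumerate(prompts):
        subprocess.run(
            ["sd",
             "-m", str(model),
             "-p", prompt,
             "-o", f"out_{i:03d}.png",
             "--steps", "20",
             "--seed", str(1000 + i)],
            check=True,  # stop the batch if a generation fails
        )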


He goes over that in the video. It's long, but very much worth watching.

Can't speak for parent, but I've had decent luck with llama.cpp on my triple Ryzen AI Pro 9700 XTs.

The I prefix indicates the newer quant types built around an importance matrix ("imatrix"), which weights quantization toward the values that matter most; they squeeze more accuracy out of low bit counts, at some cost in speed. The _0 and _1 quants are older, simpler schemes. The K quants, in my limited understanding, quantize most weights at the stated bit depth but bump certain important areas higher and less-used parts lower; they generally give better accuracy for a similar size and speed than the _1 quants. MXFP4 is a 4-bit floating-point format; support for it is mostly an Nvidia affair right now, so I can't take advantage of it on my AMD hardware. It's supposed to be very efficient. The UD part marks Unsloth's Dynamic quants, which keep particularly sensitive layers at higher precision.
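
For rough sizing, the arithmetic is just parameters times bits per weight. A quick sketch, where the bits-per-weight figures are approximate averages I'm assuming for illustration (real GGUF files mix precisions across tensors):

    # Back-of-envelope GGUF size: parameters * bits-per-weight / 8 bytes.
    # The bpw values below are rough averages, not exact format specs.
    def approx_gib(params_billion, bits_per_weight):
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    for name, bpw in [("Q4_0", 4.5), ("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
        print(f"{name}: ~{approx_gib(7, bpw):.1f} GiB for a 7B model")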

Also, depending on how much regular system RAM you have, you can partially offload mixture-of-experts models like this one, keeping only the most important layers on your GPU. That may let you use larger, more accurate quants. llama.cpp and other frameworks support this, and it's worth learning how to set up.
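
The simplest knob is partial layer offload. A minimal sketch with the llama-cpp-python bindings follows; the model filename and layer count are made up for illustration, so tune n_gpu_layers to what fits your VRAM. Recent llama.cpp builds also have tensor-override options that can pin just the expert weights to system RAM, which is the better trick for MoE models, but check the current docs for the exact flag.

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Partial offload: keep n_gpu_layers layers on the GPU, rest in system RAM.
    # Model file and layer count are placeholders, not recommendations.
    llm = Llama(
        model_path="models/mixtral-8x7b-instruct.Q4_K_M.gguf",
        n_gpu_layers=20,
        n_ctx=4096,
    )

    out = llm("Q: Why offload only some layers?\nA:", max_tokens=64)
    print(out["choices"][0]["text"])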


Still hoping IQuest-Coder gets the same treatment :)

This comment seems fully detached from both the main linked article and the comment it replies to.


Did you read it? It's exactly what I posted, but 100x longer, and with memes.


I did read it, but it sounds like you didn't. It had quite a lot to say about the reason for the switch, the challenges involved, and alternative software to meet real needs. Eye candy was not the focus at all.



If you're space-constrained and want compression, why not do it at the filesystem or block layer when the app doesn't support it?
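
On ZFS, for example, it's a one-liner per dataset, and btrfs gets the same effect with a compress=zstd mount option. The dataset name below is hypothetical:

    import subprocess

    # Push compression down to the filesystem: enable zstd on a ZFS dataset
    # and check how well it's doing. "tank/models" is a placeholder name.
    subprocess.run(["zfs", "set", "compression=zstd", "tank/models"], check=True)
    subprocess.run(["zfs", "get", "compressratio", "tank/models"], check=True)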

