More

syntaxing · 2026-04-22T16:45:26 1776876326

Been using Qwen 3.6 35B and Gemma 4 26B on my M4 MBP, and while it’s no Opus, it does 95% of what I need which is already crazy since everything runs fully local.

FuckButtons · 2026-04-22T17:47:58 1776880078

It’s good enough that I’ve been having codex automate itself out of a job by delegating more and more to it.

Very excited for the 122b version as the throughput is significantly better for that vs the dense 27b on my m4.

Someone1234 · 2026-04-22T19:07:47 1776884867

You've got me curious. Two questions if I may:

- What kind of tasks/work?

- How is either Qwen/Gemma wired up (e.g. which harness/how are they accessed)?

Or to phase another way; what does your workflow/software stack look like?

syntaxing · 2026-04-22T19:23:24 1776885804

1. Qwen is mostly coding related through Opencode. I have been thinking about using pi agent and see if that works better for general use case. The usefulness of *claw has been limited for me. Gemma is through the chat interface with lmstudio. I use it for pretty much everything general purpose. Help me correct my grammar, read documents (lmstudio has a built in RAG tool), and vision capabilities (mentioned below, journal pictures to markdown).

2. Lmstudio on my MacBook mainly. You can turn on an OpenAI API compatible endpoint in the settings. Lmstudio also has a headless server called lms. Personally, I find it way better than Ollama since lmstudio uses llama cpp as the backend. With an OpenAI API compatible endpoint, you can use any tool/agent that supports openAI. Lmstudio/lms is Linux compatible too so you can run it on a strix halo desktop and the like.

ycombinatornews · 2026-04-23T02:53:06 1776912786

Curious how do you run opencode and qwen locally? Few times I tried it responds back with some nonsense. Chat, say, through ollama works well.

syntaxing · 2026-04-23T12:11:12 1776946272

Which quants are you using? I had similar issue until I used Unsloth’s. I would recommend at least UD_6. Also, make sure your context length is above 65K.

https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF

Someone1234 · 2026-04-22T20:45:33 1776890733

Thanks I appreciate the info. I may try to spin up something like this and give it a whirl.

anon373839 · 2026-04-22T22:47:36 1776898056

I would recommend trying oMLX, which is much more performant and efficient than LM Studio. It has block-level KV context caching that makes long chats and agentic/tool calling scenarios MUCH faster.

throwaw12 · 2026-04-22T16:50:57 1776876657

can you expand more on what you mean by 95%?

There are 2 aspects I am interested in:

1. accuracy - is it 95% accuracy of Opus in terms of output quality (4.5 or 4.6)?

2. capability-wise - 95% accuracy when calling your tools and perform agentic work compared to Opus - e.g. trip planning?

syntaxing · 2026-04-22T17:11:19 1776877879

1. What do you mean by accuracy? Like the facts and information? If so, I use a Wikipedia/kiwx MCP server. Or do you mean tool call accuracy?

2. 3.6 is noticeably better than 3.5 for agentic uses (I have yet to use the dense model). The downside is that there’s so little personality, you’ll find more entertainment talking to a wall. Anything for creative use like writing or talking, I use Gemma 4. I also use Gemma 4 as a “chat” bot only, no agents. One amazing thing about the Gemma models is the vision capabilities. I was able to pipe in some handwritten notes and it converted into markdown flawlessly. But my handwriting is much better than the typical engineer’s chicken scratch.

throwaw12 · 2026-04-22T17:15:03 1776878103

by accuracy I meant how close is the output to your expectations, for example if you ask 8B model to write C compiler in C, it outputs theory of how to write compiler and writes pseudocode in Python. Which is off by 2 measures: (1) I haven't asked for theory (2) I haven't asked to write it in Python.

Or if you want to put it differently, if your prompt is super clear about the actions you want it to do, is it following it exactly as you said or going off the rails occasionally

syntaxing · 2026-04-22T17:24:07 1776878647

Ironically, even though I write C/++ for a living, I don’t use it for personal projects so I can’t say how well it works for low level coding. Python works great but there’s a limit on context size (I just don’t have enough RAM, and I do not like quantizing my kv cache). Realistically, I can fit 128K max but I aim for 65K before compacting. With Unsloth’s Opencode templating, I haven’t had any major issues but I haven’t done anything intense with it as of late. But overall, I have not had to stop it from an endless loop which happened often on 3.5.

physicles · 2026-04-22T22:35:37 1776897337

I have a Supernote and was looking at different models for handwriting recognition, and I agree that gemma4-26B is the best I’ve tried so far (better than a qwen3-vl-8B and GLM-OCR). Besides turning off thinking, does your setup have any special sauce?

syntaxing · 2026-04-23T00:53:27 1776905607

Q8 or Q6_UD with no KV cache quantization. I swear it matters even more with small activated parameters MOE model despite the minimal KL divergence drop

richstokes · 2026-04-22T20:14:20 1776888860

Do you use it with ollama? Or something else?

syntaxing · 2026-04-22T22:04:00 1776895440

Llama cpp is vastly superior. There was this huge bug that prevented me from using a model in ollama and it took them four months for a “vendor sync” (what they call it) which was just updating ggml which is the underpinning library used by llama cpp (same org makes both). lmstudio/lms is essentially Ollama but with llama cpp as backend. I recommend trying lmstudio since it’s the lowest friction to start

syntaxing · 2026-04-22T16:42:01 1776876121

Yes and no. Are you using open router or local? Are the models are good as Opus? No. But 99% of the time, local models are terrible because of user errors. Especially true for MoE, even though the perplexity only drops minimal for Q4 and q4_0 for the KV cache, the models get noticeably worse.

acidtechno303 · 2026-04-22T16:59:54 1776877194

Sounds like you're accusing a professional of holding their tool incorrectly. Not impossible, but not likely either.

syntaxing · 2026-04-22T17:16:02 1776878162

Inferencing is straight up hard. I’m not accusing them of anything. There’s a crap ton of variables that can go into running a local model. No one runs them at native FP8/FP16 because we cannot afford to. Sometimes llama cpp implementation has a bug (happens all the time). Sometimes the template is wrong. Sometimes the user forgot to expand the context length to above the 4096 default. Sometimes they use quantization that nerfs the model. You get the point. The biggest downside of local LLMs is that it’s hard to get right. It’s such a big problem, Kimi just rolled out a new tool so vendors can be qualified. Even on openrouter, one vendor can be half the “performance” of the other.

syntaxing · 2026-04-22T00:17:14 1776817034

What does heavy RL even mean…similar to how the CEO of cursor said how much better the perplexity got when it’s a terrible metric for model fine tune performance? Let’s be real here, it’s Kimi 2.5 fine tuned for Cursor. There’s nothing wrong with that but they tried to hide it and it’s some work they put in but nothing close to training a model of their own.

syntaxing · 2026-04-22T00:00:10 1776816010

60B for Composer 2…that is built from Kimi K2… what ever happened to “Grok being the best”?

apsurd · 2026-04-22T00:12:48 1776816768

Am I the only one that thinks Composer is really good, when you factor in the speed and the cost?

syntaxing · 2026-04-22T00:21:14 1776817274

I don’t doubt it is. End of the day, it’s a fine tuned Kimi. They tried to hide it and making their work sound more impressive than it is. It’s easy to have stuff be cheap when you don’t have to train your own model from scratch.

vachina · 2026-04-22T01:04:27 1776819867

Composer is clearly dumber than the rest but then I only ask it dumb questions and it answers them really quickly.

Marciplan · 2026-04-22T00:25:00 1776817500

yes, you are

syntaxing · 2026-04-21T23:28:30 1776814110

With GitHub and Anthropic reducing subscription features, Chinese providers are looking more and more tempting.

anakaine · 2026-04-22T00:37:50 1776818270

Until you work for a company or government agency that is subject to any sort of technology audit. The moment offshore processes running in China comes up you'll have a never ending hole of questions to answer.

syntaxing · 2026-04-21T14:22:13 1776781333

As the engineering saying goes, nothing more permanent than a temporary solution

syntaxing · 2026-04-19T19:15:18 1776626118

3.4B in 4.5 months…is that all going to Anthropic? Makes it seem so with the wording and how they’re pivoting to Codex too

dmix · 2026-04-19T19:19:45 1776626385

It's probably all AI spending, including them doing AI stuff for their products.

gigatexal · 2026-04-19T19:24:39 1776626679

oh man uber is acquiring the company I work for [1] and we currently really like Claude ... but if Codex is better so be it. I just really, really, really like Claude Code as a front end. Guess I'll have to make it talk Codex instead.

[1] it's public knowledge https://investor.uber.com/news-events/news/press-release-det...

syntaxing · 2026-04-19T19:38:48 1776627528

Curious how it works in other countries, do employees get a portion of the payout?

gigatexal · 2026-04-20T22:09:05 1776722945

We have virtual shares that vest on sale. I won’t be rich. Enough to take a nice trip to Mallorca after taxes.

happyopossum · 2026-04-20T01:35:19 1776648919

That’s the entire R&D budget - the article is completely lacking in actual details, such as how much was spent on Ai

3eb7988a1663 · 2026-04-19T19:46:51 1776628011

If it is anything like my company, sign enormous deals to AI startups that have existed for 8 months, and do little more than provider wrappers around someone else's model. Then hire three different firms that do the same thing because each division has to prove how much more AI they are than the others. Have a handful of internal engineers who have no idea what they are doing, but get approval to build and run an internal B200 server farm. Ensure any big jobs are done through some kind of white-glove offering from Amazon/Azure that removes complexity, but charges astronomical rates.

feedtheclank · 2026-04-20T01:28:06 1776648486

"My delivery service CEO told me the AI keep eating his tokens so I asked how many tokens he has and he said he just goes to the token shop and gets a new batch of tokens afterwards so I said it sounds like he’s just feeding tokens to the AI and then his laid off workers started crying."

syntaxing · 2026-04-19T00:37:06 1776559026

> And the men that had spent longer looking after babies showed the largest drops in testosterone. Those that shared a bed with their infants also had lower levels.

Dad here. Maybe…it’s the lack of sleep? Involved fathers tend to have less sleep.

bitshiftfaced · 2026-04-19T01:13:36 1776561216

Parents also tend to gain weight, and higher BMI is associated with a decline in T.

https://pmc.ncbi.nlm.nih.gov/articles/PMC3809034/

benhurmarcel · 2026-04-19T12:30:54 1776601854

Yes being a parent would tend to correlate to a drop in physical activities and sports

kyleee · 2026-04-19T05:40:06 1776577206

Do BBC have low T?

IAmBroom · 2026-04-21T20:40:36 1776804036

I'm not going to google that to find out what you meant.

mbac32768 · 2026-04-19T02:59:56 1776567596

Yes, chronically disturbed sleep is the obvious confounder and is well known to drop T and explains the observed small changes a lot better.

_DeadFred_ · 2026-04-19T20:23:21 1776630201

If human babies actually evolved to be the terrors they are in order to lower fathers testosterone levels/chill them out that would be wild.

Henchman21 · 2026-04-19T17:39:04 1776620344

Or, just gonna put this out there... you have successfully fathered a child. A drop-off in T seems normal -- you've done your job and now you care for that child and lose the drive to father a significant number more. You accomplished your biological purpose and slowly slide on into death over the next number of decades. So it is. We are not immortals and the phases of life should not be avoided out of selfish vanity. Easy to say online, eh? :)

verteu · 2026-04-19T01:16:21 1776561381

Several of the studies described changes in hormones before the child was born.

davorak · 2026-04-19T02:22:35 1776565355

Extra time commitment, and therefor missing some sleep, can start before the baby is born.

ludicrousdispla · 2026-04-19T08:00:46 1776585646

Reminds me of when I would stay up late ironing my wife's maternity BDUs.

herewulf · 2026-04-19T22:33:47 1776638027

With or without starch? Please tell me you were taking care of boots as well!

ludicrousdispla · 2026-04-21T18:12:28 1776795148

I don't recall using starch on the BDUs, I might have polished the boots once or twice, but that was just over twenty years ago, so who knows.

IncreasePosts · 2026-04-19T02:19:26 1776565166

If you cosleep with your 8 month pregnant wife she might not be sleeping well and by proximity you may not be sleeping well.

WarOnPrivacy · 2026-04-19T19:12:45 1776625965

> Several of the studies described changes in hormones before the child was born.

For me, sleep dropped off right after I got the "I'm pregnant" phone call. I'd only known this girl for [time it takes a baby to be detected] days.

mbac32768 · 2026-04-19T03:05:22 1776567922

Given this is "BBC Future" let me guess, barely above significance and n=16?

goku12 · 2026-04-19T07:46:54 1776584814

I'm unfamiliar with the subject. What's the problem with BBC Future?

joemazerino · 2026-04-19T16:59:28 1776617968

On the right line. Lower sleep, higher coping (bad diet, alcohol etc) would lead to T destruction. Not surprised BBC didn't connect the dots here.

e40 · 2026-04-19T12:40:00 1776602400

Evolutionarily this makes sense. Lower testosterone means less carousing, and better fatherhood.

syntaxing · 2026-04-18T22:40:37 1776552037

I went to college as a MechE so unsure if compsci was different. But overall, all the “fun” projects were labs. We have three semesters of hell and all 3 semesters had 2-3 labs, and we write 20 pages or so for EACH lab a week (usually a team of 2-3).

syntaxing · 2026-04-16T17:07:25 1776359245

Is it worth running speculative decoding on small active models like this? Or does MTP make speculative decoding unnecessary?