More

aliljet · 2026-04-23T02:12:12 1776910332

How many levels of agents are here. Agents riding code by agents in a system driven by agents vibed by one lonely engineer in Redmond?

walrus01 · 2026-04-23T02:18:16 1776910696

Introducing Microsoft Teams Turducken 2026 (Enterprise AI Agent Edition) now with 17 layers

https://www.npr.org/sections/thesalt/2014/11/21/365509503/th...

aliljet · 2026-04-22T12:52:02 1776862322

The real problem is that scientists doing this sort of early work more often than not want to burn hardware under their desks. Renting infrastructure in Google cloud isn't the only way...

tjwebbnorfolk · 2026-04-22T17:48:00 1776880080

You need to have an awfully big desk to stick compute under in order to run a training workload of any interest...

aliljet · 2026-04-21T19:21:21 1776799281

I am hopeful that OpenAI will potentially offer clarity on their loss-leading subscription model. I'd prefer to know the real cost of a token from OpenAI as opposed to praying the venture-funded tokens will always be this cheap.

aliljet · 2026-04-21T18:32:49 1776796369

I've been looking for broken 3090s for a short while. And the whole market is funny. Most of these devices have had their VRAM and GPUs physically harvested (for no clear reason). The ones that are truly broken, however, are still trading in the ~$300 range making me think they're destined for more harvesting. Where are people buying these repairable GPUs? I'd gladly take the gamble for fun, honestly, less profit.

Retr0id · 2026-04-21T18:34:26 1776796466

> for no clear reason

Cards are bought from the consumer market, stripped for parts, and then repackaged for datacentre usage (single slot cooler, more vram)

DogRunner · 2026-04-21T19:12:40 1776798760

Two things: a repaired 3090 24gb can be sold for 1000+ euro in Germany at least. It's still crazy expensive, so if you have the skills and devices, it can be done with a profit when you buy 3 broken cards and can sell 2 repaired one.

I never reballed any IC, so I would not dare to try it on my card.

aliljet · 2026-04-20T05:34:20 1776663260

This is the rugpull that is starting to push me to reconsider my use of Claude subscriptions. The "free ride" part of this being funded as a loss leader is coming to a close. While we break away from Claude, my hope is that I can continue to send simple problems to very smart local llms (qwen 3.6, I see you) and reserve Claude for purely extreme problems appropriate for it's extreme price.

KronisLV · 2026-04-20T06:29:01 1776666541

> This is the rugpull that is starting to push me to reconsider my use of Claude subscriptions.

I'm still with them cause the model is good, but yes, I'm noticing my limits burning up somewhat faster on the 100 USD tier, I bet the 20 USD tier is even more useless.

I wouldn't call it a rugpull, since it seems like there might be good technical reasons for the change, but at the same time we won't know for sure if they won't COMMUNICATE that to us. I feel like what's missing is a technical blog post that tells uz more about the change and the tokenizer, although I fear that this won't be done due to wanting to keep "trade secrets" or whatever (the unfortunate consequence of which is making the community feel like they're being rugpulled).

rimliu · 2026-04-20T07:36:03 1776670563

20 USD tier was useless from the start. You'd get to the limit in 30 minutes. Codex with 20 USD on the other hand...

adezxc · 2026-04-20T12:31:46 1776688306

Give Codex a month or two and it'll just do the same thing, though now with a million more users because Claude Code wasn't good enough.

Grp1 · 2026-04-20T08:03:22 1776672202

OpenAI has been doing the same thing gradually. Codex launched with generous Plus limits, then they introduced the $100 Pro tier, and Plus limits have quietly tightened since. With the same repetitive tasks I was running, consumption is noticeably higher now for the same output.

The pattern feels deliberate — make the $20 tier just uncomfortable enough that power users upgrade, without officially announcing the reduction. If it continues, $20 buys you a demo and $100 buys you actual work.

arw0n · 2026-04-21T11:20:46 1776770446

I'm on the 20 USD tier, and it works quite well for me. Basically I send one very carefully crafted task to the LLM per 4h limit, do a couple of minor questions, and the rest of the time I'm thinking/exploring/coding. I am producing around a tenth of the code of my colleagues, but around the same number of features.

teaearlgraycold · 2026-04-20T07:34:56 1776670496

For now you can run /model claude-opus-4-6

DeathArrow · 2026-04-20T05:43:31 1776663811

Quality of answers from quantized models is noticeable worse than using the full model.

You'll be better using Qwen 3.6 Plus through Alibaba coding plan.

SoMomentary · 2026-04-20T11:49:52 1776685792

> Quality of answers from quantized models is noticeable worse than using the full model.

This is the very reason I've heard I shouldn't use Alibaba!

londons_explore · 2026-04-20T05:43:05 1776663785

I think an LLM that is a decent chunk smarter/better than other LLM's ought to be able to charge a premium perhaps 10x or 100x it's competitors.

See for example the price difference between taking a taxi and taking the bus, or between hiring a real lawyer Vs your friend at the bar who will give his uninformed opinion for a beer.

aliljet · 2026-04-18T17:43:51 1776534231

What open models are truly competing with both Claude Code and Opus 4.7 (xhigh) at this stage?

xvector · 2026-04-19T03:39:59 1776569999

Spent a lot of time with "open models." None of them come close. They are benchmaxxed. But you won't hear many of the open model fans on HN admit this.

The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

No one working on a serious project at a serious company is downgrading their agent's intelligence for a marginal cost saving. Downgrading your model is like downgrading the toilet paper on your yacht.

tredre3 · 2026-04-19T04:26:58 1776572818

> The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?

I agree that people who claim that open models are as good as claude/openai/z are lying, delusional, or not doing very much. I've tried them all, included GLM 5.1.

GLM is not bad but the hardware needed will never recoup the ROI vs just using a commercial provider through its API.

That being said, you're being reductive here. For many use cases local models offer advantages that can't obtained through a commercial API : Privacy, ownership of the entire stack, predictability. They can't be rugpulled, they can't snitch on you. They will not give you 503.

Those advantages are very valuable for things like a local assistant, as an agent, for data extraction, for translations, for games (role playing and whatnot), etc.

That being said I know that many people are like you, they don't give a second thought about privacy. They'd plug Anthropic to their brain if they could. So I understand the sentiment. I just think that you should in turn try to understand why someone would use an open model.

WarmWash · 2026-04-19T04:08:00 1776571680

Glm 5.1 getting 5% on ARC-AGI 2 private is all anyone needs to know.

parinporecha · 2026-04-18T18:10:57 1776535857

I've had a good experience with GLM-5.1. Sure it doesn't match xhigh but comes close to 4.6 at 1/3rd the cost

slopinthebag · 2026-04-19T08:58:43 1776589123

1/3? Try 2/13 :P

5.1 is like $4 / 1m output, Opus 4.6 is $25. GPT 5.4 pro is $270 with large contexts :O

esafak · 2026-04-18T18:09:16 1776535756

GLM 5.1 competes with Sonnet. I'm not confident about Opus, though they claim it matches that too.

ojosilva · 2026-04-18T18:55:46 1776538546

I have it as failover to Opus 4.6 in a Claude proxy internally. People don't notice a thing when it triggers, maybe a failed tool call here and there (harness remains CC not OC) or a context window that has gone over 200k tokens or an image attachment that GLM does not handle, otherwise hunky-dory all the way. I would also use it as permanent replacement for haiku at this proxy to lower Claude costs but have not tried it yet. Opus 4.7 has shaken our setup badly and we might look into moving to Codex 100% (GLM could remain useful there too).

Someone1234 · 2026-04-18T18:01:36 1776535296

That's a lame attitude. There are local models that are last year's SOTA, but that's not good enough because this year's SOTA is even better yet still...

I've said it before and I'll say it again, local models are "there" in terms of true productive usage for complex coding tasks. Like, for real, there.

The issue right now is that buying the compute to run the top end local models is absurdly unaffordable. Both in general but also because you're outbidding LLM companies for limited hardware resources.

You have a $10K budget, you can legit run last year's SOTA agentic models locally and do hard things well. But most people don't or won't, nor does it make cost effective sense Vs. currently subsidized API costs.

gbro3n · 2026-04-18T18:10:55 1776535855

I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can. I would love to be able to say I have X technique for compensating for the model shortfall, but my experience so far has been that bigger, later models out perform older, smaller ones. I genuinely hope this changes through. I understand the investment that it has taken to get us to this point, but intelligence doesn't seem like it's something that should be gated.

Someone1234 · 2026-04-18T18:19:52 1776536392

Right; but every major generation has had diminishing returns on the last. Two years ago the difference was HUGE between major releases, and now we're discussing Opus 4.6 Vs. 4.7 and people cannot seem to agree if it is an improvement or regression (and even their data in the card shows regressions).

So my point is: If you have the attitude that unless it is the bleeding edge, it may have well not exist, then local models are never going to be good enough. But truth is they're now well exceeding what they need to be to be huge productivity tools, and would have been bleeding edge fairly recently.

gbro3n · 2026-04-18T18:48:03 1776538083

I feel like I'm going to have to try the next model. For a few cycles yet. My opinion is that Opus 4.7 is performing worse for my current work flow, but 4.6 was a significant step up, and I'd be getting worse results and shipping slower if I'd stuck with 4.5. The providers are always going to swear that the latest is the greatest. Demis Hassabis recently said in an interview that he thinks the better funded projects will continue to find significant gains through advanced techniques, but that open source models figure out what was changed after about 6 months or so. We'll see I guess. Don't get me wrong, I'd love to settle down with one model and I'd love it to be something I could self host for free.

dakiol · 2026-04-18T19:26:40 1776540400

> I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can.

Don't you understand that by choosing the best model we can, we are, collectively, step by step devaluating what our time is worth? Do you really think we all can keep our fancy paychecks while keep using AI?

gbro3n · 2026-04-18T20:04:52 1776542692

Do you think if you or me stopped using AI that everyone else will too? We're still what we always were - problem solvers who have gained the ability to learn and understand systems better that the general population, communicate clearly (to humans and now AIs). Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever. As the amount of software grows, so will the need for people who know how to manage the complexity that comes with it.

lelanthran · 2026-04-18T22:53:45 1776552825

> Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever.

There were always jobs that required those "many more skills" but didn't require any programming skills.

We call those people Business Analysts and you could have been doing it for decades now. You didn't, because those jobs paid half what a decent/average programmer made.

Now you are willingly jumping into that position without realising that the lag between your value (i.e. half your salary, or less) would eventually disappear.

gbro3n · 2026-04-19T07:15:12 1776582912

I guess we will need to wait and see if AI can remove ALL of the complexity that requires a software engineer over a business analyst. I can't currently believe that it will. BA's I've worked with vary in technical capability from 'having coded before and understanding DB schema basics and network architecture' to 'I know how the business works but nothing about computers'. If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe. But while AI is a probabilistic technology manipulating deterministic systems, we will always need people to understand whats going on, and whether they write a lot of code or not, they will be engineers, not analysts. Whether it's more or less of those people, we will see.

lelanthran · 2026-04-19T07:57:06 1776585426

> If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe.

They don't need to all run on the same frameworks, they just need to run on documented frameworks.

What possible value can you bring to a BA?

The system topology (say, if the backend was microservices vs Lambda vs something-else)? The LLM can explain to the BA what their options are, and the impact of those options.

The framework being used (Vue, or React, or something else)? The AI can directly twiddle that for the BA.

Solving a problem? If the observability is setup, the LLM can pinpoint almost all the problems too,and with a separate UAT or failover-type replica, can repro, edit, build, deploy and test faster than you can.

Like I already said, if[1] you're now able to build or enhance a system without actually needing programming skills, why are you excited about that? You could always do that. It's just that it pays half what programming skills gets you.

You (and many others who boast about not writing code since $DATE) appear to be willingly moving to a role that already pays less, and will pay even less once the candidates for that role double (because now all you programmers are shifting towards it).

It's supply and demand, that's all.

--------------

[1] That's a very big "If", I think. However, the programmers who are so glad to not program appear to believe that it's a very small "If", because they're the ones explaining just how far the capabilities have come in just a year, and expect the trend to continue. Of course, if the SOTA models never get better than what we have now, then, sure - your argument holds - you'll still provide value.

aliljet · 2026-04-18T18:46:27 1776537987

First, making sure to offer an upvote here. I happen to be VERY enthusiastic about local models, but I've found them to be incredibly hard to host, incredibly hard to harness, and, despite everything, remarkably powerful if you are willing to suffer really poor token/second performance...

wellthisisgreat · 2026-04-18T19:01:01 1776538861

> that are last year's SOTA

Early last year or late last year?

opus 4.5 was quite a leap

HWR_14 · 2026-04-18T19:34:31 1776540871

$10k is a lot of tokens.

sscaryterry · 2026-04-18T19:52:58 1776541978

At the rate its consuming now, I'd probably blow $10k in a month easy.

aliljet · 2026-04-17T17:09:14 1776445754

This is the reality I'm seeing too. Does this mean that the subscriptions (5x, 10x, 20x) are essentially reduced in token-count by 20-30%?

aray07 · 2026-04-17T17:17:38 1776446258

yeah thats the part that is unclear to me as well - if our usage capacity is now going to run out faster.

AndyNemmity · 2026-04-17T18:48:03 1776451683

The same thing I've been doing all the time, now has used up 1/3rd of my week in one day on max20.

So yes, for the same tasks, usage runs out faster (currently)

cbg0 · 2026-04-17T19:09:16 1776452956

Boris said on Twitter that they've increased rate limits for everyone.

aliljet · 2026-04-16T19:52:38 1776369158

I'm really curious about what competes with Claude Code to drive a local LLM like Qwen 3.6?

chabes · 2026-04-16T22:29:01 1776378541

OpenCode or Pi are popular agent harnesses. Lots of IDEs integrate LLMs now. I believe there’s also a Qwen Code that exists, but I have yet to try it.

smashed · 2026-04-16T20:00:21 1776369621

OpenCode?

aliljet · 2026-04-16T15:01:04 1776351664

Have they effectively communicated what a 20x or 10x Claude subscription actually means? And with Claude 4.7 increasing usage by 1.35x does that mean a 20x plan is now really a 13x plan (no token increase on the subscription) or a 27x plan (more tokens given to compensate for more computer cost) relative to Claude Opus 4.6?

computomatic · 2026-04-16T15:08:48 1776352128

They have communicated it as 5x is 5 x Pro, and 20x is 20 x Pro (I haven’t looked lately so not sure if that’s changed).

They have also repeatedly communicated that the base unit (Pro allotment) is subject to change and does change often.

As far as I can tell, that implies there is no guarantee that those subscriptions get some specific number of tokens per unit of time. It’s not a claim they make.

msikora · 2026-04-16T23:42:13 1776382933

I think as far as the maybe more important weekly allotment Max 5 is 10x Pro and Max 20 is 20x Pro. For the 5 hour window it is as the names would suggest though.

DonsDiscountGas · 2026-04-16T16:09:25 1776355765

Definitely 13x, at least for now

ModernMech · 2026-04-16T17:00:56 1776358856

Feels like buying toilet paper.

aliljet · 2026-04-16T14:54:20 1776351260

Have they effectively communicated what a 20x or 10x Claude subscription actually means? And with Claude 4.7 increasing usage by 1.35x does that mean a 20x plan is now really a 13x plan (no token increase on the subscription) or a 27x plan (more tokens given to compensate for more computer cost) relative to Claude Opus 4.6?

oidar · 2026-04-16T14:58:10 1776351490

Anthropic isn't going to give us that information. It's not actually static, it depends on subscription demand and idle compute available.

willis936 · 2026-04-17T00:13:22 1776384802

Given they have all of the information and all of the control, do you trust them to be fair?

kingleopold · 2026-04-16T16:22:36 1776356556

so it's all "it depends" as a business offering, lmao. all marketing

minimaxir · 2026-04-16T15:23:49 1776353029

The more efficient tokenizer reduces usage by representing text more efficiently with fewer tokens. But the lack of transparancy does indeed mean Anthropic could still scale down limits to account for that.

redml · 2026-04-16T15:45:26 1776354326

a few months ago it was for weekly:

pro = 5m tokens, 5x = 41m tokens, 20x = 83m tokens

making 5x the best value for the money (8.33x over pro for max 5x). this information may be outdated though, and doesn't apply to the new on peak 5h multipliers. anything that increases usage just burns through that flat token quota faster.

bearjaws · 2026-04-16T17:29:03 1776360543

I am 90% sure it's looking at month long usage trends now and punishing people who utilize 80%+ week over week. It's the only way to explain how some people burn through their limit in an hour and others who still use it a lot get through their hourly limits fine.

redml · 2026-04-16T17:44:47 1776361487

It's hard to say. Admittedly I'm a heavy user as I intentionally cap out my 5x plan every week - I've personally found that I get more usage being on older versions of CC and being very vigilant on context management. But nobody can say for sure, we know they have A/B test capabilities from the CC leaks so it's just a matter of turning on a flag for a heavy user.

aliljet · 2026-04-16T16:57:07 1776358627

wait. that's insanity. where did you get those numbers from? the 5x plan is obviously the right place to be...

redml · 2026-04-16T17:38:34 1776361114

someone did the math and posted it somewhere, I forgot where, searching for it again just provides the numbers i remember seeing. at the time i remembered what it was like on pro vs 5x and it felt correct. again, it may not be representative of today.

zrkrlc · 2026-04-16T23:13:20 1776381200

You’re probably thinking of this article: https://she-llac.com/claude-limits

redml · 2026-04-17T05:48:03 1776404883

thats it! thanks for digging it up