dherls · 2026-05-19T01:57:37 1779155857

Seems needlessly angry about what is ultimately a decent if imperfect source of entropy, and a good illustrative example for the general public

dherls · 2026-04-26T16:26:48 1777220808

Giving LLM agents direct, autonomous access to a real production database with write access seems insane to me.

NO ONE, agent or human, should have direct write access to production databases outside of emergency break glass scenarios. This is why we have stored routines and API layers to pre-define what writes are allowed. The fact that agents CAN autonomously write to a database does not imply that they should.

For the point about query optimization, again your agents should not be issuing random queries against a production database. We have had the concept of separate analytics databases with different architectures to support exploratory queries for decades.
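
To make the "pre-define what writes are allowed" point concrete, here is a minimal sketch of an allow-list write layer. It uses SQLite only so that it runs anywhere; the `ALLOWED_WRITES` table, the `execute_write` helper, and the `users` schema are all hypothetical names for illustration, not anything from the article.

```python
import sqlite3

# Hypothetical allow-list: each permitted write is a named, parameterized statement.
ALLOWED_WRITES = {
    "rename_user": "UPDATE users SET name = :name WHERE id = :id",
}


def execute_write(conn: sqlite3.Connection, operation: str, params: dict) -> int:
    """Run one pre-defined write; anything not on the list is rejected."""
    if operation not in ALLOWED_WRITES:
        raise PermissionError(f"write operation not allowed: {operation}")
    with conn:  # commits on success, rolls back if the statement raises
        return conn.execute(ALLOWED_WRITES[operation], params).rowcount
```

A caller (human or agent) can only ask for `rename_user` with parameters; it never gets to send arbitrary SQL. In a real deployment the same idea would more likely live in stored routines or a service API, with the database account itself denied direct table writes.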

tee-es-gee · 2026-04-26T17:55:18 1777226118

I agree and hope this is the case for anything serious enough. I also don't see this changing any time soon.

There are ways to give safe access to the data, at least read-only, that don't involve production risk and don't sacrifice privacy. For example, database branches with anonymization. Instead of accessing the prod/staging db, the agent creates a branch and has read/write access to that.

(disclaimer: I work at Xata, where we offer copy-on-write branches for Postgres, and the agent use-cases are the most popular right now)

exec7 · 2026-04-26T16:37:43 1777221463

I totally agree! Even read access, especially when databases have some sensitive/personal information about users.

sgarland · 2026-04-26T18:50:48 1777229448

I’m a DBRE. I spend a good portion of my day with a shell into one or more prod databases. The schema definitions in code are scattered between ORM model definitions, Alembic migrations, and Liquibase migrations, so the only reliable way I have of understanding a schema as it exists is to view it. Plus, I am very comfortable with SQL, and the various system catalogs of both MySQL and Postgres, so it’s a ton easier to work with.

Truly sensitive customer information is encrypted, and on an isolated DB cluster that no one has regular access to. I also operate with a read-only grant, because manual writes to a prod DB are generally a terrible idea.

Pxtl · 2026-04-26T22:15:46 1777241746

So what do you do for "okay, we need to run this script that we've decided is a necessary operation". Special account? Everything go through the build server? I've been looking for tooling for "I need to do a production operation but I want it to have proper interlocks and reviews".

tyzoid · 2026-04-26T22:58:21 1777244301

You can use something like flyway on top of your existing git/cicd stack. Write the query as a migration, have it reviewed using your git code review process, and merge to run the migration.

sgarland · 2026-04-27T04:34:06 1777264446

If it’s an incident, it’s usually manually run after review, with an audience. If it isn’t, it’s run as a script that goes through normal PR review.

Pxtl · 2026-04-27T17:37:11 1777311431

That's what I'm getting at - so in the end, somebody really does need to have access to a regular sysadmin account on the server, even if it's not their default login. I was hoping that there was an option that didn't involve that sort of workflow, or abusing migration tools (since this isn't exactly a migration).

m463 · 2026-04-26T20:16:22 1777234582

> autonomous access to a real production database

remember that filesystems are just sophisticated databases

  rm -rf /<unexpectedly-missing-value>

root_axis · 2026-04-26T22:24:57 1777242297

And the same logic applies.

i7l · 2026-04-26T17:01:04 1777222864

How does that even work in compliance-relevant scenarios where the audit trail shows some LLM messed with the data? Who, if anyone, is on the hook?

mr-wendel · 2026-04-26T17:20:55 1777224055

My guess is that if the database is subject to auditing then LLM access (obviously writes in particular, but even reads come with exfiltration risks) will be a hard "no" and instant red flag. When it's a person, there is a sense of accountability and opportunity for remediation.

I suppose that LLMs will be treated as a code artifact and liability will shift upstream towards who deployed/approved the access in the first place. Even though code is essentially deterministic, making that association fairly simple, it's going to boil down to this same paradigm.

Perhaps governance rules will evolve to even explicitly forbid it, but my gut feeling is that for what the future determines to be "practical" reasons (right or wrong) LLMs will warrant an entirely new set of rules to allow them to be in the chain at all.

+ EDIT: both my wife and I have experience in this area and the current answer is companies like KPMG don't have an answer yet. Existing rules do help (e.g. there better be good documented reasons why it was used and that access was appropriately scoped, etc), but there is enough ambiguity around these tools so they say "stay tuned, and take caution".

bmurphy1976 · 2026-04-26T17:42:12 1777225332

The dev who ran it. The manager who allowed it. The director/VP/CTO who enabled the culture. They all have some responsibility for it.

waterTanuki · 2026-04-26T23:36:41 1777246601

dilution of responsibility isn't just dangerous, it's illegal for some industries. Aircraft manufacturers need to log and track every single bolt, panel, and fastener on a plane back to the engineer who installed it. The moment you dilute risk between people you eliminate auditability.

joquarky · 2026-04-26T19:55:55 1777233355

Whoever provided the authorization credentials to the agent is on the hook.

cowlby · 2026-04-26T16:34:44 1777221284

LLM agents are unlocking demand and supply for applications that wouldn't have been possible before due to time constraints though. There's a growing demand for single user or smaller scoped apps where giving LLM agents direct access means velocity. The failure/rollback model is much easier with these as long as we have good backup hygiene.

antonvs · 2026-04-26T16:38:43 1777221523

This makes no sense whatsoever.

It's not news that if you just give all developers at a company write access to the production databases, owner permissions on all resources, etc. that velocity can be increased. But at what cost?

The reason we don't do that in most cases is that "move fast and break things" only makes sense for trivial, non-critical applications that don't have any real importance, like Facebook.

cowlby · 2026-04-26T20:17:55 1777234675

There are thousands of small and medium businesses though. They have maybe one true CRM, and a dozen spreadsheets/files floating around that would benefit from becoming proper apps. People delete spreadsheets all the time!

Sure don't give an LLM agent write access to the modeled CRM that took months/years to build.

But turning a spreadsheet into an app in a few days? By giving the LLM proper read/write capabilities for velocity? I think the case is there for it. Right tool for the right job.

3form · 2026-04-26T17:04:54 1777223094

I think the argument would be mostly about the companies where such trivialities like proper auth were given up to maximum possible extent. I'm sure even some bigger ones are only gnashing their teeth over implementing security measures that are required by law and not seeing much point to it.

SegfaultSeagull · 2026-04-26T17:07:54 1777223274

This comment is savage and I’m here for it.

throw5 · 2026-04-26T17:23:37 1777224217

> There's a growing demand for single user or smaller scoped apps where giving LLM agents direct access means velocity. The failure/rollback model is much easier with these as long as we have good backup hygiene.

This makes no sense to me. For anything that has sensitive payment or personally identifiable data, direct access to the DB is potentially illegal.

> The failure/rollback model is much easier with these as long as we have good backup hygiene.

Have you actually operated systems like this in production? Even reverting to a DB state that is only seconds old can still lose hundreds or thousands of transactions. Which means loads of unhappy customers. More realistically, recovery points are often minutes or hours behind once you factor in detection, validation and operational overhead.

DB revert is for exceptional disaster recovery scenarios, not something you want in normal day-to-day operations. If you are saying that you want to give LLM full access to prod DB and then revert every time it makes a mistake, you aren't running a serious business.

2ndorderthought · 2026-04-26T17:58:10 1777226290

You are thinking way too hard. This person is a hazard that needs to learn the hard way.

If velocity means letting agents live edit a db, I'm fine being slow. Holy hell. Let these people crash and burn but definitely let me know the app name so I know never to use it first.

cowlby · 2026-04-26T20:28:09 1777235289

Not everything is a SaaS. I commented this elsewhere but I picture all the businesses running on spreadsheets/CSVs/MS Access databases on someone's desktop. People delete these all the time by accident. They have no security, no authentication, etc.

With an LLM agent (with RW access to a DB), a developer, and a few days, these become proper apps that SMBs would pay well for.

Sure don't give an LLM agent access to PII or properly built CRMs etc. But to not see the rest of the landscape seems like a missed opportunity.

jmalicki · 2026-04-26T22:36:41 1777243001

At the very least you should give it a non-prod copy of the database, not direct access to the DB actively powering production right now.

I've done work for a hedge fund where the DB ran directly on the manager's desktop. I worked with my local copy and sent an update script, and he had a second copy he ran on to verify.

Even with humans you shouldn't be working directly against the prod DB in these cases!

cowlby · 2026-04-27T03:51:34 1777261894

Yes, I just think there's a sane way to do things that is not "never let LLM agents do things".

For dev/prod staging though, there's that other story on HN right now of an LLM agent that maneuvered its way to prod credentials and destroyed prod. And backups went along with it. I'm paranoid enough to think backups in this use case means out-of-band uncorrelated storage.

2ndorderthought · 2026-04-26T22:51:00 1777243860

There is literally no excuse. The fact that there is any resistance to this let alone from multiple people terrifies me.

cowlby · 2026-04-27T03:45:41 1777261541

I just think there's more nuance to it. Some things have an implicit RTO/RPO/SLA of say a day. Risk is also correlated to recovery and rollback. And there's levels of LLMs out there.

Surely in the Venn diagram of things, there's a slot where it's okay to let a Claude Opus agent run on a process with good backups/recovery? Where taking the risk of a 1-hour restore job is worth the LLM agent velocity?

For extra paranoia, surely even Opus/Mythos can't figure out how to destroy log level backups to immutable storage.

2ndorderthought · 2026-04-27T12:58:21 1777294701

The only nuance I can see is, does the data matter at all? If it does you shouldn't do this. If it doesn't then who cares, also why even put it in a database.

steve_adams_86 · 2026-04-26T17:41:41 1777225301

This narrative seems to come from people who haven't worked on meaningfully complex software systems. They're more like script kiddies than software developers. I don't mean that in a derogatory manner. They're right that LLMs are unlocking new possibilities in the realm of their work. They just don't realize that these new possibilities are constrained to relatively simple applications, or very thin slices of complex systems.

I use an LLM to access my database occasionally, but never in production and never with write access. It is genuinely useful. It would never be useful in a production setting, though.

It's worth noting too that people should be wary of what a read only user means in database land. There are plenty of foot guns where writes can occur with read-like statements, and depending on the schema, maybe this would be a rollback-worthy situation. You really need to understand your database and schema before allowing an LLM anywhere near it, and you should be reviewing every query.

cowlby · 2026-04-26T20:25:55 1777235155

That's the issue that I feel misses the forest for the trees. Relatively simple applications or thin slices exist right now, in production, in critical paths, as spreadsheets/CSVs/files on someone's desktop. That's the pent up demand I picture out there for developers.

Go to any SMB out there and there's a goldmine of processes that could be improved with LLM agents with full RW access to a database. Where backups are sufficient as a recovery mechanism that is better-than-before.

sroussey · 2026-04-26T22:37:00 1777243020

I think the Venn diagram of people letting LLMs have complete control of their database AND having good backups will have no overlap. The people that would benefit are not the people that have backups.

steve_adams_86 · 2026-04-26T23:48:09 1777247289

This is also a good point. Details like this are why I think experienced developers are going to remain relevant for a while yet. Anticipating what can go wrong is such a huge component of what building software systems is about. LLMs can be great at it, but only with the limited context they have, and even then only somewhat coincidentally.

steve_adams_86 · 2026-04-26T20:49:17 1777236557

Okay, totally agree. I think good harnesses are crucial but the premise is absolutely valid.

cowlby · 2026-04-26T20:30:18 1777235418

I'm not thinking of SaaS or properly built apps with an API, modeled databases, etc. I'm thinking spreadsheets/CSVs/MS Access that thousands of SMBs use to power their critical paths and someone accidentally deletes. Typically single user, maybe a small team. Infrequent writes, lots of reads.

gmueckl · 2026-04-26T17:02:49 1777222969

But are those users allowed by law to see all the data in the database? Some privacy laws require that personal information must be hidden from employees unless they have a narrow and specific business reason to view it. Blanket full access to a database may be illegal for that reason.

argomo · 2026-04-26T19:56:54 1777233414

I think a lot of the objections to your post could be answered by reminding folks of how Microsoft Access databases tend to pop up in small businesses as well as corporate environments outside of IT departments. Yes, they're not "proper" databases but they /get business done/ and often serve as v0 before a real app can be properly conceived of.

One can easily imagine an LLM-enabled database that lets a wider audience build meat-and-potatoes line-of-business apps for small team use with minimal compliance concerns.

cowlby · 2026-04-26T20:22:22 1777234942

Yes, that's the right framing. Millions flow through spreadsheets/CSVs/MS Access with none of the auth/backups/architecture people seem to be stuck to.

I saw an article on HN one time about CSVs and how much business still flows through them. Reminds me of the xkcd comic about the one tiny block propping up lots of infrastructure. It stuck with me because it's a ripe area for LLM agent based upgrades.

Sure don't give LLMs access to the well architected blocks. But not wanting to improve the brittle areas seems crazy to me even if it's contrarian.

raincole · 2026-04-26T17:33:26 1777224806

> single user

If you're just vibe coding a tool for yourself, you don't have 'production database' at all even if you use database technology for storage. Just like many Android apps use local sqlite DBs but they're not production databases.

Of course in this case no traditional wisdom about production databases matters to you. In other words, it's off-topic.

cowlby · 2026-04-26T20:18:34 1777234714

I commented this elsewhere: There are thousands of small and medium businesses though. They have maybe one true CRM, and a dozen spreadsheets/files floating around that would benefit from becoming proper apps. People delete spreadsheets all the time!

Sure don't give an LLM agent write access to the modeled CRM that took months/years to build.

But turning a spreadsheet into an app in a few days? By giving the LLM proper read/write capabilities for velocity? I think the case is there for it. Right tool for the right job.

nophunphil · 2026-04-26T17:02:07 1777222927

1) Can you explain what demand and supply mean in this context?

2) In regards to having good backup hygiene, who is we?

cowlby · 2026-04-26T20:14:43 1777234483

I think of all the pent up demand for proper applications that are just infeasible when it would take a developer weeks-to-months to create. Now it's just a few days with an LLM agent.

Examples for me are all the apps that live in a spreadsheet, or in a MS Access database. Or all the crappy ad backed apps on the iOS app store. People wipe full spreadsheets all the time and backups are the only recovery.

Just last weekend I was frustrated with the poor quality of Pokedex type apps that spam ads left and right. Took just one session with Claude Opus to roll a custom Pokedex. It knew internally about things like the PokeApi dataset, Pokemon data modelling etc. To-the-hour snapshots of the database are trivial for bespoke apps like this so the LLM agent velocity seems like an okay trade off for me.

Clearly people don't agree...

dherls · 2026-03-29T00:52:13 1774745533

The author fails to mention any of the negative effects they experience due to this go version selection. They say that the effect is "viral" but don't give any concrete examples of why it's a bad thing to keep your toolchain up to date

bkdbkd · 2026-03-29T05:51:38 1774763498

It forces a change where none is called for. Compatibility works both ways. What doesn't matter to me, the lib dev, may matter for someone else. The world is built on portable, flexible code, and pinning to something unnecessarily breaks that one small part of the world. It's adding an unnecessary requirement. Life is hard enough.

PaulKeeble · 2026-03-29T01:02:57 1774746177

One of the key advantages of Go is that it's very compatible: you can compile and run early versioned code on the latest compiler without concern and it will just run with fewer bugs and faster due to all the advancements over time. I don't like being forced to upgrade my tooling until I choose the upgrade, but in Go's case it's usually trivial.

WhyNotHugo · 2026-03-29T02:23:16 1774750996

Anyone with an older toolchain can’t build that library or anything that depends on it.

Some environments might not even have the newer version available.

jmalicki · 2026-03-29T02:38:45 1774751925

Anyone with an older toolchain is free to fork it on GitHub, test with the older version, add CI to the project that tests with the older version, and submit a patch, too!

This may not get the project as many users, but not everyone who writes a 50 line project is trying to figure out which versions it supports and setting up full test matrices either.

mid-kid · 2026-03-29T21:38:26 1774820306

Not a Go dev, but I typically set up a CI with the oldest toolchains I support (usually a debian release), and only bump those versions when I really need something from the latest versions. Locally I build with the most recent tools. This ensures good enough coverage for very little work, as I notice when I start using something that's newer and can bump the toolchain accordingly.

jmalicki · 2026-03-29T23:39:35 1774827575

Sure, but if you start a new small project and throw it on GitHub, it's not totally insane to just put the version you tested. Just because someone put up their tiny library doesn't mean they've put in the effort to figure out which version they need.

WhyNotHugo · 2026-03-30T05:55:50 1774850150

Are you sure you replied to the right comment? I'm not sure how this relates to the question being asked.

jmalicki · 2026-03-30T14:08:32 1774879712

I did.

If you have an older tool chain, it is on you to fix the library to build with the older tool chain, that's what open source is about!

WhyNotHugo · 2026-04-02T11:44:18 1775130258

Sorry, I have no idea what you're talking about. Please double check the message thread to which you're replying.

canpan · 2026-03-29T01:13:05 1774746785

I am missing this part too. I can't really say I've ever had a problem upgrading Go to the latest version. Now with "go fix", a lot of features are even improved automatically.

dherls · 2026-01-13T22:54:24 1768344864

It's much easier to detect a single account abusing your API and ban them/require payment. Trying to police an endpoint open to the internet is like playing whack-a-mole.

dherls · 2026-01-06T19:55:53 1767729353

Really impressive that it's implemented in < 400 lines of JavaScript code and runs so smoothly in my phone's browser (Firefox on Android)

dherls · 2025-12-27T14:42:04 1766846524

I would definitely recommend not putting complex logic like this in your cron definitions. Much more annoying to find and debug in the future. I prefer to write a short wrapper script that contains the test logic instead and track/version control it
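
A minimal sketch of what such a wrapper could look like, assuming the test in question is a date check. The "last day of the month" rule and the function names are hypothetical stand-ins, since the article's actual condition isn't reproduced here.

```python
#!/usr/bin/env python3
"""Hypothetical cron wrapper: the schedule stays simple, the test logic lives here."""
import datetime
import subprocess
import sys


def is_last_day_of_month(day: datetime.date) -> bool:
    # If tomorrow falls in a different month, today is the last day.
    return (day + datetime.timedelta(days=1)).month != day.month


def main(argv, today=None):
    today = today or datetime.date.today()
    if not is_last_day_of_month(today):
        return 0  # Not the right day: exit quietly so cron has nothing to report.
    if not argv:
        print("usage: wrapper COMMAND [ARGS...]", file=sys.stderr)
        return 2
    # Run the real job and hand its exit status back to cron.
    return subprocess.run(argv).returncode
```

In a real script you would finish with `sys.exit(main(sys.argv[1:]))` under a `__main__` guard and point a plain daily cron entry at it. The date logic is then an ordinary function you can unit test and review in version control.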

zbentley · 2025-12-27T18:05:18 1766858718

Good advice. You can also check in and version your crontabs (or timer units or whatnot) directly.

dherls · 2025-11-30T01:12:29 1764465149

Some of the alternatives that the author suggests (Slack, Discord, Matrix rooms) are so much worse to search for answers in. Stack Overflow has many disadvantages, but it is extremely good at being a publicly searchable repository of answers to common questions

scuff3d · 2025-11-30T05:24:13 1764480253

Yeah this is the biggest problem with Discord (and similar platforms) as an alternative. Discoverability sucks. Even if you are aware and have access, finding useful information is a bitch.

Suppafly · 2025-11-30T04:09:06 1764475746

>it is extremely good at being a publicly searchable repository of answers to common questions

Closed duplicate of <something that is totally different>

tommica · 2025-11-30T07:36:15 1764488175

Yeah, even with that it is still better

qaisjp · 2025-11-30T22:28:56 1764541736

The "closed duplicate" thing is blown way out of proportion. I'm convinced people are just pooping on Stack Overflow, not because it's bad, but because they just like complaining about things.

Suppafly · 2025-12-03T04:23:35 1764735815

>The "closed duplicate" thing is blown way out of proportion.

I hardly use the site and it's happened to me multiple times. I'm sure people that rely more heavily on it see it all the time. I suppose it depends on the sorts of topics you're looking for help with. Much like Wikipedia or subreddits, a lot of the little niches are seemingly run by assholes that would rather delete stuff than actually help people find information.

mhh__ · 2025-11-30T01:54:49 1764467689

I believe pretty strongly that almost every company should have some kind of internal SE

davnicwil · 2025-11-30T02:59:53 1764471593

I agree, so much so that about 10 years ago I built a product that did this!

I launched to lukewarm reception, actually applied to YC with it and didn't get much of a look, nor an interview :-) and after a bit of (though certainly far too little) further hustle gave up on it due to circumstances leading me on another path.

Anyway, I was a tiny bit vindicated when about a year later I noticed Stack exchange themselves did a similar product, but as far as I know, it never really hit. They would advertise it in the side banner for quite a few years but it eventually seemed to go away.

It's weird that it didn't work, it always did seem like an incredibly good idea to me - just so good, it's obvious. If such a thing existed, it'd add so much to any company onboarding experience at a minimum, and would also have obvious ongoing value.

And it just seemed like a great strategy to get useful and up to date documentation: to gamify it. There's just an inherent incentive to become the 'Jon Skeet' of your organisation as it were, rather than making documentation this largely anonymous, thankless afterthought it often becomes in practice despite best intentions.

Aurornis · 2025-11-30T03:28:10 1764473290

If it's any consolation, I feel like I've been pitched 10 different versions of this product over the years and I've encountered a lot of startups trying to do the exact same thing. You probably didn't get any YC traction because they'd seen the same thing so many times before. I wouldn't be surprised if we could find a YC batch or two that already contained this exact idea.

davnicwil · 2025-11-30T04:23:35 1764476615

Thanks, and this might be it - it's even more obvious than I ever realised!

DougN7 · 2025-11-30T04:55:35 1764478535

I wish your solution was still out there. I’m still running an ancient OSQA site with no way to migrate to anything else.

arccy · 2025-11-30T09:21:54 1764494514

consider feedback people get like https://news.ycombinator.com/item?id=46086703

companies don't always reward answering questions...

hombre_fatal · 2025-11-30T02:33:41 1764470021

The problem that comes to mind is that every question and answer that’s posted is something you have to maintain as part of your docs as they rot over time.

I’d be curious to hear what the common solutions to that are.

Maybe it can be used as a limbo to gather FAQs that get crystallized into the real docs and then deleted.

klodolph · 2025-11-30T02:57:58 1764471478

I think a reasonable solution is “people who find the answer should observe that the question was asked eight years ago, and certainly double-check the answer”. If it’s a question about company internal codebases or operations, then you should have access to see the code or resources the answer is talking about.

PunchyHamster · 2025-11-30T02:45:29 1764470729

Yeah, our wiki is full of old no longer actual/relevant articles and very little incentive to fix any of that vs go work on the next ticket.

I even pondered adding a bot that would create a ticket out of the oldest not-updated article for someone to go through and verify it's still current/relevant

begueradj · 2025-11-30T02:13:41 1764468821

Good documentation, communication channels and a healthy work environment where colleagues can communicate and help each other are much more beneficial than an internal SE.

hshdhdhj4444 · 2025-11-30T02:21:06 1764469266

Documentation doesn’t solve the problem of Q/A situations.

Internal Stack Exchanges are (were? I’m not sure whether they discontinued it…my old company had one but new one doesn’t) really good at converting chat style communication into a permanent easily searchable record that can also be easily updated.

You can also avoid many of the pitfalls with stack overflow around over aggressive moderation (not really needed since the volume of questions won’t be as high), or inappropriate commenting of any sort (reach out to the individual directly or even to their manager), etc.

cellio · 2025-11-30T02:48:30 1764470910

They still sell Stack Overflow for Teams (renamed Stack Overflow Internal or something like that), but the cost is pretty astronomical. If you want private Q&A in your company/school/etc and you've got anybody with the tech clues, you're better off downloading and setting up one of the free tools.

shawn_w · 2025-11-30T04:15:46 1764476146

SE sells just such a thing. I think it's where a lot of their income comes from.

danielheath · 2025-11-30T02:21:09 1764469269

The only suitable bit of tech that comes to my mind is Lemmy (the reddit-style activitypub thing).

fijiaarone · 2025-11-30T02:24:21 1764469461

And we wouldn’t have a use case for AI without StackOverflow & Google’s broken search.

dherls · 2025-11-17T00:54:40 1763340880

This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.

In the example the author walks through, a cleaner way would be to have the second function take the Options as a parameter and decouple those two functions. You can then test both in isolation.
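
Roughly what I mean, as a sketch rather than the article's actual code. The `Options` fields, the JSON format, and both function names are made up for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Options:
    # Hypothetical settings; the article's real Options type may differ.
    verbose: bool = False
    retries: int = 3


def load_options(path: Path) -> Options:
    """The only function that touches the filesystem."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    return Options(verbose=raw.get("verbose", False), retries=raw.get("retries", 3))


def describe_run(options: Options) -> str:
    """Pure logic: takes Options as a parameter, so no file and no mock is needed."""
    mode = "verbose" if options.verbose else "quiet"
    return f"{mode} run with {options.retries} retries"
```

`describe_run` can be tested by constructing `Options(...)` directly, and `load_options` can be tested against a small real file in a temp directory. Neither test needs to patch `open`.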

sunrunner · 2025-11-17T10:39:39 1763375979

> If you are mocking anything in the standard library your code is probably structured poorly.

I like Hynek Schlawack's 'Don’t Mock What You Don’t Own' [1] phrasing, and while I'm not a fan of adding too many layers of abstraction to an application that hasn't proved that it needs them, the one structure I find consistently useful is to add a very thin layer over parts that do I/O, converting to/from types that you own to whatever is needed for the actual thing.

These layers should be boring and narrow (for example, never mock past validation you depend upon), doing as little conversion as possible. You can also rephrase the general purpose open()-type usage into application/purpose-specific usages of that.

Then you can either unittest.mock.patch these or provide alternate stub implementations for tests in a different way, with this approach also translating easily to other languages that don't have the (double-edged sword) flexibility of Python's own unittest.mock.

[1] https://hynek.me/articles/what-to-mock-in-5-mins/
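
A small sketch of the thin-layer idea, with a stub swapped in for tests. All of the names here (`Greeting`, `GreetingSource`, `FileGreetingSource`, `StubGreetingSource`, `render`) are hypothetical and only meant to show the shape.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class Greeting:
    # A type the application owns, independent of how it is stored.
    name: str


class GreetingSource(Protocol):
    def load(self) -> Greeting: ...


class FileGreetingSource:
    """Thin, boring layer: read a file and convert it to an owned type."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def load(self) -> Greeting:
        return Greeting(name=self._path.read_text(encoding="utf-8").strip())


class StubGreetingSource:
    """Test double that never touches the filesystem."""

    def __init__(self, name: str) -> None:
        self._name = name

    def load(self) -> Greeting:
        return Greeting(name=self._name)


def render(source: GreetingSource) -> str:
    # Application logic depends only on the narrow interface above.
    return f"Hello, {source.load().name}!"
```

In tests, `render(StubGreetingSource("Ada"))` exercises the logic without patching anything, and the same structure carries over to languages that have interfaces but nothing like `unittest.mock`.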

1718627440 · 2025-11-17T10:12:48 1763374368

> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do. If you are mocking anything in the standard library your code is probably structured poorly.

Valgrind is a mock of standard library/OS functions and I think its existence is a good thing. Simulating OOM is also only possible by mocking stuff like open.

paulf38 · 2025-11-17T15:36:04 1763393764

> Valgrind is a mock of standard library/OS functions and I think its existence is a good thing.

That is mostly wrong.

Valgrind wraps syscalls. For the most part it just checks the arguments and records any reads or writes to memory. For a small number of syscalls it replaces the syscall rather than wrapping it (for instance calls like getcontext where it needs to get the context from the VEX synthetic CPU rather than the real CPU).

Depending on the tool it can also wrap or replace libc and libpthread functions. memcheck will replace all allocation functions. DRD and Helgrind wrap all pthread functions.

1718627440 · 2025-11-17T16:14:21 1763396061

    $ cat test.c
    void main (void) {
      malloc (1000);
    }
    
    $ make test
    cc     test.c   -o test
    
    $ valgrind --leak-check=full --show-leak-kinds=all -s ./test
    Memcheck, a memory error detector
    Command: ./test
    
    HEAP SUMMARY:
        in use at exit: 1,000 bytes in 1 blocks
      total heap usage: 1 allocs, 0 frees, 1,000 bytes allocated
    
    1,000 bytes in 1 blocks are still reachable in loss record 1 of 1
       at 0x483877F: malloc (vg_replace_malloc.c:307)
       by 0x109142: main (in test.c:2)
    
    LEAK SUMMARY:
       definitely lost: 0 bytes in 0 blocks
       indirectly lost: 0 bytes in 0 blocks
         possibly lost: 0 bytes in 0 blocks
       still reachable: 1,000 bytes in 1 blocks
            suppressed: 0 bytes in 0 blocks

> vg_replace_malloc.c:307

What do you think that is? Valgrind tracks allocations by providing other implementations for malloc/free/... .

paulf38 · 2025-11-18T05:58:46 1763445526

Are you trying to explain to me how Valgrind works? If you do know more than me then please join us and become a Valgrind developer.

Mostly it wraps system calls and library calls. Wrapping means that it does some checking or recording before and maybe after the call. Very occasionally it needs to modify the arguments to the call. The rest of the time it passes the arguments on to the kernel or libc/libpthread/C++ lib.

There are also functions and syscalls that it needs to replace. That needs to be a fully functional replacement, not just looking the same as in mocking.

I don’t have any exact figures. The number of syscalls varies quite a lot by platform and on most platforms there are many obsolete syscalls that are not implemented. At a rough guess, I’d say there are something like 300 syscalls and 100 lib calls that are handled of which 3/4 are wrapped and 1/4 are replaced.
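To make the wrap/replace distinction concrete, here is a rough analogy in Python. It is only a sketch of the two ideas and says nothing about how Valgrind actually intercepts calls; `wrap` and `replacement_sorted` are made-up names for illustration.

```python
import functools

def wrap(real, log):
    """Wrapper: record the call, then pass the arguments on to the real function."""
    @functools.wraps(real)
    def wrapper(*args, **kwargs):
        log.append((real.__name__, args))  # checking/recording before the call
        return real(*args, **kwargs)       # the real implementation still runs
    return wrapper

def replacement_sorted(items):
    """Replacement: a fully functional stand-in; the real sorted() is never called."""
    out = list(items)
    for i in range(1, len(out)):           # plain insertion sort
        j = i
        while j > 0 and out[j - 1] > out[j]:
            out[j - 1], out[j] = out[j], out[j - 1]
            j -= 1
    return out

calls = []
wrapped_sorted = wrap(sorted, calls)
```

The wrapper depends on the real function being there; the replacement has to produce correct results on its own, which is more than a mock that merely looks the same from the outside.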

1718627440 · 2025-11-18T12:58:27 1763470707

> Are you trying to explain to me how Valgrind works?

Sorry, that wasn't my intention. You are a Valgrind developer? Thanks, it's a good project.

It seems like I have a different understanding of mocking than other people in the thread, and it shows. My understanding was that Valgrind provides function replacements via dynamic linking, which then call into the real libc. I would call that mocking, but YMMV.

vkou · 2025-11-17T10:50:10 1763376610

All rules exist to be broken in the right circumstances. But in 99.9% of test code, there's no reason to do any of that.

1718627440 · 2025-11-17T11:13:38 1763378018

I think when testing code with an open call, it is a good idea to test what happens on different return values of open. If that is not what you intend to test for this method, then that method shouldn't contain open at all, as already pointed out by other comments.

vkou · 2025-11-17T17:47:28 1763401648

That depends on what your error recovery plan is.

If the code's running in a space shuttle, you probably want to test that path.

If it's bootstrapping a replicated service, it's likely desirable to crash early if a config file couldn't be opened.

If it's plausible that the file in question is missing, you can absolutely test that code path, without mocking open.

If you want to explicitly handle different reasons for why opening a file failed differently, by all means, stress all of that in your tests. But if all you have is a happy path and an unhappy path, where your code doesn't care why opening a file failed, all you need to test is the case where the file is present, and one where it is not.

1718627440 · 2025-11-18T00:02:54 1763424174

Modifying the file system to be would be kind of like mocking to me. I very much don't want my daemons or user-facing applications to just crash, when a file is missing. That's kind of the worst thing you can do.

vkou · 2025-11-18T05:07:36 1763442456

> Modifying the file system to be would be kind of like mocking to me.

Modifying the file system's implementation would be. Including a valid_testdata.txt and an invalid_testdata.txt file in your test's directory, however, is not 'modifying the file system', any more than declaring a test input variable is 'mocking memory access'.

> don't want my daemons or user-facing applications to just crash, when a file is missing

If the file is important, it's the best kind of thing you can do when implementing a non-user-facing service. The last thing you want to do is to silently and incorrectly serve traffic because you are missing configuration.

You want to crash quickly and let whatever monitoring system you have in place escalate the problem in an application-agnostic manner.

bluGill · 2025-11-17T01:16:52 1763342212

Details matter, but good test doubles here are important. You want to capture all calls to IO and do something different. You don't want tests to break because someone has a different filesystem, didn't set up their home directory the way you expect, or worse, is running two different tests at the same time and one test is changing files the other wants.

Note that I said test doubles. Mocks are a bit overly specific: they are about verifying that functions are called at the right time with the right arguments, but the ease of setting return values makes it easy to abuse them for other things (this abuse is good, but it is still abuse of the intent).

In this case you want a fake: a smart service that, when you are in a test, sets up a temporary directory tree containing all the files you need in the state that particular test needs, and destroys it when the test is done (with an optional mode to keep it, which is useful for debugging when a test fails). Depending on your situation you may need something similar for network services, time, or other such things. Note that in most cases a filesystem itself is more than fast enough to use in tests, but you need isolation from other tests. There are a number of ways to create this fake; two that I can think of are overriding open, or providing a GetMyProgramDir function that you override.
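A minimal sketch of such a fake in Python, assuming the production code looks up its directory through a single function. The names `fake_program_dir`, `get_my_program_dir`, and `read_greeting` are hypothetical and only there for illustration.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path

@contextlib.contextmanager
def fake_program_dir(files, keep=False):
    """Create an isolated directory tree for one test and remove it afterwards.

    files maps relative paths to text contents; keep=True leaves the tree
    on disk so a failing test can be inspected.
    """
    root = Path(tempfile.mkdtemp(prefix="test-tree-"))
    try:
        for rel, text in files.items():
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
        yield root
    finally:
        if not keep:
            shutil.rmtree(root, ignore_errors=True)

# Hypothetical seam: production code asks this function where its files live.
def get_my_program_dir():
    return Path.home() / ".myprogram"

def read_greeting(program_dir_fn=get_my_program_dir):
    """Hypothetical code under test; the directory lookup can be overridden."""
    return (program_dir_fn() / "conf" / "greeting.txt").read_text(encoding="utf-8")
```

A test would then look roughly like this:

```python
with fake_program_dir({"conf/greeting.txt": "hello"}) as root:
    assert read_greeting(lambda: root) == "hello"
```

Each test gets its own tree from `mkdtemp`, so two tests running at the same time do not touch each other's files.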

jpollock · 2025-11-17T02:45:20 1763347520

Your tests are either hermetic, or they're flaky.

That means the test environment needs to be defined and versioned with the code.

dherls · 2025-11-17T02:56:38 1763348198

Even in the case you mention you really shouldn't be overriding these methods. Your load settings method should take the path of the settings file as an argument, and then your test can set up all the fake files you want with something like python's tempfile package
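Something like the following, as a rough sketch. `load_settings` and the JSON format are assumptions for the sake of the example, not anything from the article.

```python
import json
import tempfile
from pathlib import Path

def load_settings(path):
    """Hypothetical loader: the caller decides which file to read."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def test_load_settings():
    # A real file in a throwaway directory; nothing is patched.
    with tempfile.TemporaryDirectory() as tmp:
        settings_file = Path(tmp) / "settings.json"
        settings_file.write_text(json.dumps({"debug": True}), encoding="utf-8")
        assert load_settings(settings_file) == {"debug": True}
```

The temporary directory is deleted when the `with` block exits, so the test leaves nothing behind.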

bluGill · 2025-11-17T14:07:19 1763388439

There are a number of different ways to solve this problem. I too use the path of settings in my code, but I'm not against overriding open and all the other file IO functions. Of course this article is about Python, which has different abilities than other languages; what is best in Python is not what is best in other languages, and I'm trying to stay at a higher level than a particular language.

vkou · 2025-11-17T10:49:06 1763376546

> This blog post talks as if mocking the `open` function is a good thing that people should be told how to do.

It does. And this is exactly the problem, here!

> TFA: The thing we want to avoid is opening a real file

No! No, no, no! You do not 'want to avoid opening a real file' in a test.

It's completely fine to open a real file in a test! If your code depends on reading input files, then your test should include real input files in it! There's no reason to mock any of this. All of this stuff is easy to set up in any unit test library worth its salt.

9rx · 2025-11-17T15:47:02 1763394422

> then your test should include real input files in it! There's no reason to mock any of this.

That's okay for testing some branches of your code. But not all. I don't want to have to actually crash my hard drive to test that I am properly handling hard drive crashes. Mocking[1] is the easiest way to do that.

[1] For some definition of mock. There is absolutely no agreement found in this space as to what the terms used mean.
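As a sketch of what I mean, in Python with `unittest.mock`: the function `read_or_default` is hypothetical, and raising `EIO` from `open` is just one stand-in for a failing disk.

```python
import errno
from unittest import mock

def read_or_default(path, default=""):
    """Hypothetical function under test: fall back to a default on any I/O error."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return default

def test_disk_error_falls_back():
    # Make open() fail the way a dying disk might, without touching real hardware.
    failing_open = mock.Mock(side_effect=OSError(errno.EIO, "Input/output error"))
    with mock.patch("builtins.open", failing_open):
        assert read_or_default("data.txt", default="fallback") == "fallback"
    failing_open.assert_called_once_with("data.txt", encoding="utf-8")
```

A missing file is easy to arrange with real files; an I/O error in the middle of otherwise healthy storage is not, which is where the patch earns its keep.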

dherls · 2025-11-15T21:16:06 1763241366

I like how the article uses "Googling" as a verb meaning to shut down a service

oytis · 2025-11-15T21:44:25 1763243065

Thank you, I failed to understand what he means.

dherls · 2025-10-26T19:25:44 1761506744

I think the sandwich demo is really good. Once you establish the sandwich idea you can start zooming out to OK now you have a cook making multiple sandwiches, now you have a whole kitchen, and use that to talk about levels of abstraction and how SWEs go from solving one specific problem to more general problems by reusing techniques