Learned the hard way that it makes sense to use "flock" to prevent overlapping executions of frequently running jobs. The server started to slow down, my monitoring jobs started piling up, and that slowed the server down even more.
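The crontab version is just `flock -n` wrapping the job; here is roughly the same idea as a Python sketch (non-blocking `fcntl.flock`; the lock path and job body are placeholders):

```python
import fcntl
import sys

def run_monitoring_checks():
    pass  # the actual monitoring work goes here

if __name__ == "__main__":
    # Keep the file object alive for the whole run; the lock is tied to it.
    lock_file = open("/tmp/monitor.lock", "w")
    try:
        # LOCK_NB: fail immediately instead of queueing behind a slow run.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit(0)  # previous run still in progress, skip this tick
    try:
        run_monitoring_checks()
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
```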
Sometimes certain containerized processes need to run on a schedule, but maintainers also need a way to run them manually without a scheduled run starting or executing concurrently. A shared FS seems like the "simplest thing that could possibly work" distribution method for locks intended for that purpose, but unfortunately not all cloud storage volumes are strongly consistent, even for the same user, and it may take several milliseconds for a lock to take hold.
Wouldn't a database give you better consistency guarantees in that case? NFS locking semantics are a lot more complicated than just a `SELECT .. FOR UPDATE`
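A rough sketch of what that looks like, assuming Postgres via psycopg2, a pre-populated `job_locks` table (both the table and the DSN are illustrative), and adding NOWAIT so a second runner fails fast instead of blocking:

```python
import psycopg2

def run_exclusively(dsn, job_name, work):
    """Run work() only if no other process holds the row lock for this job."""
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        try:
            # Assumes a row for job_name already exists in job_locks.
            cur.execute(
                "SELECT name FROM job_locks WHERE name = %s FOR UPDATE NOWAIT",
                (job_name,),
            )
        except psycopg2.errors.LockNotAvailable:
            conn.rollback()
            return False  # someone else is already running this job
        work()            # the row lock is held until the transaction ends
        conn.commit()
        return True
    finally:
        conn.close()
```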
Sure, but that would require a separate database for this one use case. Mixing infra concerns into an app db doesn’t sound kosher, either, and a shared volume is already available.
Seems easier to have a managed lockfile for each process, diligently checking that the lock has actually been acquired. Performance is not a concern anyway; as long as acquiring the lock takes just a few ms, we're golden.
If a file system implements the lock/unlock functions precisely to the spec, it should be fully consistent for the file/directory being locked. It does not matter whether the file system is local or remote.
In other words, it's not the author's problem. It's the problem of a particular storage system that may decide to throw the spec out the window. But even with an eventually consistent file system, the manufacturer is better off ensuring that the locking semantics are fully consistent, as per the spec.
AI companies could never make any money (a statement about the future, about AI companies, and about finances). And AI could be having a visible effect on hiring today (a statement about the present, about non-AI companies, and about employment).
They don't have to both be true, but they do not inherently contradict each other.
Try asking it to write some GLSL shaders. Just describe what you want to see and then try to run the shaders it outputs. It can get a UV map or a simple gradient right, but when it comes to slightly more complex shaders, most of the time the output will not compile or run properly: it sometimes mixes GLSL versions, and sometimes just straight up makes up things that don't work or don't produce what you asked for.
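A cheap way to triage that failure mode is to feed the generated source straight into a compile check before even looking at the output. A sketch, assuming the moderngl package and a working GL driver (the shaders here are just the trivial UV gradient):

```python
import moderngl

VERTEX = """
#version 330
in vec2 in_vert;
out vec2 v_uv;
void main() {
    v_uv = in_vert * 0.5 + 0.5;
    gl_Position = vec4(in_vert, 0.0, 1.0);
}
"""

FRAGMENT = """
#version 330
in vec2 v_uv;
out vec4 f_color;
void main() {
    f_color = vec4(v_uv, 0.0, 1.0);  // the classic UV gradient
}
"""

ctx = moderngl.create_standalone_context()
try:
    ctx.program(vertex_shader=VERTEX, fragment_shader=FRAGMENT)
    print("compiled and linked")
except Exception as exc:  # moderngl raises on compile/link errors
    print("shader failed:", exc)
```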
Library/API conflicts are usually the biggest pain point for me, especially breaking changes. RLlib (currently 2.41.0) and Gymnasium (currently 0.29.0+) have sent me in circles many times because they tend to be out of sync (for multi-agent environments).
My go-to test now is a simple hello-world-type card game like War: competitive multi-agent with RLlib and Gymnasium (PettingZoo tends to cause even more issues).
Claude Sonnet 4.5 was eventually able to figure out a way to resolve it (around 7 fixes), and I let it create an rllib.md with all the fixes and pitfalls; I'm curious whether feeding this file to the next experiment will lead to a one-shot. GPT-5 struggled more, but I haven't tried Codex on this yet, so it's not exactly a fair comparison.
All done with Copilot in agent mode, just prompting, no specs or anything.
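For anyone curious, the kind of environment skeleton involved is roughly the following. It is only a sketch against the RLlib 2.x MultiAgentEnv / Gymnasium-style API; the attribute names and return signatures are exactly what keeps drifting between releases, so treat it as illustrative rather than canonical.

```python
import random
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class WarCardEnv(MultiAgentEnv):
    """Two players each flip a card; the higher rank wins the round."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["player_0", "player_1"]
        # Each agent observes the rank of its next card (0..12).
        self.observation_spaces = {a: gym.spaces.Discrete(13) for a in self.agents}
        # Single dummy action: "play the top card".
        self.action_spaces = {a: gym.spaces.Discrete(1) for a in self.agents}

    def reset(self, *, seed=None, options=None):
        self._deck = list(range(13)) * 4
        random.shuffle(self._deck)
        self._hands = {a: [self._deck.pop() for _ in range(26)] for a in self.agents}
        obs = {a: self._hands[a][-1] for a in self.agents}
        return obs, {a: {} for a in self.agents}

    def step(self, action_dict):
        # Only one possible action per agent, so action_dict is ignored.
        played = {a: self._hands[a].pop() for a in self.agents}
        p0, p1 = played["player_0"], played["player_1"]
        rewards = {"player_0": float(p0 > p1) - float(p1 > p0),
                   "player_1": float(p1 > p0) - float(p0 > p1)}
        done = not all(self._hands.values())
        obs = {a: (self._hands[a][-1] if self._hands[a] else 0) for a in self.agents}
        terminateds = {a: done for a in self.agents}
        terminateds["__all__"] = done
        truncateds = {a: False for a in self.agents}
        truncateds["__all__"] = False
        return obs, rewards, terminateds, truncateds, {a: {} for a in self.agents}
```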
I've posted this example before, but academic papers on algorithms often have pseudocode and no actual code.
I thought it would be handy to use AI to produce code from the paper, so a few months ago I tried to use Claude (not GPT, because I only have access to Claude) to write C++ code implementing the algorithms in this paper, as practice in LLM use for me, and it didn't go well.
I just tried it with GPT-5.1-Codex. The compression ratio is not amazing, so I'm not sure it really worked, but at least it ran without errors.
A few ideas for how to make it work for you:
1. You gave a link to a PDF, but you did not describe how you provided its content to the model. It might only have read the text with something like pdftotext, which for this PDF results in a garbled mess. It is safer to convert the pages to PNG (e.g. with pdftoppm; a small sketch follows this list) and let the model read the pages as images. A prompt like "Transcribe these pages as markdown." should be sufficient. If you cannot see what the model did, there is a chance it made things up.
2. You used C++, but Python is much easier to write. You can tell the model to translate the code to C++ once it works in Python.
3. Tell the model to write unit tests to verify that the individual components work as intended.
4. Use Agent Mode and tell the model to print something and to judge whether the output is sensible, so it can debug the code.
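For point 1, the conversion step is short; a sketch assuming poppler's pdftoppm is installed and the paper is saved as paper.pdf:

```python
import subprocess

# Render every page of paper.pdf to a PNG (page-1.png, page-2.png, ...,
# zero-padded for longer documents) at 200 dpi; these images can then be
# attached to the model instead of extracted text.
subprocess.run(["pdftoppm", "-png", "-r", "200", "paper.pdf", "page"], check=True)
```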
It completely failed for me when running the code it changed in a Docker container I keep running. Claude did it flawlessly.
It absolutely rocks at code reviews, but in comparison it's terrible at generating code.
It really depends on what kind of code. I've found it incredible for frontend dev, and for scripts. It falls apart in more complex projects and monorepos
If OpenAI manages to get agentic buying going, that could be big. They could tie ad bidding to the user actually making the purchase, instead of advertisers just paying for clicks.
Add the OpenAI Codex extension to VS Code. Set up the ExecPlan instructions as described in [1].
Then start by writing a spec.md file where you describe what should be built. Write like you would write to a smart developer.
Then use the highest thinking model available with the prompt "Create ExecPlan for the task @spec.md and write it to file". It will think for a while and create the file.
Take a quick look at the generated file. It may have some open questions or surprises you want to review and write some answers to.
For the implementation I usually switch to medium. Then make a request like "Implement @execplan.md". If the plan has numbered steps, it seems to help to say "Implement steps 1, 2, 3, 4, 5 and 6 in @execplan.md" - this way the agent is more likely to complete the whole plan in one pass.
My gut feeling is that, at least a month or so ago, OpenAI Codex was better at building complete features than Claude Code. The ExecPlan trick made it work independently for longer stretches.
I haven't benchmarked the different tools against each other in any serious manner.
A hybrid will likely emerge. I work on a chat application, and it's pretty normal for the LLM to print custom UI as part of the chat. Things like sliders, dials, selects, and calendars are just better as a GUI in certain situations.
I once saw a demo of an AI photo-editing app that displays sliders next to the light sources in a photo, letting you dim or brighten each light source's intensity individually. This feels to me like the next level of user interface.
1. There's a "normal" interface or query-language for searching.
2. The LLM suggests a query, based on what you said you wanted in English, possibly in conjunction with results of a prior submit.
3. The true query is not hidden from the user, but is made available so that humans can notice errors, fix deficiencies, and naturally--if they use it enough--learn how it works so that the LLM is no longer required.
Yessss! This is what I want. If there is a natural set of filters that can be applied, let me say it in natural language, then the LLM can translate that as well as possible, and then I can review it. E.g. searching for photos between dates X and Y, containing person Z, at location W. These are all filters that can be presented as separate UI elements, so I can confirm the LLM interpreted me correctly and adjust the dates or what have you without having to repeat the whole sentence again.
Also, any additional LLM magic would be a separate layer with its own context, safely abstracted beneath the filter/search language. Not a post-processing step by some kind of LLM-shell.
For example, "Find me all pictures since Tuesday with pets" might become:
Then the implementation of "fuzzy-content" would generate a text description of the photo, and some other LLM-thingy would do the hidden document-building, something like:
Description: "black dog catching a frisbee"
Does that match "with pets"?
Answer Yes or No.
Yes.
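In Python, that hidden step could be as simple as the following sketch; `ask_llm` is a hypothetical stand-in for whatever LLM client is on hand.

```python
def matches_criterion(description: str, criterion: str, ask_llm) -> bool:
    # ask_llm is a hypothetical callable: prompt string in, answer string out.
    prompt = (
        f'Description: "{description}"\n'
        f'Does that match "{criterion}"?\n'
        "Answer Yes or No."
    )
    return ask_llm(prompt).strip().lower().startswith("yes")

# e.g. matches_criterion("black dog catching a frisbee", "with pets", ask_llm) -> True
```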
It seems to be a new take on text-based adventure games. You say what you want to do, then the game computes the results and shows them as a short video clip. Then you make your next moves, and so on.