My two cents is LLMs are way stronger in areas where the reward function is well known, such as exploiting - you break the security, you succeed.
It's much harder to establish what's a usable and well-architected, novel piece of software, thus in that area, progress isn't nearly as fast, while here you can just gradient descent your way to world domination, provided you have enough GPUs.
offense has a clear reward function, but so does detection when you frame it right. "did this process try to read ~/.ssh/id_rsa?" is just as binary as "did the exploit land?" the reason defense feels harder is that people frame it as architecture review (fuzzy, subjective) instead of policy enforcement (binary, automatable). we keep trying to make AI understand intent when we should be writing rules about actions. a confused deputy from 1988 doesn't care why the request came in, it cares whether the caller is authorized. same principle applies here.
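To make the "rules about actions" framing concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the `FileAccess` event, the suffix list and the allow-list are assumptions, not taken from any real monitoring tool, and a real system would get its events from something like an audit log.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileAccess:
    """One observed action: which process tried to open which path."""
    process: str
    path: str


# Hypothetical list of sensitive path suffixes (illustrative only).
SENSITIVE_SUFFIXES = ("/.ssh/id_rsa", "/.ssh/id_ed25519", "/.aws/credentials")

# Hypothetical allow-list of processes permitted to read those files.
AUTHORIZED_READERS = {"ssh", "ssh-agent"}


def violates_policy(event: FileAccess) -> bool:
    """Binary verdict: did an unauthorized process touch a sensitive file?"""
    # PurePosixPath collapses redundant slashes and "." segments;
    # it does not resolve ".." or symlinks, so this is not a hardened check.
    normalized = PurePosixPath(event.path).as_posix()
    is_sensitive = normalized.endswith(SENSITIVE_SUFFIXES)
    return is_sensitive and event.process not in AUTHORIZED_READERS


if __name__ == "__main__":
    for ev in (
        FileAccess("ssh", "/home/alice/.ssh/id_rsa"),
        FileAccess("curl", "/home/alice/.ssh/id_rsa"),
        FileAccess("curl", "/tmp/notes.txt"),
    ):
        print(ev.process, ev.path, "DENY" if violates_policy(ev) else "ok")
```

The rule never asks why the read happened, only who did it and to what, which is what makes the verdict as binary as "did the exploit land?".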
I don't know the first thing about cybersecurity, but in my experience all these sandbox-break RCEs involve a step of hijacking the control flow.
There were attempts to prevent various flavors of this, but imo, as long as dynamic branches exist in some form, like dlsym(), function pointers, or vtables, we will not be rid of this class of exploit entirely.
The last one is the most concerning, as this kind of dynamic branching is the bread and butter of OOP languages; I'm not even sure you could write a nontrivial C++ program without it. Maybe Rust would be a help here? Could one practically write a large Rust program without any sort of branch to dynamic addresses? Static linking and compile-time polymorphism only?
I think most vulnerabilities are in crappy enterprise software. TOCTOU stuff in the crappy microservice cloud app handling patient records at your hospital, shitty auth at a webshop, that sort of stuff.
A lot of this stuff is vulnerable by design - customer wanted a feature, but engineering couldn't make it work securely with the current architecture - so they opened a tiny hole here and there, hoping nobody would notice it, and everyone went home when the clock struck 5.
I'm sure most of us know about these kinds of vulnerabilities (and the culture that produces them).
Before LLMs, people needed to invest time and effort into hacking these. But now, you can just build an automated vuln scanner and scan half the internet provided you have enough compute.
I think there will be major SHTF situations coming from this.
Yeah. Crufty cobbled together enterprise stuff will suffer some of the worst. But this will be a great opportunity for the enterprise software services economy! lol.
I honestly see some sort of automated whole codebase auditing and refactoring being the next big milestone along the chatbot -> claude code / codex / aider -> multi-agent frameworks line of development. If one of the big AI corps cracks that problem then all this goes away with the click of a button and exchange of some silver.
Just reading this, the inevitable scaremongering about biological weapons comes up.
Since most of us here are devs, we understand that software engineering capabilities can be used for good or bad - mostly good, in practice.
I think this should not be different for biology.
I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?
Do you think these models will lead to similar discoveries and improvements as they did in math and CS?
Honestly the focus on gloom and doom does not sit well with me. I would love to read about some pharmaceutical researcher gushing about how they cut the time to market - for real - with these models by 90% on a new cancer treatment.
But as this stands, the usage of biology as merely a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with, Gell-Mann style.
IF these models are not that capable in this regard (which I suspect), this fearmongering approach will likely lead to never developing these capabilities to a useful degree, meaning life sciences won't benefit from this as much as they could.
> I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?
Well, I would say they have done precisely that in evaluating the model, no? For example section 2.2.5.1:
>Uplift and feasibility results
>The median expert assessed the model as a force-multiplier that saves meaningful time (uplift level 2 of 4), with only two biology experts rating it comparable to consulting a knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were able to iterate with the model toward a plan they judged as having only narrow gaps, but feasibility scores reflected that substantial outside expertise remained necessary to close them.
You said: "I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?" and they said, paraphrasing, "We reached out and talked to biologists and asked them to rank the model between 0 and 4 where 4 is a world expert, and the median people said it was a 2, which was that it helped them save time in the way a capable colleague would" specifically "Specific, actionable info; saves expert meaningful time; fills gaps in adjacent domains"
so I'm just telling you they did the thing you said you wanted.
Yes that is correct. I would like a large body of experience and consensus to rely on as opposed to the regular 'trust the experts' argument, which has been shown for decades to be a deeply flawed and easily manipulated argument.
> Yes that is correct. I would like a large body of experience and consensus to rely on as opposed to the regular 'trust the experts' argument, which has been shown for decades to be a deeply flawed and easily manipulated argument.
Yes, it is far inferior to the 'Trust torginus and his ability to understand the large body of experience that other actual subject-matter-experts have somehow not understood' strategy
It's not my credibility I want to measure against Anthropic's. I just said to apply the same logic to biology you would apply for software development.
The parallels here are quite remarkable imo, but defer to your own judgement on what you make of them.
The big thing you're missing here is that biology people don't (in my experience) post opinions about the future/futility/ease/unimportance of computer science especially when their opinion goes against other biologists' evidence-backed views. This is a cultural thing in biology.
It's not your fault that you don't know this, but this whole subthread is very CS-coded in its disdain for other software people's standard of evidence.
> Just reading this, the inevitable scaremongering about biological weapons comes up.
It's very easy to learn more about this if it's seriously a question you have.
I don't quite follow why you think that you are so much more thoughtful than Anthropic/OpenAI/Google, such that you agree that LLMs can autonomously create very bad things in software but, in this area that is not your domain of expertise, you disagree and insist that LLMs cannot create damaging things autonomously in biology.
I will be charitable and reframe your question for you: is outputting a sequence of tokens, let's call them characters, by an LLM dangerous? Clearly not, we have to figure out what interpreter is being used, download runtimes etc.
Is outputting a sequence of tokens, let's call them DNA bases, by an LLM dangerous? What if we call them RNA bases? Amino acids? What if we're able to send our token output to a machine that automatically synthesizes the relevant molecules?
>It's very easy to learn more about this if it's seriously a question you have.
No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now.
Despite that, most engineers were of the opinion that these models were kinda mid at coding up until recently, even though they far outperformed humans in stuff like competitive programming.
Yet despite that, we've seen claims going back to GPT4 of a DANGEROUS SUPERINTELLIGENCE.
I would apply this framework to biology - this much time, expert effort, millions of GPU hours, and a giant open-source corpus clearly have not gone into biology.
My guess is that this model is kinda o1-ish level maybe when it comes to biology? If biology is analogous to CS, it has a LONG way to go before the median researcher finds it particularly useful, let alone dangerous.
>>It's very easy to learn more about this if it's seriously a question you have.
>No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now
This reads as defensive. The thing that is easy to learn is 'why are biology ai LLMs dangerous chatgpt claude'. I have never googled this before, so I'll do this with the reader, live. I'm applying a date cutoff of 12/31/24 by the way.
Here, dear reader, are the first five links. I wish I were lying about this:
I don't know about you, but that counts as easy to me.
-----
> I would apply this framework to biology - this much time, expert effort, millions of GPU hours, and a giant open-source corpus clearly have not gone into biology.
I've been getting good programming and molecular biology results out of these back to GPT3.5.
I don't know what to tell you—if you really wanted to understand the importance, you'd know already.
I feel somebody better qualified should write a comprehensive review of how these models can be used in biology. In the meantime, here are my two cents:
- the models help to retrieve information faster, but one must be careful with hallucinations.
- they don't circumvent the need for a well-equipped lab.
- in the same way, they are generally capable but until we get the robots and a more reliable interface between model and real world, one needs human feet (and hands) in the lab.
Where I hope these models will revolutionize things is in software development for biology. If one could go two levels up in the complexity and utility ladder for simulation and flow orchestration, many good things would come from it. Here is an oversimplified example of a prompt: "use all published information about the workings of the EBV virus and human cells, and create a compartmentalized model of biochemical interactions in cells expressing latency III in the NES cancer of this patient. Then use that code to simulate different therapy regimes. Ground your simulations with the results of these marker tests." There would be a zillion more steps to create an actual personalized therapy, but a well-grounded LLM could help in most of them. Also, cancer treatment could get an immediate boost even without new drugs by simply offloading work from overworked (and often terminally depressed) oncologists.
From what I've heard from people doing biology experiments, the limiting factor there is cleaning lab equipment, physically setting things up, waiting for things that need to be waited for etc. Until we get dark robots that can do these things 24/7 without exhaustion, biology acceleration will be further behind than software engineering.
Software engineering is at the intersection of being heavy on manipulating information and lightly-regulated. There's no other industry of this kind that I can think of.
There is a massive gap between "having a recipe" and being able to execute it. It's the same reason why buying a Michelin 3-star chef's cookbook won't have you pumping out fine dining tomorrow, if ever.
Software is a total 180 in this regard. Have a master black hat's secret exploits? You are now the master black hat.
Dario (the founder) has a PhD in biophysics, so I assume that’s why they mention biological weapons so much - it’s probably one of the things he fears the most?
Going off the recent biography of Demis Hassabis (CEO/co-founder of DeepMind, jointly won the Nobel Prize in Chemistry), it seems like he's very concerned about it as well.
Surely more than 10% of the time consumed by going to market with a cancer treatment is giving it to living organisms and waiting to see what happens, which can't be made any faster with software. That's not to say speedups can't happen, but 90% can't happen.
Not that that justifies doom and gloom, but there is a pretty inescapable asymmetry here between weaponry and medicine. You can manufacture and blast every conceivable candidate weapon molecule at a target population since you're inherently breaking the law anyway and don't lose much if nothing you try actually works.
Though I still wonder how much of this worry is sci-fi scenarios imagined by the underinformed. I'm not an expert by any means, but surely there are plenty of biochemical weapons already known that can achieve enormous rates of mass death pleasing to even the most ambitious terrorist. The bottleneck to deployment isn't discovering new weapons so much as manufacturing them without being caught or accidentally killing yourself first.
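The "90% can't happen" point at the top of this comment is just an Amdahl's-law-style bound, and a back-of-the-envelope Python sketch shows it. The 30% irreducible share and the 10x speedup used below are purely hypothetical numbers, not figures from any real drug pipeline.

```python
def max_time_reduction(irreducible_fraction: float) -> float:
    """Upper bound on total time saved if everything else became instant."""
    if not 0.0 <= irreducible_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return 1.0 - irreducible_fraction


def time_reduction(irreducible_fraction: float, speedup: float) -> float:
    """Time saved when only the accelerable part gets `speedup` times faster."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    accelerable = max_time_reduction(irreducible_fraction)
    return accelerable * (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    # Hypothetical: 30% of the schedule is dosing and waiting.
    print(f"bound:    {max_time_reduction(0.3):.0%}")
    print(f"with 10x: {time_reduction(0.3, 10):.0%}")
```

A 90% cut is reachable only if the waiting part is at most 10% of the schedule, and even then only in the limit of an infinite speedup on everything else.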
It is easier to destroy than it is to protect or fix, as a general rule of the universe. I would not feel so confident about the speed of the testing loop keeping things in check.
I don't like it - you're forced to pass around this token and constantly manage the lifecycle of cancellation sources, an incredibly bug-prone thing in an async context, and it quickly gets very confusing when you have multiple tokens/sources.
I understand why they did it - a promise essentially is just some code, and a callback that will be triggered by someone at some point in time - you obviously get no quality of service promises on what happens if you cancel a promise, unless you as a dev take care to offer some.
It's also obvious that some operations are not necessarily designed to be cancellable - imagine a 'delete user' request - you cancelled it, now do you still have a user? Maybe, maybe you have some cruft lying around.
But still, even though the obviously wrong solution is gone (C# had a Thread.Abort(), similar to the stop() function that you mentioned, which was basically excommunicated from .NET more than a decade ago), I'm still not happy with the right one.
Not that rare in my experience, I constantly had to write software like this. Not every day, but it certainly did come up quite often in my code and others'.
Oh and one more thing - the very (developer-managed) complexity meant that people constantly got it wrong, usually just enough (as is often the case with threading) that it worked fine 90% of the time, which made it very hard to make a case to management for investing effort into fixing it.
Cancelling a token doesn't immediately abort the underlying Task. It is up to the implementation of that task to poll the token and actively decide when to abort.
In your example, you'd design your delete task such that if you want it to be cancelable, it can only be canceled before data is modified. You simply don't abort in the middle of a database transaction.
Moreover, because of the way cancellation tokens work, you can't abort blocking function calls unless you also pass the token along. There just isn't a mechanism that can interrupt a long IO operation or whatever unless you explicitly go to the effort to make that happen.
A cancellation token is more of a "pretty please stop what you're doing when you feel like it" concept than Thread.Abort().
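A rough Python analogue of the above, as a sketch rather than the .NET API: `threading.Event` stands in for the cancellation token, and `delete_user`, `copy_chunks` and the dict-based `db` are hypothetical names invented for the example.

```python
import threading


def delete_user(user_id: str, cancel: threading.Event, db: dict) -> str:
    """Cancellable only before any data is modified."""
    # Poll the "token" once, before the point of no return.
    if cancel.is_set():
        return "cancelled"
    # From here on the token is ignored: the delete runs to completion,
    # so a late cancel cannot leave a half-deleted user behind.
    db.pop(user_id, None)
    db.pop(f"{user_id}:settings", None)
    return "deleted"


def copy_chunks(chunks: list, cancel: threading.Event, out: list) -> bool:
    """Long-running work that checks the token between safe steps."""
    for chunk in chunks:
        if cancel.is_set():
            return False  # stopped voluntarily at a safe boundary
        out.append(chunk)
    return True


if __name__ == "__main__":
    db = {"u1": "alice", "u1:settings": {}}
    token = threading.Event()
    token.set()  # cancellation requested before the task started
    print(delete_user("u1", token, db), sorted(db))
    print(delete_user("u1", threading.Event(), db), sorted(db))
```

Nothing interrupts either function from outside. Setting the event only flips a flag, and it is each task's job to look at it, which is the "pretty please" part.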
This might be just my impression, but I feel like most people are using CC for fixing their React frontends, and they prefer the decreased latency and fewer tokens spent as opposed to performing well on extremely difficult problems?
That said there's still an issue of regression to the mean. What the average person likes, as determined by metrics, is something nobody actually likes, because the average is a mathematical construct and might not describe any particular individual accurately.
It's a huge issue with ARM-based systems that hardly anyone uses or tests things on them (in production).
Yes, Macs going ARM has been a huge boon, but I've also seen crazy regressions on AWS Graviton (compared to how it's supposed to perform), on .NET (and Node as well), which frankly I have neither the expertise nor the time to dig into.
Which was the main reason we ultimately cancelled our migration.
I'm sure this is the same reason why it's important to AWS.
Macs are actually part of the pain point with ARM64 Linux, because Linux ARM servers tend to use 64 kB pages while the Mac supports only 4 and 16 kB, and it causes non-trivial bugs at times (funnily enough, I first encountered that in a database company...)
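The usual shape of that bug is a hardcoded 4096 somewhere. A minimal Python sketch of the safer habit, querying the page size at runtime, is below; `align_up` is a hypothetical helper written for this example, not a library function.

```python
import mmap


def align_up(nbytes: int, page: int = mmap.PAGESIZE) -> int:
    """Round nbytes up to a whole number of pages of the given size."""
    if nbytes < 0 or page <= 0:
        raise ValueError("nbytes must be >= 0 and page must be > 0")
    return (nbytes + page - 1) // page * page


if __name__ == "__main__":
    # mmap.PAGESIZE reports the page size of the machine running the code.
    print("page size here:", mmap.PAGESIZE)
    # The same 5000-byte buffer needs very different mappings per page size.
    for page in (4 * 1024, 16 * 1024, 64 * 1024):
        print(page, align_up(5000, page))
```

Code that assumes 4 kB will pass every test on a dev laptop and then compute wrong offsets or alignments on a 64 kB kernel.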
Ubuntu is still pushing snap - they still kept the practice of silently replacing apt packages with snaps, I think the default Firefox is still a snap, and so is node.
The Ubuntu defaultism still puzzles me to this day... Canonical has been shown to subject users to its horrible science experiments, pushing broken software on them, sometimes for half a decade or more (see PulseAudio, it was shipped in Ubuntu for literal years, and it never worked...). Snap is their latest science experiment.
Though I'm not sure what should be the default, as I can think of disadvantages to several alternatives.
Finally - I think the biggest issue of Linux today is the inability to ship a binary and have it just work across distros.
While there was an (unfortunately failed) push for ABI compatibility (remember Linux Standard Base?), this has been an issue for as long as Linux has existed.
And in customary Linux fashion we had 3 solutions for this in Linux-land: snap, which was the Ubuntu solution that was slow and buggy (and forced on users, in customary Ubuntu fashion, way before it was ready); AppImage, which was very rudimentary and involved shipping half the userland; and Flatpak, which seemed to be the best engineered (but far from flawless) of the 3.
And in customary Linux fashion, users decided to just wait this one out.
I think it's great that Valve has taken the time and money to get Flatpak across the finish line.
Btw another thing about Valve - it's really great that, although they could've gone their own way and reimplemented huge chunks of the Linux stack rather than dealing with what's there and the associated communities and politics (I'm mainly referring to Wayland, and now Flatpak), they've decided to go for the popular move and actually bring the existing infrastructure up to a commercial standard.
I wish you were joking, but one of the giant companies we work with as a supplier suggested we switch to the Windows version of their (desktop) software, running it under Wine.
In fact they told us they have plans to discontinue the true native Linux version, and going forward, they're going to package their Windows version with their version of Wine as the 'native' Linux solution.
This is a company that has both tremendous resources and deep Linux expertise.
I've been wondering if a Linux GUI application can be made by compiling Wine libraries into a Linux ELF executable, skipping the EXE format. Do I still need the extra supporting Wine processes, or is this shippable?
I know I'm probably going to get some shit for this, but, this actually is one of the reasons I like using Rust. I know, I know, but, the fact that cargo can be used as a package manager universally across distros (and operating systems!) is a pretty huge boon to me as a developer.
Zero (Linux) package manager involvement and onerous rules.
I mean it's a general sign of the times across all of computing that problems keep getting solved wrong at all levels of the stack, and since the low level implementation can't be relied on for some reason, an implementation gets stacked on top.
More specifically to your case, Linux package management is an unmitigated disaster when it comes to development. Having to have root access just to install a few headers of whatever version your distro happens to ship with, have some scripts discover said versions (too bad if they are not the ones you wanted).
Every single professional (for-profit model) piece of software tends to carry half the userland with it. Steam, Spotify etc.
Besides, Rust isn't big on the concept of dynamic libraries anyway, which once again, I don't think is a purely good thing, but there are a lot of arguments that can be made pro or contra.
Let's just say it's a devil we know, which is more than can be said about a lot of other approaches.