how are you going to get "adversarial attacks" with prompt injection. If you don...

simonw · 2025-09-30T16:10:37 1759248637

If you don't fetch data from external sources then you're safe from prompt injection.

But that's a very big if. I've seen Claude Code attempt to debug a JavaScript issue by running curl against the jsdelivr URL for a dependency it's using. A supply chain attack against NPM (and those aren't exactly rare these days) could add comments to code like that which could trigger attacks.

Ever run Claude Code in a folder that has a downloaded PDF from somewhere? There are a ton of tricks for hiding invisible malicious instructions in PDFs.

I run Claude Code and Codex CLI in YOLO mode sometimes despite this risk because I'm basically crossing my fingers that a malicious attack won't slip in, but I know that's a bad idea and that at some point in the future these attacks will be common enough for the risk to no longer be worth it.

mehdibl · 2025-09-30T16:29:11 1759249751

This is quite convoluted. Not seen in the wild and comments don't trigger prompt injection that easily.

Again you likely use vscode. Are you checking each extension you download? There is already a lot of reported attacks using vscode.

A lot of noise over MCP or tools hypothetical attacks. The attack surface is very narrow, vs what we already run before reaching Claude Code.

Yes Claude Code use curl and I find it quite annoying we can't shut the internal tools to replace them with MCP's that have filters, for better logging & ability to proxy/block action with more in depth analysis.

simonw · 2025-09-30T17:15:34 1759252534

I know it's not been seen in the wild, which is why it's hard to convince people to take it seriously.

Maybe it will never happen? I find that extremely unlikely though. I think the reason it hasn't happened yet is that widespread use of agentic coding tools only really took off this year (Claude Code was born in February).

I expect there's going to be a nasty shock to the programming community at some point once bad actors figure out how easy it is to steal important credentials by seeding different sources with well crafted malicious attacks.

some_furry · 2025-09-30T15:36:52 1759246612

> how are you going to get "adversarial attacks" with prompt injection

Lots of ways his could happen. To name two: Third-party software dependencies, HTTP requests for documentation (if your agent queries the Internet for information).

If you don't believe me, setup a MITM proxy to watch network requests and ask your AI agent to implement PASETO in your favorite programming language, and see if it queries https://github.com/paseto-standard/paseto-spec at all.

mehdibl · 2025-09-30T16:31:49 1759249909

This is a vendor selling a solution for "hypothecal" risk not seen in the WILD!

More seen as buzz article about how it could happen. This is very complicated to exploit vs classic supply chains and very narrow!

some_furry · 2025-09-30T16:34:35 1759250075

> This is a vendor selling a solution for "hypothecal" risk not seen in the WILD!

????

What does "This" refer to in your first sentence?

alexchantavy · 2025-09-30T16:29:06 1759249746

Excellent concrete examples with video demos here: https://embracethered.com/blog/

The researcher has gotten actual shells on oai machines before via prompt injection