A capability-safe language would have minimized the Log4j vulnerability (justinpombrio.net)
99 points by justinpombrio on Dec 26, 2021 | 152 comments


So close to getting to the actual root issue (ambient authority), but so far away (blaming the programmer, instead of structural deficits in the OS).

We need operating systems that don't hand out authority like candy if we're ever going to have the usability and freedom that we used to have in the 1980s with floppy based IBM PCs running MS-DOS.

I'm constantly dismayed at the ongoing failure of imagination that accepts a security model designed for a computer shared by co-workers in a small department in a corporation, as the model to choose in the age of mobile code and persistent networking.

It's as if we've designed our infrastructure out of crates of Dynamite and wonder why things keep blowing up.


I agree that ambient authority is the root issue here, but not necessarily at the OS level.

Since logging is just another bit of code intertwined with your other code, it's not clear to me that OS capabilities would help here.

Let's say you have an app that needs JNDI and also needs logging, so JNDI is given a network capability, and then log4j calls it. Now we just have a confused deputy within your code.

Ah - but log4j shouldn't be able to get JNDI without being given it! Or the Network object. Or whatever is the thing you need to get a job done. This is the argument by reachability of capability security.

Now you need to ensure that your internal code is structured along those lines, which is really what the OP is getting at.
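To make the reachability argument concrete, here's a minimal Java-flavored sketch (the Network, FileSystem, and Logger types are hypothetical, not real JDK or log4j APIs): the logger can only reach the authority its constructor was handed.

    // Hypothetical capability interfaces -- not real JDK or log4j types.
    interface Network { java.net.Socket connect(String host, int port); }
    interface FileSystem { java.io.OutputStream append(String path); }

    final class Logger {
        private final FileSystem fs;              // the only authority the logger holds
        Logger(FileSystem fs) { this.fs = fs; }

        void log(String msg) {
            // Can append to the log file via `fs`, but no Network is reachable
            // from here, so a "${jndi:ldap://...}" payload has nothing to call out with.
        }
    }

    final class App {
        public static void main(String[] args) {
            Network net = null;                   // imagine: granted to main by the runtime
            FileSystem fs = null;                 // likewise
            Logger logger = new Logger(fs);       // deliberately NOT given `net`
            // A JNDI-style component would be constructed elsewhere, with `net` passed in.
        }
    }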


> So close to getting to the actual root issue (ambient authority), but so far away (blaming the programmer, instead of structural deficits in the OS).

Indeed, this is the entire point of the post! (Well, the second point of the post. The "deeper issue".)

- The root cause is ambient authority: the fact that log4j (and its further dependencies like JNDI) can just create a network connection without having been explicitly given the ability to do so.

- This is not the fault of the programmer. I was focusing blame on the programming language not having capabilities, though as many commenters including you have pointed out, capabilities at the OS level are also very important. Ultimately you want both, and they complement each other.


> the fact that log4j (and its further dependencies like JNDI) can just create a network connection without having been explicitly given the ability to do so.

I don't think it's the responsibility of log4j to express its capabilities like that. The responsibility should be in the runtime, like the JVM, or even the OS.

Services should run in a sandbox, like how a browser runs a webpage in a sandbox, and only allow capabilities explicitly being requested, and require a human to allow it. Or, have a sandbox such that no matter what code executes, it cannot do more harm than a pre-determined amount (such as limited disk space, limited CPU usage, network usage etc).

The root cause is the security model of modern applications.


Running a whole service in a sandbox doesn't offer the kind of granularity you might want.


We need to go even further for apps installed in mobile devices: We need to make it impossible for apps to determine that they've been denied a capability. Many useful apps refuse to run at all unless you give them access to personal information (e.g. your location) when there's no need for them to have such information to function. The solution is for the OS to spoof that capability by providing hostile apps with random data, or to otherwise fail to provide accurate data to the apps.

Of course a more ideal solution is for app stores to refuse to allow apps that list capabilities they cannot justify a need for, but at least in the Android world this doesn't seem to be happening.


Operating systems are also written by programmers. An operating system written in a capability-safe language would not have such ambient authority problems.


Please let me re-state my objection to your assertion, in a different, and hopefully more constructive, manner.

An Operating System is a program that multiplexes hardware resources and makes them safely usable by a number of applications. If the design of that system is flawed, how could a capability-safe language do anything to correct the problem?

It is my assertion that the design of Linux, Windows, MacOS are all flawed. They provide, by default, any program the computer equivalent of "power of attorney". All you have to do is confuse, or subvert any program the user runs one time to abuse that privilege on behalf of an outside influence, and the system is on the way to being pwned.

Memory-safe, and even capability-safe, languages won't do any good if the underlying OS doesn't enforce the will of the user, but instead freely gives the user's authority to any program they happen to run, intentionally or otherwise.

I'm not saying memory or capability safety isn't useful, I'm just saying it isn't sufficient.


> An Operating System is a program that multiplexes hardware resources and makes them safely usable by a number of applications. If the design of that system is flawed, how could a capability-safe language do anything to correct the problem?

A capability-safe language safely multiplexes hardware and software resources among different parts of the same application. So, even if every application has the privilege to crash your computer, delete all your files, or exfiltrate your Bitcoin and ssh keys, not all the code in the application would. For example, your logging library doesn't need those authorities, so the rest of the program wouldn't pass them in.

> Memory safe, and even capability safe languages won't do any good, if the underlying OS doesn't enforce the will of the user, but instead freely gives the users authority to any program they happen to run, intentional or otherwise.

It is necessary to enforce the user's desired limits on the authority of each application to prevent damage from malicious applications, but capability-safe languages (or hardware, like CHERI) can in most cases prevent damage from confused applications and from confused or malicious libraries.

> I'm not saying memory or capability safety isn't useful, I'm just saying it isn't sufficient.

I agree, but capability safety properly integrated with the user interface may be sufficient, as long as the capability system is expressive enough to capture the will of the user. (For example, KeyKOS's capability system includes keys to use limited amounts of CPU time, while E has no effective way to keep malicious code from denying access to a whole vat once it starts running; killing a running process because it's using too much CPU or memory requires some way to recover from its failures, which E does not have.)


Yes, and in a capability-safe language it's much more difficult to implement those flawed designs. (In the same way memory safety makes a buffer overrun vulnerability much more difficult to implement)

A concrete example is the Network capability mentioned in the article. The syscall to create a new process does not need network access, so in a capability-safe language that part of the OS won't have that capability passed in, so the OS won't be able to create new processes with network access. The Network capability, if desired for this process, will need to be explicitly added later by other code, in the OS or in userspace.


If that were true, you could cross-compile Windows 10 to Rust source code (or whatever language you think is magic enough to be safe), compile it from that language, and then it would never have a security problem again.

Obviously that's not true.


Cross-compiling OpenSSL to Rust and then compiling the Rust would also not get rid of the memory-unsafety vulnerabilities created by writing OpenSSL in C.

Nevertheless, if you wrote OpenSSL in Rust, or any memory-safe language, it would not have those memory-unsafety vulnerabilities.

The same applies when writing in a capability-safe language. It's fairly deliberately ignorant to call these kinds of properties "magic".


That's not true because the compiler of the magical safe language would likely reject the Windows 10 source and "cross compiling" is actually a major rewrite.


No, it's mostly not true because Rust does not prevent logic bugs.


Which is why I explicitly didn't write "Rust" in my comment.


Except the vulnerability was a stack-up. The logger wasn't making network calls per se, but passing requests to a component (JNDI) that would obviously need network access. You'd have the same root issue; it'd just manifest more as a sort of confused deputy under a capability-based model.


It still passes. Because JNDI would require network access, so log4j would also have to require network access or "disable" the network capability.


Practically (and like is being suggested here in other comments), JNDI would probably be a separate component (maybe a separate process) with network access, and log4j would just send those user provided strings over an IPC channel, and you'd be in exactly the same place at the end of the day.


Sure, it could be, but in a capability system if the application is starting that JNDI component, granting it network access, and granting the logger access to it:

1) It's pretty dang explicit to the programmer that they're granting the logger network access, because they're the ones writing that logic.

2) Even if you've granted the logger network access, you probably haven't granted it complete access to your filesystem. Network access isn't the only sensitive thing on your system, and a logging library needs very little authority to operate (specific files/directories and its own config?). Getting code execution in the logger shouldn't be game over security-wise, because it shouldn't have that much authority to begin with.
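To sketch point 2 (hypothetical Java-flavored types, not a real API): a broad file-system capability can be attenuated down to a single log directory before it is ever handed to the logger.

    // Hypothetical capability types, for illustration only.
    interface Dir {
        java.io.OutputStream appendFile(String name);
    }

    // Attenuation: wrap a broad capability into a narrower one before sharing it.
    final class SubDir implements Dir {
        private final Dir parent;
        private final String prefix;
        SubDir(Dir parent, String prefix) { this.parent = parent; this.prefix = prefix; }
        public java.io.OutputStream appendFile(String name) {
            if (name.contains("..") || name.startsWith("/"))
                throw new SecurityException("path escapes the log directory: " + name);
            return parent.appendFile(prefix + "/" + name);
        }
    }

The application hands the logger only something like new SubDir(rootDir, "var/log/myapp"), so even fully compromised logging code can append log lines and do nothing else.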


What it would probably look like is that they were granting an IPC channel to the JNDI component, just as they would for DNS or what have you. And then separately: "ok, yeah, JNDI needs network access, that makes sense".

So you wouldn't see network access directly in the capability manifest for the logging component.

None of this protects you from "fuck it, load class file from random ldap server" that was apparently coded up in JNDI.


Let's not conflate network access to get the thing from LDAP and network access for the thing itself. In fact, it's exactly programming with capabilities that would keep this stuff separate, as no one would design the JNDI interface to wantonly pass all the same capabilities to the class.

Well-sandboxed arbitrary byte-code is remote code execution I am OK with!


> Let's not conflate network access to get the thing from LDAP and network access for the thing itself.

Why not? The attacker's goal was to run untrusted code on your server. They don't necessarily care if it's running in the JNDI process or the server component that's logging as the first step. It's a beachhead into a pretty trusted component, and exploits these days are long chains. I'm sure there are other components over those IPC channels that aren't expecting JNDI to be lying now which can be used to expand that beachhead.

> as no one would design the JNDI interface to wantonly pass all the same capabilities to the class.

Just as 'no one' would load random class files off of untrusted servers into a vm without the fine grained capabilities you're talking about?

> Well-sandboxed arbitrary byte-code is remote code execution I am OK with!

I agree with the spirit, but haven't found a sandbox that stayed "well-sandboxed" over time.


Excuse us true believers, but the idea is that capabilities avoid the https://xkcd.com/2044/ trap by being just dynamic enough.

I would certainly agree not to trust any other sort of sandboxing. E.g. I don't trust Linux namespaces (as the Linux devs themselves say you shouldn't), because the syscall interface is far too complex and subtle. But something like CloudABI or Fuchsia or seL4 is dramatically narrower in scope.


I'm talking from experience here with capability systems on microkernels. Capability-based security is a tool, not a panacea. Exploit chains these days are very used to having to jump through IPC channels to components with different privileges to get everything they need.

Edit: As an aside, rather than looking towards namespaces for an attempt at the same structure, seccomp BPF is the primitive I've found that creates the closest thing to the capability system you're talking about. That way you can leave a process with only recvmsg/sendmsg on unix domain sockets, and maybe mmap for memfds shared across processes.


> Exploit chains these days are very used to having to jump through IPC channels to components with different privileges to get everything they need.

Thank you for bringing this up; it's an important point. Do you have a sense of what a practical solution might be?

One thing I can imagine is that there's a JNDI component, but to communicate with it over IPC, you need the JNDIComponent capability. This would allow a couple ways to prevent the log4j vulnerability:

1. You don't give the JNDIComponent to log4j.

2. You use capabilities inside JNDI to separate out the bits that use the network from those that don't, and only supply log4j with a JNDIComponentWithoutNetworkAccess.

This requires co-operation between capabilities in the OS and in the programming language, though, which is a big ask. Plus some foresight; much more than as described in my post.
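For option 2, the attenuated component might look something like this sketch (JndiComponent and LocalOnlyJndi are invented names here, not the real javax.naming API): a wrapper that exposes only local lookups and simply never forwards the network-backed ones.

    // Hypothetical interfaces -- not the real javax.naming API.
    interface JndiComponent {
        Object lookupLocal(String name);    // in-process / cached entries only
        Object lookupRemote(String url);    // the only method that needs Network internally
    }

    // Attenuation: this is all log4j would ever receive.
    final class LocalOnlyJndi {
        private final JndiComponent inner;
        LocalOnlyJndi(JndiComponent inner) { this.inner = inner; }
        Object lookup(String name) {
            return inner.lookupLocal(name); // lookupRemote is simply not exposed
        }
    }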


> Well-sandboxed arbitrary byte-code is remote code execution I am OK with!

I used to be OK with that too, until Rowhammer showed that sandboxing is more fragile than one would expect. And then came Spectre...


Yeah it is scary and depressing, but I suppose I just couldn't live in this industry if those are the fatalistic last word.

Trying to clean up all the accidental complexity that goes with ambient authority, using capabilities, will free up "complexity budget" to ponder various timing attacks and whatnot. I think the problems are solvable if only we are willing to move the goal posts from our current obsession with back-compat uber alles that makes everything difficult.

As I understand it, the M1 is similarly better not because of any fundamental insight, but because not worrying about various x86 accidental complexity just makes it easier to do things like deeper instruction fetching pipelines etc. (Of course TSMC just killing it doesn't hurt either.)


Ideally, in a perfect world, the capabilities would themselves be something that could travel over an IPC channel, and in fact would have to travel over an IPC channel. The whole point of pervasive capabilities is that there aren't trivial escape hatches that every programmer uses at the drop of a hat. If there were, we wouldn't need a new capabilities-based system, because we already have that; it's what we all use, every day.

On the one hand, I acknowledge the general impracticality of such an approach, but on the other I'm increasingly less convinced every year that anything less will ever be secure. And I also accept the corresponding implication that if programmers never do come to accept this sort of thing, we will never have secure code.


It's not that impractical; I've seen systems set up that way with unix domain sockets where you can pass an fd along with a message. I'm trying to remember where (android?), but there's a bluetooth stack set up like this, where your manifest is registered by a trusted entity that you then talk to over a unix domain socket to get another unix domain socket to the actual bluetooth stack, on an IPC endpoint that was set up with the bluetooth stack with only your permission set.

It's just not a panacea. Even if you fracture the permissions, they still exist in many places, and exploit chains are used to acting as the same sort of distributed application you're suggesting in order to exercise those distributed rights.


> log4j would just send those user provided strings over an IPC channel

No, it'd have to send the strings and a capability for network access, which it wouldn't have unless you had specifically set it up so that it could make network connections.


It's JNDI, a network naming/directory library, sort of like DNS but using LDAP.

Going off what I've seen for DNS daemons in capability systems, the whole point is to not have a network capability on the client of the daemon at all, but only in the DNS daemon, which gets IPC channels set up by some minter of capabilities in the TCB. That IPC layer lets you make JNDI requests, but because of the capability model it is a non-forgeable permission over the idea of DNS lookups, both of the local cache and over the network if need be. The DNS daemon is the only one with a network capability for a random UDP port to make requests.

So what you have in your fine grained caps is

    DNS daemon
      * IPC server port
      * UDP port

    client app
      * client port to DNS daemon
If someone coded that DNS daemon as 'you know what, new DNS record type called CAFEBABE that is an arbitrary, untrusted server name and path to load some untrusted .class file from', the above capability system implementation wouldn't help, and we'd take the DNS daemon authors out and flog them for implementing that. If it was in the RFC we'd take the writers out and flog them too. That's the best defense I've found, unfortunately.

And the big issue is that there are tons of these caps needed for a system even now, and it's hard to manage it all even when it's laid out explicitly in a big table. At best it sort of looks like Terraform, at worst it looks like JCL or an autotools script. It's real easy for it to expand out of some devops person's mental scope, and then we all take potshots on the Internet at whoever didn't follow some hardening guide or other after a breach.


A capability-secure language would keep log4j from getting access to the IPC channel to the JNDI process, unless log4j's caller specifically passed log4j the JNDI IPC channel capability.


A fun excerpt from "Capabilities: Effects for Free"

Capability-safe languages prohibit the ambient authority [18] that is present in noncapability-safe languages. An implementation of a logger in Java, for example, does not need to be initialised with a log file capability, as it can simply import the appropriate file-access library and open the log file for appending by itself. But critically, a malicious implementation could also delete the log, read from another file, or exfiltrate logging information over the network. Other mechanisms such as sandboxing can be used to limit the damage of such malicious components, but recent work has found that Java’s sandbox (for instance) is difficult to use and therefore often misused [2, 12].

Emphasis added is my own.

[1] https://www.cs.cmu.edu/~aldrich/papers/effects-icfem2018.pdf


https://fossandcrafts.org/episodes/20-hygiene-for-a-computin... does a great job of introducing capabilities in an easy to understand manner.


Thanks, Mike. This kind of introductory material is clearly needed in this thread!


What you need is not a capability safe language, but rather a tool to make application jails and similar systems more accessible.

Your os exists for a reason, let it handle the sandboxing for you, and not the language.

Otherwise you fall into troubles later with different sandboxing vulnerabilities in different compilers.

Have a codebase that compiles using deprecated features which are removed in language version 2.1?

“Tough luck, security patches only exists in 3.1 and above, guess you’ll have to stay unsafe”

Not to mention the pain it would bring on languages to backport safety features to older LTS releases of the compiler.

Just why? Let the OS handle these things for you. It is called an operating system for a reason.

The Log4j vulnerability was minimised in any organisation that restricts its applications' network capability to whitelisted machines only, with the help of OS features.

(Yes your application might need network access, so run it through a reverse proxy and filter the traffic based on strict set of rules)


The capabilities model is not about "features" which might be deprecated, but more about "access to standard resources". Capabilities, e.g., don't care that you are running version 3.1.73 of a certain dependency; they care that you want to open a network connection to 123.76.34 or that you want /usr/bin/secret file.txt.


The version [of a dependency] starts to matter if it blocks an upgrade to a version of the platform that would allow you to control a specific new capability.


This is a good point. I talked about capabilities at the language level, but they're just as important if not more-so at the operating system level.

If you have capabilities in the programming language, then you can make guarantees about what resources (like the network) your program's transitive dependencies can use (in process).

If you have capabilities in the OS, then you can make guarantees about what resources a process can use. EDIT: this is how you would use capabilities in the OS to obtain "application jails". But capabilities would be more flexible here, for example by allowing processes to share them.

As monocasa points out in a separate thread, log4j is actually a messy example---messier than I realized when I wrote this post---because the way it (is? might be?) set up is that your application runs in one process, and JNDI in another. Which brings up a lot of design questions around capabilities in the OS and in the language and in how they interact that I don't know much about.

But the primary point of capabilities is to limit how access control can flow from place to place, using standard data-flow mechanisms. In a programming language, this means tying access to whether you have a pointer to an object. In an OS, I think this looks more like OS-level permissions granted to a process, that processes can share with each other.

You don't have some big table somewhere saying who is allowed to do what. Instead, access is granted by a thing, that you can pass around. When you get used to the idea, it just feels very natural and powerful.


Sandboxing cannot provide the same security benefits of capabilities, even in principle.


So, IDK what the Java world had that was similar, but in .NET at one time they tried to solve this at the runtime VM level with the concept of Security Permissions and trust levels. In practice, this wound up being:

- Somewhere between too confusing/frustrating to developers (especially ASP.NET ones, where often the escape hatch of 'AllowPartiallyTrustedCallersAttribute' would get thrown around)

- Hard to manage from a per-app granularity standpoint.

Edit, hit the post button too early:

In any case, this behavior wound up getting changed to be a lot more forgiving in .NET 4 (although, often in fact requiring the removal of the aforementioned APTCA from your web projects,) and IIRC it's gone in Core.


http://joeduffyblog.com/2015/11/03/blogging-about-midori/ sounds relevant, as a contrast in that space.


Java had JAAS, which fulfilled a similar purpose, and it is scheduled for removal for similar reasons CAS was taken out of .NET.


There's what I think is a similar concept in Java called security contexts. I've never encountered it actually being used, and it just tends to get in the way, like you said. These strike me as similar to SELinux in that regard.


Capabilities are underrated as a general way to purge bad architectures, make it clearer what code is doing, and generally cut accidental complexity and improve programmer productivity.

This is a big deal, because many security practices are neutral or bad for programmer productivity.

We need a big project to get CloudABI implemented in all the major kernels to make the theory reality. Whereas before it was unclear what was a good candidate to get this stuff in prod, now it is very clear that socket-activated services are an ideal use-case, with very little migration pain.

Even if you think we should be going to Fuchsia or seL4 or whatever, I think this is a good stepping stone. Those projects are a big jump alone, and funding is uncertain. (Plus there are issues of single-company dominance with Fuchsia.) I think CloudABI is the sort of "non-reformist reform", not "worse is better", stepping stone that would help those projects, not hurt them.


Agreed re: the general idea, but isn't CloudABI in particular superseded somewhat by WASI? Its repo seems to say it is: https://github.com/NuxiNL/cloudabi

(WASI is similarly capability-based, as I understand it!)


I don't think so. WASM is changing things on many fronts. CloudABI is just doing one front.

I don't have anything against WASI, and I don't blame them for wanting to point out a like-minded project that was still active. But just as I think CloudABI is a good stepping stone for seL4 or Fuchsia, I think it is a good stepping stone for WASM.

Also, I guess I don't believe in coupling change on in-principle independent axes. If you at least allow the knobs to be turned separately, even if you don't, e.g., CI-test or otherwise support all combinations, you are incentivized to handle things more "parametrically" vs if-def soup (which matches capabilities, incidentally!) and you have a great way to troubleshoot stuff. This is like how NetBSD says they like supporting obscure architectures to catch more bugs in the portable code too, not just make their lives harder.


WASI could be used without wasm, in theory. So it doesn't have to couple changes on multiple axes together.

I agree with the other poster, WASI is the next step for those who like CloudABI and Capsicum, and may really win by being coupled to the browser.


By WASI alone you mean just do something a lot like cloudabi with the home directory emulation baked in?

The idea is the interface, and that is very nice and simple, so sure. But I think the ability to catch on must be in the implementation. I suppose parts of the WASI libc could be reused, but those parts could equally well be taken from the original Musl, right?


> As of October 2020, CloudABI has been deprecated in favor of WebAssembly System Interface for lack of interest

On its Wikipedia entry, so most likely it won't go anywhere.


I also revived this LKML thread https://lore.kernel.org/kernel-hardening/01e72780-e328-23b5-... a few months back, because my one quibble with CloudABI is its all-singing-all-dancing fork+exec ABI.

Making an embryonic process, mutating its state as desired, and then submitting it to the scheduler is a much nicer workflow, and more in the spirit of capabilities anyway, where "fork = duplicate the whole keyring and then destroy some caps" is foolhardy.

FreeBSD already had process/PID FDs, but I think CloudABI avoided them because it wanted to be easier to port. But now that Linux has them too, I don't think this should be a such a portability concern.


FWIW that's like what I do in rsyscall https://github.com/catern/rsyscall http://catern.com/rsys21.pdf


Ah, glad you wrote up the idea. If I get around to trying to have that same conversation with other kernels, it would be a good thing to point to!


I know, but support is still in FreeBSD. My big long term plan is:

1. Work on FreeBSD cross in Nixpkgs, because I need a way to pin forks and run nice VM tests without going insane. (We already have NetBSD cross.)

2. Rig up a booting image that uses https://github.com/InitWare/InitWare, the fork of systemd.

3. Add support to CloudABI in initware.

4. Bang the drum for other OSes and upstream systemd to implement this stuff so we can get good portable abstractions -- I think this is our best shot to get "portable containers".


I believe FreeBSD has removed support in CURRENT.


Oh no that's a bummer. Well I think it's a smallish patch so easy to redo, but still.



Yes.


Whew!


Java has actually had this built in for a long time now. A SecurityManager allows you to restrict access to things like the filesystem and the network (and whatever else you want). I have never seen it used in a real codebase.

https://docs.oracle.com/javase/tutorial/essential/environmen...


Elasticsearch is actually using SecurityManager with quite thoroughly locked down policies; and it seems that this actually saved ES from being vulnerable to the RCE.

The irony is now that OpenJDK just recently decided to deprecate the SecurityManager in Java 17 and remove it in Java 18.

See also this Twitter thread: https://twitter.com/rcmuir/status/1469730949810339843


We actually use it in a real codebase. When the software is running in "developer mode" (on a developer's machine), we install a SecurityManager which denies all outbound connections except for localhost by default.
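For anyone curious, a rough sketch of what such a developer-mode manager can look like (checkConnect and checkPermission are the real SecurityManager hooks, but this is simplified, not the poster's actual code; note SecurityManager is deprecated as of Java 17):

    // Simplified dev-mode sketch: permit everything except outbound
    // connections to hosts other than localhost.
    public final class LocalhostOnlyManager extends SecurityManager {
        @Override public void checkPermission(java.security.Permission perm) {
            // allow everything else in developer mode
        }
        @Override public void checkConnect(String host, int port) {
            if (!("localhost".equals(host) || "127.0.0.1".equals(host) || "::1".equals(host)))
                throw new SecurityException("outbound connection blocked in dev mode: " + host);
        }
    }

    // Installed early at startup, e.g.:
    //   if (devMode) System.setSecurityManager(new LocalhostOnlyManager());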


The security manager represents a flawed way to do it. It tries to catch up and restrict an application after it already has access to the relevant APIs. Forget to restrict just one API, and the sandbox can be escaped. Usually, the integration with the security manager requires intimate knowledge of the application.

Javascript follows a better approach because it has a small core API. All other APIs have to be explicitly injected and permitted by the host program.


Java's SecurityManager is very different from capability-safety. It's much more like setuid: SecurityManager allows or disallows network accesses depending on what code is running. (Literally by inspecting the stack)


No, it wouldn’t have, for the simple reason that no one is willing to go to the extra effort to use a capability-safe language, for the same reason we don’t all code in formally-verified languages - it’s just too much work.

The claim in the article could be “The log4j vulnerability could have been prevented by hiring someone to punch any developer who tries to use JNDI in a logging library in the face” and would be equally valid.


I don't think a well-designed capability system would be so onerous as to be unworkable; people use other complex features in Rust and C++ and Haskell just fine. It's just that being capability-safe by itself is not enough to carry a niche language. Eventually these ideas will find their way into more mainstream languages, one way or another.


I disagree with your premise that a well-designed capability system is a complex feature. It's better described as an absence of certain complex features.


It sounds like you haven't tried it, so you're just guessing that it would be hard without any knowledge, because you're used to writing code in such a way that security is inconvenient.

Writing code in E is nothing like writing formally verified code. It's a lot less work than writing code in Java or C++.


> Writing code in E is nothing like writing formally verified code. It's a lot less work than writing code in Java or C++.

I bet it’s not, for the very simple reason that most software today isn’t written in E.

This sounds like I’m being flippant but I’m not. The total “cost” of writing code in a specific language includes things like “can I find the answer on Stack Overflow?” and “can I hire enough engineers to code in this language?” And the total cost of coding in E is almost certainly higher than in Java because if it wasn’t, people would already be doing it.


I explained why your comment is not to the point 15 hours before you wrote it at https://news.ycombinator.com/item?id=29700919, because someone else already wrote the same thing.


Is it?

I’m referring to real world coding to solve real world problems in order to get real world paychecks.


You're going to have a hard time getting a job writing E, or for that matter finding E libraries, because nobody uses E. Same story with Lobster, Clean, RScheme, Cobra, or a zillion other languages that don't have much adoption. But that is an entirely separate question from it being a cumbersome programming language, which is what the grandparent comment was alleging of all ocap languages—based on, apparently, total ignorance.

There's nothing wrong with total ignorance; it's where we all start. But there is a great deal wrong with pretending that your total ignorance is expertise that justifies dismissing what you're ignorant of. Believe me, I have a lot of experience being that guy. It sucks.


I don't disagree but we have to start somewhere. For example: formally specify [the clear parts of] HTTP 1.1 with declarative programming (which will generate proper code) and that would already be a huge jump.

My point is: at one point somebody has to contribute.


You may have intended to comment on a different article, since not only does your comment not relate to anything in my comment it was related to, it doesn't relate to the security approach being discussed in the article the entire thread is about.


Here’s the alternate history where “Network” is passed as an argument to the constructor.

The year 2006 rolls around and centralized logging is becoming pretty cool. Someone requests that log4j add support to logging to a UDP syslog server. The constructor now requires “Network” and now everyone still thinks it’s reasonable.

IMO the elephant in the room is code execution of remotely loaded code. I want to be able to give something access to the network but not be able to execute stuff from network provided sources.


It's a similar story though, that could be covered by another capability - one to access the Java classloader, or run executables on the filesystem, for example.


The way I put it was how come a remote process has authority to load arbitrary code into the local process?


The last few paragraphs of the post address your post's alternate history.


Hear, hear. I've said at work that many, perhaps most, of the security issues I've seen are at least related to ambient authority. I never see these major issues trigger any deeper thinking about the issues beyond the relatively shallow bug causing the vulnerability.

(A subtlety here is that you may want authority to write to a network filesystem.)


Why should a programming language be limiting network access? Why wouldn't we do this via the operating system?


If your language is capable of expressing "this part of the code can't access the network", in a general way such that networking is not some special snowflake that's baked in, then you've created an effects system which is likely to be useful in many, many other contexts for other things you want to assert without having them supported in the OS. (For example, "is this code async", or "does this code print anything", or "does this code have any side effects", or "does this code throw", or "does this code allocate".)
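As a tiny illustration of that last point (hypothetical, Java-flavored types): the capability parameter doubles as an effect annotation you can read straight off the signature.

    // The signature tells you what the code *can* do, with no special effect syntax.
    interface Console { void println(String s); }
    interface Network { byte[] get(String url); }

    final class Report {
        // No capabilities in, so no I/O possible inside: effectively pure.
        static String summarize(java.util.List<String> rows) {
            return rows.size() + " rows";
        }
        // May print, but cannot touch the network.
        static void show(Console out, java.util.List<String> rows) {
            out.println(summarize(rows));
        }
        // The only method that can reach the network, and the signature says so.
        static String fetch(Network net, String url) {
            return new String(net.get(url));
        }
    }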


That's an interesting idea. Although denying network to log4j wouldn't have stopped this I think, as it was the JNDI code that made the network calls, not log4j directly.


Depends how capabilities work in the system. A restriction could prevent something from occurring at all in a bit of code, even in outside calls.


You might be able to be more specific and only allow network connections to whitelisted addresses or similar.


That would be difficult to do within an application. If your app needs network access, but logging performed by the app should not, how could the OS help there?


The usual ways to do that are to open a connection to the things you need at startup, and then drop your privileges, set ulimits or whatever. Or you don't have network access at all, but pipe your logs to a local process that does.

Alternatively the application can be made to have some OS-specific rules to e.g. generate iptables rules on Linux, so it can contacts its logger, but not anything else.

All of those are probably anathema to the typical Java programmer though, as they require crossing the boundary from JVM into OS configuration.


I don't think Java apps can drop privileges like that, but yes, that would prevent unexpected bits of code from acquiring a network connection. I suspect only the most secure code bases would bother though, and most things would still be affected even if it were possible in Java.


Because the programming language knows things at a finer-grained level than the operating system does, which can allow for better protection.


It might be useful that some code in a single process have access while other parts do not. How would you propose an OS handle those cases?


And how do you do that without operating system support? I mean, you can't of course limit that based on the code; in the JVM there is only a single address space, so every method or class can be instantiated from anywhere in the code (public/private/etc. are only for programmer convenience; they are easily circumvented with reflection and don't provide any security at all).

You can imagine doing something like adding some bytecode instructions that if encountered by the JVM block the network access to every other request done in the call stack after that instruction. Something that reduces efficiency drastically and still can easily be circumvented if you don't implement different address spaces for different parts of the code (something that only the OS can do).


> I mean, you can't of course limit that based on the code, in the JVM there is only a single address space

You can in principle, just look at the Joe-E language.

The article is likely talking about revisions to existing VM designs though, because capabilities are a fundamental, architectural decision that underlie the runtime.


If you're asking how the JVM might do it, I can assure you I do not know. If the question is "can this be a language feature" then I say, surely it can be.

A language that supports static analysis and can disallow dynamic dispatch would do the trick, no?


There are ways to call a function that cannot be caught by static analysis: basically you just need a way to jump to an arbitrary address of memory where a particular function is stored. If we exclude languages that by design sandbox the code (for example JavaScript; and most of the time even that isn't enough, because engines have bugs, so browsers also use facilities of the operating system to sandbox the entire engine itself), no other programming language can do that.


I don't think that's particularly useful though. A program with requirements like that seems more likely to be split up into two independent pieces.


But that's not at all the case in practice. Logging is intertwined with other program functionality.


Most logging libraries are facades over a concrete implementation. And there exist adapters for those that aren't. One could easily provide one that pipes the log stream to a child process that drops privileges and is restricted down by the OS. This will probably entail a performance penalty, but it could be worth it. It would be nice to have such a capability in-process though.


Programming language can know more about the program's (intended) state at any given time. The OS can/should frequently be involved, but even then you're frequently doing something like pledge() from the program to let the OS know when/how to restrict you. Doing it totally externally, like SELinux does, is valuable but coarser-grained.


Pledge and SELinux are the same level of granularity, but pledge's secret sauce is that the app has a channel to ask the OS for process restrictions, where SELinux has to be applied in advance.


I think it's reasonable to describe one as more coarse-grained when it simply doesn't have enough information to be as precise as the other method. Yes, they work at the same layer and can do vaguely the same things, but the targeting makes a huge difference.


Can you not use network namespaces on Linux to give a process its own restricted network device and then filter that traffic?


Yes, and you can even have a single process with threads in multiple namespaces. But it still doesn't solve the problem unless you somehow magic all log messages over to a dedicated pool of threads just for logging.

The process is simply the wrong layer for this boundary; it's too coarse, and you end up trying to hack your way into telling the OS about the parts of your program. OS security treats processes as black boxes, just like hypervisor security treats VMs as black boxes. Trying to force it will be extremely clunky.


The Linux kernel developers don't recommend using namespaces for security, but only for access control (the two are not the same!) because it's a relatively recent concept and the syscall interfaces and semantics are huge and subtle. Dedicated user accounts together with firewall rules are a better idea IMHO.


Personally I wonder whether anyone actually used log4j willingly, or just because it's generally used by all the Apache projects and they won't start without it.

With a healthy dose of "use the latest versions" mentality.

I checked back and all my java code was still running log4j 1.1, and removing it has been on the todo list ever since it needed linking.

You should be checking all your 3rd party source code; "capabilities" won't get round that. The number of applications that don't need network access these days is as good as zero.


I think you’re misunderstanding capabilities: they can apply also at the class/method/module level. If you’re language doesn’t have “ambient authority”, nothing can create a network connection unless it’s explicitly granted permission to do so and, typically, this permission is granted by passing some sort of non-forgeable token around.


Java used to have exactly this with security manager https://docs.oracle.com/javase/tutorial/essential/environmen...

It's been removed from the language because "it wasn't needed or used".


This is more like ACLs, which are a completely different paradigm from capability-based security. An ACL system is based around specifying permissions for actions and such, whereas capability systems are based around reifying authority in an unforgeable way and passing the resulting tokens around.


I don't see the difference between passing tokens around and inheriting a security manager that can only have permissions and capabilities revoked?


The former is explicit, and an IDE can help with highlighting unused ones. The latter is difficult to audit because the code that uses relevant APIs is disconnected from the privilege-enforcing mechanism. It's either some sort of abstract policy framework or explicit privilege dropping


I don't remotely see how an IDE could help highlight which functions are insecure to call reflectively, other than linting, and that applies to both and only helps so much.


You should probably spend ten minutes reading an introduction to capabilities then so you can understand the basic concepts before commenting. I don't know what to recommend nowadays, but mlinksva posted a link to one.


I wasn't the one saying the security manager was the same as an ACL.

Nothing you posted explains why a reference to the security manager class isn't a token.

It seems to me more that you don't understand either capabilities or the old security manager class.


> It seems to me more that you dont understand either capabilities or

Mark Miller cited me ("K. Sitaker") in Capability Myths Demolished and Paradigm Regained:

https://www-users.cselabs.umn.edu/classes/Fall-2019/csci5271...

https://www.hpl.hp.com/techreports/2003/HPL-2003-222.pdf?jum...

Tyler Close cited me in ACLs Don't:

https://www.hpl.hp.com/techreports/2009/HPL-2009-20.pdf?q=do....

It sounds like you disagree with their judgment on this point.


ACLs are only a tiny part of what the security manager did. It was a full-fledged class-based security system.

Yes, I do disagree that something like pfsense "is only an acl"

Or that capabilities can function without lists.

You don't?


I'm not going to argue with you. You haven't earned it. Go and study.


You didn't argue. You just posted a load of completely irrelevant links that have absolutely nothing to do with the Java security manager class or why it was deprecated in Java 17.

I believe there is a name for that - strawman.

the jep is here https://openjdk.java.net/jeps/411

Not checked, but I'm fairly certain none of your links are referenced there.


> Personally I wonder whether anyone actually used log4j willingly

Any project written in Java that has ANY dependencies will also use Log4j. Not even your choice anymore unless you are willing to write your entire stack from scratch.

The moment you start thinking about how to solve logging you will land at a log4j-like framework anyway.

> You should be checking all your 3rd party source code

How is that usable advice? There are dozens of dependencies in the simplest application. A non-trivial one may easily have over a hundred dependencies.


I’ve always used the various log4j adapters and then excluded log4j from the transitive dependencies.


Same. Gradle even lets you configure replacement dependencies, so if anything depends on Log4J it automatically gets the adapter instead.


> How is that a usable advice?

It's not really advice, it's an essential requirement if not getting pwned is important to you. Expect to get pwned eventually otherwise.


Same, all of our Java projects were using 1.X log4j, and have been for years and years without issue. Ironically, all of this scrutiny on the problems of the 2.X releases has forced us to agree to update from those rather dumb, safer builds to the latest releases, so that we'll have to be on the treadmill.


Log4j 1.x contains vulnerabilities and errors that were never addressed because it was declared end-of-life in 2015 and the Apache project has stopped supporting it!

https://logging.apache.org/log4j/1.2/index.html

https://www.cvedetails.com/cve/CVE-2019-17571/


> the number of applications that dont need network access these days is as good as zero

You can still have all outgoing traffic be sent through a proxy that denies any target that is not explicitly on the allowlist.


While tools can save us from stupid mistakes, some mistakes are so stupid that relying on tools to prevent them seems much more dangerous even. Log4j should not have been used by anyone, and certainly it should not have been extended and "improved" in an endless cascade of irresponsible additions, likely violating every principle of good practice, ever. I hope that this is what industry people take away from it, not that some magic language feature should have prevented it.


If log4j is such a fundamentally bad idea, surely we should expect our tools to stop such ideas from ever being implemented.

In the same way that modern languages are memory-safe (disallowing pointer arithmetic, because it's proven to be a terrible idea), a real modern language would disallow log4j by being capability-safe.


Does this interact with or preclude operating system level capabilities?


It interacts well with them.

OS-level ones call the shots, but maintaining the same discipline within a process is a good way to write better code (especially better libraries).


This approach reminds me of Haskell's effect systems.


It seems more practical to use BSD’s approach of pledging once in main that the process won’t access the network. Parts of the program that need different capabilities are isolated in their own processes and communicated with using IPC. I don’t think people want to pass all kinds of capabilities around in every function call.


Sounds better in general, but probably wouldn't help with something like logging which would probably be used in all the processes. Unless you want to make IPC calls for every logging call.


> Unless you want to make IPC calls for every logging call

Isn't this more or less what ends up happening anyway? Sure, from the application's perspective it's just a function call. But usually, in the end, the logs are shipped to some central location one way or another.


Hmmm, good point!


I don’t know. It’s not necessarily a bad idea because then you have a single audit point for all logs and can see the cost centralized in measurement tools vs it looking to be in the noise and never popping up unless you have particularly egregious hot spots.


As was said, .NET had something like this.

This reminds me of Threads vs Tasks in .NET also, where to do async threading well, you used to have to use APIs that wanted the kitchen sink of options.

To one group of users this was a chance to go read lots of stuff and work out what all these options meant and why they were wanted: security, etc. Writing solid code, etc.

To the other group of users, you get a magic screen full of code you cut'n'paste and it "magically works"; this bit done, next problem.

Then Tasks came along, and the code demos well because it just works and is sleek and only needs "a couple of lines of code". That is, until you want to use a pool, and manage this and that, and slowly you learn/add all the layers that provide the old API's functionality, as you discover you actually needed more code.

Which relates: the log4j code "just worked", so everyone closed that issue and got another ticket started. There is mostly no idea of craftsmanship; it's mostly "velocity"-based thinking. Thus in the last decade the simpler snippets of code that just work allow more, cheaper, mindless monkeys. And we just grab packages and tools and leverage "free and awesome" but don't understand anything of what is actually happening.

And it's not going to improve, because there is a drive for more code, code is already too expensive, and so it's just going to be done cheaper and cheaper. So whatever neat "solutions" to this class of "nobody will ever read the manual to do it correctly" problems we come up with, they had better work for the ever-increasing number of people who will never read the manual.


I am working on a capability-safe language!

I fully agree with this article, and the great thing about object capabilities is that they are just plain values, and require nothing special from the type system.

This also means that object capabilities aren't "colored" like async/await, monads, or other effect systems.


Do you have more to share, like resources about capability-safe languages, or your ongoing work?

> This also means that object capabilities aren't "colored" like async/await, monads, or other effect systems.

That's interesting. How does that work? Is it by passing around a value from your main function to the functions that need it?


> Do you have more to share, like resources about capability-safe languages, or your ongoing work?

I can share an example from the main function of the self hosting Firefly compiler [1]:

    main(system: System): Unit {
        ...
        let fs = system.files()
        if(fs.exists(tempPath)) { deleteDirectory(fs, tempPath) }
        ...
    }

    deleteDirectory(fs: FileSystem, outputFile: String): Unit {
        fs.list(outputFile).each { file =>
            if(fs.isDirectory(file)) {
                deleteDirectory(fs, file)
            } else {
                fs.delete(file)
            }
        }
        fs.delete(outputFile)
    }
Main is passed `System`, which is a value with methods to access the network, the file system, etc. It passes on the `FileSystem` value to `deleteDirectory`, which only has methods to access the file system.

Since there's no other way for `deleteDirectory` to obtain capabilities than to receive them as arguments, `deleteDirectory` only has access to the file system.

> That's interesting. How does that work? Is it by passing around a value from your main function to the functions that need it?

Exactly; such arguments can themselves be seen as "coloring", since they show up in the function signature. However, an important distinction is that you can capture capabilities in the fields of an object or in closures, and thus get rid of the coloring.

The Firefly language is quite far along - nearly feature complete, and translating itself. However, there's a lot of work left before it becomes a viable alternative to existing languages, not least in the tooling and documentation department.

[1] https://github.com/Ahnfelt/firefly-boot/tree/master/compiler


Thank you for the example, that helps a lot. So for example, when you write a library that needs capabilities, you would write it using dependency (capability?) injection, and then the main program would have to pass an object with that capability for the code to work? And I see what you mean by "they are just plain values, and require nothing special from the type system": as long as you restrict what part of the program can produce a System or a FileSystem, you can just use plain typechecking.


Yes, dependency injection (without DI frameworks) is pretty much what it is! Just taken a bit further to the logical conclusion.

I would point to an online resource, but honestly I haven't run across any that does justice to the simplicity of the concept.


I'd say our conversation right here is a good example. Most people understand static typing, most people understand DI. Once you have those, the only part left to understand is the concept of capabilities. If you have all of those, your example is clear and simple. Maybe adding a type annotation to show that fs is a FileSystem in main, but that's about it.

I think you've done a good job of minimizing what's required for the average developer to go from not knowing about capabilities to using them.


Great! Maybe I can ask for your feedback once we've produced some documentation?


Yes, I would be very interested. My email address is in my profile.


In theory, this always seems like it would be a good idea.

In practice, I think it would turn into a ball of mud because your capabilities would be viral and end up infecting all the way up the stack, similar to the way async/await works in dotnet or the IO monad does in Haskell.


Several good points, and so much to comment on. I will restrict myself to repeating that Java already has several mechanisms to enforce capabilities, and that this is mainly an exercise in API design.

However, the article fails to recognize that Log4j (and probably other libraries) have different capability requirements at different points in time and in code. At startup and when reloading configuration, only access to some severely restricted JNDI lookups should be allowed. Core message formatters and lookups should have minimal privileges. Appenders require access to network and filesystem APIs though!

The capability style presented by the article requires the use of Dependency Injection to properly inject handles that permit access to restricted APIs. This is quite fine in the case of Log4j, which allows programmatic configuration, but it would force developers to give more capabilities than strictly required to more monolithic components!


Ideally, all egress network connections should go via a NAT gateway, and filters at the NAT gateway should have policies saying which requests can go out. Failures do happen in programming; we need better guardrails ensuring security.


In the field this causes more issues than it solves.

For example, overzealous secops admins block "port 80 outbound", because everyone knows that HTTP is "insecure" and HTTPS is "secure". Except that the SSL certificate on that HTTPS website needs OCSP on port 80 to be fully secured. If you block it, you get random 30 second timeouts and less security.

In general, totally blocking the Internet and then punching holes is like playing whack-a-mole. At first, you'll block Windows Update because you should be "managing that" through something like WSUS or SCCM, right, right? No, the SCCM guys never did figure out how to manage the DMZ servers, so the most Internet exposed servers are not getting patched now. Congratulations!

So you'll open up Windows Update. Except that you didn't, because you forgot that it also includes half a dozen vaguely related, infrequently used and undocumented additional URLs. Which are all CDNs. So they're a CNAME to a CNAME to a pool of A records approximately half the size of all of Azure.

And there's the Linux machines. The BSD-based network appliances and their various call-home support features. And on, and on, and on.

How exactly are you going to configure this in your "NAT"? You discover that you can't. You're going to have to use a HTTP web proxy.

Congratulations, you've now blocked IPv6, HTTP/3, nerfed web performance for everyone everywhere, and introduced a terrifying single-point-of-failure that will even break basic authentication (Azure AD, Okta, SAML, etc...).

You can lock out an entire network completely by accidentally powering off the web proxies, to the point that noone can log back on to turn them on.

This is just the tip of the iceberg. I could rant for hours about how NAT and forced outbound proxies break more things than they solve. Famously, TLS v1.3 uses v1.2 in the protocol header because the morons that write web proxies don't understand how protocol versions work.

Oh, okay, one more: Windows 11 and 2022 introduced new TCP optimisations (HyStart++) that improve performance on high bandwidth delay product links like modern cloud and 5G networks. Except that web proxies don't generally keep up with optimisations like this, so you gain nothing even if you upgrade the desktops and server to modern operating systems. (Same thing applies to Linux too)


NAT is completely orthogonal to this problem. You say people could have network filters, they could, with or without NAT.


NAT != firewalling. People conflating the two leads to policies like needing NAT for ipv6.


What sort of software are you using for the NAT gateways? (I know that's an AWS term, but not what I'm looking for. I've used that to provide a consistent outbound IP for production traffic so that upstream providers can allowlist an IP to talk to their API. Dumb security model when TLS client certificates exist, but... easy to set up I guess.)

I've always wanted to defend against the attack where an application routinely talks to some API hosted on AWS, but an attacker starts using the application to exfiltrate data to another API hosted on AWS. If you just look at outgoing IP addresses, you'll think something like "oh, that API just started another replica" and not "my app is doing something weird". I want to do MITM on my applications so that every outgoing payload can be inspected.

(I come at this from a security angle, but I'm really more interested in debugging in production. "Page foo that talks to the bar API has stopped working" "Oh, here's the JSON it started returning instead of text/plain." Bug fixed in 5 minutes.)

I know this sort of pattern is common for, say, corporate firewalls, but I haven't seen any good projects for doing it to arbitrary applications running in production. I looked at Apache Traffic Server which might be the right thing, but seems super old and doesn't support any integrations I'd want (OpenTracing, Prometheus). I also tried configuring Envoy to do what I want, but it also didn't work. (Things like Istio's Egress Gateways seem to not intercept TLS, so you just get a list of IP addresses requested, not URLs.)

I was thinking of just writing something to do this, but I know that everyone on Earth wants the same thing, so I figure I'm just missing the obvious out-of-the-box solution. I'm 100% OK with fail closed (applications must be configured to only egress through a known-trusted IP of the proxy, all other network connections fail), reconfiguring my applications (happy to use some library for this, or inject TLS certificates for authenticating the MITM proxy to the application), and this only working with HTTP. Suggestions?


There is Blue Coat. The technology to MITM TLS connections is widespread, but I'd recommend against it. You would create a single point of failure for your whole infrastructure and a prime attack target. Also, if the proxy fails to correctly validate the TLS connection, every service will end up talking to the wrong endpoints.

Possible alternative: a dedicated API facade that forwards to the real APIs, and blocking of all other connections. There might be a tool that generates one from Swagger specifications. Also, require a client certificate or a password for each service. This way, you get centralized logging and access control without breaking TLS.


What you let out is just as important as what you let in.


For at least 25 years we have been telling people to implement egress filtering and to only allow outbound the specific, known, necessary connections from any given server. Rarely do I see it done.


Hindsight is 20/20



