I agree, "assume unknown, unaudited packages are malicious" is the ideal stance....

7373737373 · on Nov 2, 2022

In Python, dynamic imports exist, which makes this impossible.
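
To illustrate the point about dynamic imports, here is a minimal sketch, assuming a naive scanner that only looks at literal `import` statements via `ast`. The `SOURCE` snippet and the `static_imports` helper are hypothetical names made up for this example, not part of any real tool.

```python
import ast

# Hypothetical package code: the module name is assembled at runtime,
# so it never appears in a literal import statement.
SOURCE = '''
import importlib
name = "".join(["js", "on"])
mod = importlib.import_module(name)
'''

def static_imports(source: str) -> set[str]:
    """Collect module names that appear in literal import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found

namespace = {}
# Executing is acceptable here only because SOURCE is our own harmless snippet.
exec(SOURCE, namespace)

print(static_imports(SOURCE))      # the scanner sees only 'importlib'
print(namespace["mod"].__name__)   # but 'json' is what actually got loaded
```

A smarter scanner could flag the `importlib.import_module` call itself, but it still could not say in general which name ends up being passed to it.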

actually_a_dog · on Nov 2, 2022

I don't see how having dynamic imports matters if all you want to do is detect if a specific file is imported. Run the install and see what gets imported. That's it.
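
A rough sketch of what "see what gets imported" could look like, assuming you are willing to run the code at all: a finder placed at the front of `sys.meta_path` that records every requested module name and then defers to the normal import machinery. `ImportRecorder` is a hypothetical helper, and `colorsys` is just a harmless stdlib module standing in for whatever the install would load.

```python
import importlib
import sys

class ImportRecorder:
    """Meta path finder that records requested module names and resolves nothing."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None  # defer to the regular finders

recorder = ImportRecorder()
sys.meta_path.insert(0, recorder)
try:
    # Finders are only consulted for modules not already cached in sys.modules.
    sys.modules.pop("colorsys", None)
    # The name is computed at runtime, yet the recorder still observes it.
    importlib.import_module("".join(["color", "sys"]))
finally:
    sys.meta_path.remove(recorder)

print("colorsys" in recorder.seen)  # True
```

This only observes imports that go through the import system in the one run you watched; code that reads and `exec`s a file directly, or that takes a different branch on another machine, would not show up.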

7373737373 · on Nov 2, 2022

If you actually have to execute a program (with no safe way of doing so) to see whether a complex routine that may return any filename imports a safe file or not, then you are up against Rice's theorem: https://en.wikipedia.org/wiki/Rice%27s_theorem

actually_a_dog · on Nov 3, 2022

So? Any method of detecting a "malicious package" faces Rice's theorem, unless you want to claim that "malicious" is a trivial property.

7373737373 · on Nov 3, 2022

Which is why any approach relying on identity verification or scanning is bound to fail. Sandboxing/capability security MUST become built into languages.
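
As a loose illustration of the capability idea, assuming a made-up library function `word_count`: the caller hands over only the one ability the function needs (a reader) instead of letting it reach for the filesystem or network on its own.

```python
import io
from typing import Callable

def word_count(read_text: Callable[[], str]) -> int:
    """Hypothetical library code that receives only a reader capability."""
    return len(read_text().split())

# The caller decides how much authority to grant: here, one in-memory text source.
source = io.StringIO("assume unaudited packages are malicious")
print(word_count(source.read))  # 5
```

In today's Python this is only a convention: nothing stops `word_count` from importing `os` and doing whatever it likes, which is exactly the gap that language-level enforcement would have to close.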

ashishbijlani · on Nov 4, 2022

Are you suggesting that you would rather wait for languages to provide robust sandboxing capabilities and not use available static/dynamic analysis tools (e.g., Packj [1]) to audit packages for malicious/risky indicators, particularly when we hear about new attacks on open-source package managers almost every week?

1. https://github.com/ossillate-inc/packj [Disclaimer: I built it]

7373737373 · on Nov 4, 2022

No, I'm merely suggesting that they are not a solution to the problem, and that the fundamental issue has to be approached at the language level.

What you are building is mainly a smoke detector (and maybe a bit of a sprinkler if it makes some decisions itself), and one that works only at install time, not at test time or runtime. It is not a set of fireproof doors. Smoke detectors by themselves cannot prevent fires from spreading, and they are not completely reliable.

Analysis tools are still useful: even with perfect language-level access and resource control, packages that are granted the many permissions they require may still behave maliciously (e.g. through compromise of any component in the development or distribution pipeline), or return malicious data (which is out of scope/unsolvable at the language level). The two approaches complement each other nicely.