Container vulnerability scanning leaves security teams in a tough place. On the one hand, some argue that container scanning has been commoditized, since numerous open source projects now offer vulnerability scanning as a feature. On the other hand, the findings are so painful to deal with that companies like Chainguard can succeed largely by promising to make them go away (the truth is more complicated).
The confusion stems from a core reality: finding vulnerabilities is a million times easier than fixing them. In this article, we’ll discuss why container vulnerabilities are so hard to fix, before addressing the critical information needed to prioritize and remediate them.
Why Container Vulnerabilities Are an Issue
When we talk about container vulnerabilities, we’re mostly talking about operating system vulnerabilities, but people don’t treat them that way. Building a container is like building a virtual machine: you take your golden image, customize it, and deploy it to production. But unlike a Windows machine, a deployed image typically gets no auto-updates, so vulnerabilities accumulate until someone rebuilds and redeploys it.
This means two things:
It’s not always clear where a vulnerable package is coming from
The package is often not exploitable in the context in which it’s actually used
Overall, developers get really frustrated because they’re asked to fix findings that aren’t exploitable, with no clear path to fixing them. Do I need a new base image? Do I need to change a line of code? Do I need to just redeploy?
Without more information about the container itself, how it’s built, and the context of its deployment, vulnerability scanning leads to a never-ending trail of unfixable and unexploitable findings.
How to Prioritize
These days, every solution offers a silver bullet of “risk-based prioritization,” but without runtime Kubernetes context, many vendors rely solely on vulnerability data to drive their prioritization. Security teams need contextual data points that go beyond KEV (CISA’s Known Exploited Vulnerabilities catalog) and EPSS (the Exploit Prediction Scoring System).
In general, three prioritization questions are at work in containers:
Is the package in use?
Is there an attack path to the package from the internet?
How many workloads are running the vulnerable image?
First, many container images contain packages that aren’t used by the container when it’s running. In this article, I’m talking about in-use only in terms of the package being loaded into memory; in the future I’ll break down other meanings that make even this idea hard to dissect. These unused packages are leftover artifacts, either from operating system utilities or from dependencies used only for building or testing images. To put it succinctly: if a dependency isn’t running, it can’t be part of an exploit unless an attacker is already inside the system, making it a lower priority.
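To make the in-use idea concrete, here is a minimal sketch of how a runtime tool might approximate “loaded into memory” by reading which shared libraries a process has actually mapped, via /proc. This is a Linux-only illustration of the concept, not how any particular vendor implements it:

```python
def loaded_libraries(pid="self"):
    """Return the set of .so files currently mapped into a process's memory."""
    libs = set()
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            # The sixth column of a mapping line, when present, is the backing file.
            parts = line.split()
            if len(parts) >= 6 and ".so" in parts[5]:
                libs.add(parts[5])
    return libs

if __name__ == "__main__":
    # A vulnerable OS package whose library never shows up here was likely
    # never loaded - a strong (though not perfect) deprioritization signal.
    for lib in sorted(loaded_libraries()):
        print(lib)
```

Real agents do far more (tracking file opens, interpreter imports, short-lived processes), but the core signal is the same: did this package’s code ever actually enter memory?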
Second, while patching is still important for defense in depth, most exploits happen when packages are publicly facing. These are the critical zero-days security teams worry about patching, because it’s a guarantee that end users can hit the endpoints. I’ve written elsewhere about the pros and cons of network reachability - but in sum it’s an important data point, not a silver bullet.
Finally, deciding what vulnerabilities to patch also comes down to a realistic “bang for your buck” approach. Developer time is limited, so if you’re prioritizing which of thousands of vulnerabilities to tackle, it makes sense to patch the images that are most widely distributed to most significantly reduce your attack surface.
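The “bang for your buck” measure is easy to compute once you have an inventory of running workloads. A sketch, using a hypothetical list of (pod, image) pairs - in practice you’d pull this from the Kubernetes API:

```python
from collections import Counter

def rank_images_by_spread(pod_images):
    """Rank images by how many running workloads use them, most common first."""
    return Counter(image for _pod, image in pod_images).most_common()

# Hypothetical cluster inventory for illustration.
pods = [
    ("api-7f9c", "myorg/api:1.2"),
    ("api-8d2a", "myorg/api:1.2"),
    ("worker-1", "myorg/worker:0.9"),
    ("api-9e1b", "myorg/api:1.2"),
]

print(rank_images_by_spread(pods))
# myorg/api:1.2 runs in 3 pods, so patching it removes the most attack surface.
```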
A quick word on static reachability like what Endor, Backslash, Aikido, and others provide - this data is especially applicable for code library vulnerabilities, where you can look at the code to try and tell if a vulnerable function is getting called by your code. This logic is less applicable to containers though, because system OS libraries require at least some sort of simulation to tell if they’re loading or not. Even on the SCA side, in my testing, there is a real trade off between when you’re scanning and the accuracy of the de-prioritization. For example, in my test repo, I have some scripts that aren’t actually used anywhere - only runtime tools can tell they’re not used.
On their own, these three data points hold limited value, but put together they let security teams drive an effective prioritization program. For example, “internet facing” has a wide variety of meanings, so while it’s useful as a prioritization data point, it needs additional context to fully prioritize. In sum, the more data the better - business needs constantly fluctuate, driving different prioritizations along the way.
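One way to combine the three signals is a simple weighted score on top of base severity. The weights and finding structure below are illustrative assumptions, not a standard - every organization tunes these differently:

```python
def priority_score(finding):
    """Score a finding: base CVSS severity plus contextual boosts."""
    score = finding["cvss"]                 # base severity, 0-10
    if finding["package_in_use"]:
        score += 3                          # package actually loads at runtime
    if finding["internet_facing"]:
        score += 3                          # reachable from outside the cluster
    score += min(finding["workload_count"], 10) * 0.2   # blast radius, capped
    return round(score, 1)

# Two hypothetical criticals: one exposed and in use, one dormant.
critical_exposed = {"cvss": 9.8, "package_in_use": True,
                    "internet_facing": True, "workload_count": 12}
critical_dormant = {"cvss": 9.8, "package_in_use": False,
                    "internet_facing": False, "workload_count": 1}

print(priority_score(critical_exposed))  # 17.8
print(priority_score(critical_dormant))  # 10.0
```

Even with identical CVSS scores, context cleanly separates the finding worth a midnight page from the one that can wait for the next rebuild.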
How to Fix
Once a developer gets a prioritized, fixable ticket they can action, there are three possible fixes:
Simply re-build the container to inherit all available patches
Swap or upgrade the base image
Change a line in the Dockerfile to upgrade a package
To make this decision, tools can help by providing some key information. First, developers need to know which layer of the image the vulnerability came in from. Take this simple example built on the Node.js base image, node:16.
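A minimal Dockerfile of that shape might look like this (reconstructed for illustration; the file names are assumptions):

```dockerfile
# Hypothetical app image: the Node.js base plus our own npm dependencies.
FROM node:16

WORKDIR /app

# Install dependencies declared in package.json (lodash among them).
COPY package.json package-lock.json ./
RUN npm install

COPY . .
CMD ["node", "server.js"]
```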
In this image, I use the base Node.js image and then npm install my packages. Let’s say I get a finding: a high-severity vulnerability in lodash.
Without more information, I’m dead in the water trying to figure out where this is coming from. Unless I’ve memorized every dependency, direct and transitive, I’m going to have to start researching possible ways this got into my image. But if I have the layer information:
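Layer attribution from a scanner might look something like this (hypothetical, simplified output - not any specific tool’s format):

```
CVE-2021-23337  lodash 4.17.20  (High)
  introduced in layer:  RUN npm install
  base image (node:16): not affected
  fixed in:             lodash 4.17.21
```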
I can see that it comes in when I run npm install, telling me this is an npm dependency I need to update. Similarly, if the vulnerability is critical enough, the image hierarchy at the top shows how the base node image itself is constructed, so I know exactly what is and isn’t in my control to fix.
Layer information is so critical I’d struggle to use any tool that lacks it, and the few times I’ve been forced to, it’s been a monumental struggle. In the example above, I can at least see where a vulnerability comes from - the base image, an Ubuntu package I’m installing, or my npm install. Without this information, I’d be dead in the water trying to hunt down how to fix anything.
Second, it’s important that scanners offer multiple places to scan so I can quickly validate my findings. If I suspect a simple rebuild will fix my issues, the first thing I’ll do is build the container locally and scan it there. If I instead have to push that container all the way to production to check whether the issue is gone, the feedback loop becomes painful. I’d love for a tool to detect whether a rebuild alone would fix the vulnerability, since most developers I’ve worked with don’t realize that’s sometimes all it takes.
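Checking whether a rebuild would help can be approximated today by diffing two scan reports: the deployed image versus a fresh local rebuild. A sketch, using a hypothetical simplification of scanner JSON output (tools like Trivy and Grype can emit machine-readable reports; the CVE IDs below are placeholders):

```python
def fixed_by_rebuild(deployed_findings, rebuilt_findings):
    """Return vulnerability IDs present in the deployed image but absent
    after a plain rebuild - i.e. fixed just by inheriting newer patches."""
    deployed = {(f["id"], f["package"]) for f in deployed_findings}
    rebuilt = {(f["id"], f["package"]) for f in rebuilt_findings}
    return sorted(vuln_id for vuln_id, _pkg in deployed - rebuilt)

# Hypothetical findings from scanning each image.
deployed = [{"id": "CVE-2023-0001", "package": "openssl"},
            {"id": "CVE-2023-0002", "package": "zlib"}]
rebuilt = [{"id": "CVE-2023-0002", "package": "zlib"}]

print(fixed_by_rebuild(deployed, rebuilt))  # ['CVE-2023-0001']
```

Anything left in both reports needs a real change - a base image bump or a Dockerfile edit - rather than a rebuild.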
Finally, if I need to upgrade base images, it’s helpful if the tool lets me know what versions of the image are available, and what the vulnerability statuses on those versions are. It’s often the case that upgrading versions can actually increase vulnerability counts. Note: base images are often patched without changing version numbers!
Overall, the problems with container vulnerability scanning are twofold: not enough data to prioritize, and not enough data to fix. On the prioritization side, a lot of contextual information is needed to drive effective prioritization, and to be honest every company will choose slightly different metrics based on their specific application. In general, some kind of contextual, runtime data is needed on this side.
On the fix side, it’s important to have image layers, fix versions, and upgrade paths available. The most helpful feature - which no tool offers yet - would be simply telling me whether a rebuild alone would fix a vulnerability, without my having to change anything manually.
Tools for container vulnerability scanning
Tools for vulnerability management and contextualization
This post was completed in collaboration with ARMO