GHSA-9493-h29p-rfm2: runc container escape via "masked path" abuse due to mount race conditions

Impact

The OCI runtime specification has a maskedPaths feature that allows for files or directories to be “masked” by placing a mount on top of them to conceal their contents. This is primarily intended to prevent privileged users in non-user-namespaced containers from being able to write to files or access directories that would either provide sensitive information about the host to containers or allow containers to perform destructive or other privileged operations on the host (examples include /proc/kcore, /proc/timer_list, /proc/acpi, and /proc/keys).

maskedPaths can be used to either mask a directory or a file – directories are masked using a new read-only tmpfs instance that is mounted on top of the masked path, while files are masked by bind-mounting the container’s /dev/null on top of the masked path.
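The file-versus-directory distinction described above can be sketched as follows. This is an illustration only, not runc's code: the function names mask_strategy and plan_masks and the strategy labels are hypothetical, and the only detail taken from the OCI runtime specification is that the paths live under linux.maskedPaths in config.json.

```python
import os
import stat


def mask_strategy(path: str) -> str:
    """Pick a masking strategy for one path (hypothetical labels)."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # Configurations may list paths that do not exist on this system.
        return "skip"
    if stat.S_ISDIR(st.st_mode):
        # Directories are covered by a fresh read-only tmpfs.
        return "tmpfs-ro"
    # Everything else is covered by a bind-mount of the container's /dev/null.
    return "bind-dev-null"


def plan_masks(config: dict) -> dict:
    """Map every entry of linux.maskedPaths to its strategy."""
    paths = config.get("linux", {}).get("maskedPaths", [])
    return {path: mask_strategy(path) for path in paths}
```

The sketch only classifies paths; it performs no mounts, so it can be run safely as an unprivileged user.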

In all known versions of runc, when using the container’s /dev/null to mask files, runc would not perform sufficient verification that the source of the bind-mount (i.e., the container’s /dev/null) was actually a real /dev/null inode. While /dev/null is usually created by runc during container creation, an attacker can create a /dev/null, or modify the /dev/null inode created by runc, through race conditions with other containers sharing mounts. (runc has also verified that this attack can be exploited using a standard Dockerfile with docker buildx build, as that also permits triggering parallel execution of containers with custom shared mounts configured.)
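To make "a real /dev/null inode" concrete, here is a minimal sketch of such a check, assuming a Linux system where the null device is the character device with major number 1 and minor number 3. It is not runc's implementation (the patches listed below add Go helpers for this); the names looks_like_dev_null and verify_dev_null are hypothetical.

```python
import os
import stat

# On Linux, the null device is character device 1:3.
DEV_NULL_MAJOR, DEV_NULL_MINOR = 1, 3


def looks_like_dev_null(st: os.stat_result) -> bool:
    """Return True if the stat result describes character device 1:3."""
    return (
        stat.S_ISCHR(st.st_mode)
        and os.major(st.st_rdev) == DEV_NULL_MAJOR
        and os.minor(st.st_rdev) == DEV_NULL_MINOR
    )


def verify_dev_null(path: str) -> bool:
    """Inspect the inode behind path without following a final symlink."""
    # With O_PATH | O_NOFOLLOW, a symlink yields a handle to the link itself,
    # so a /dev/null that was swapped for a symlink fails the check below.
    fd = os.open(path, os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC)
    try:
        # Checking the opened handle (not the path) avoids re-resolving it.
        return looks_like_dev_null(os.fstat(fd))
    finally:
        os.close(fd)
```

A check like this only helps if the subsequent mount uses the same verified handle; re-resolving the path afterwards would reopen the race window.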

This could lead to two separate issues:

Attack 1: Arbitrary Mount Gadget (leading to Host Information Disclosure, Host Denial of Service, or Container Escape)

By replacing /dev/null with a symlink to an attacker-controlled path, an attacker could cause runc to bind-mount an arbitrary source path to a path inside the container. This could lead to:

Host Denial of Service: By bind-mounting files such as /proc/sysrq-trigger, the attacker can gain access to a read-write version of files which can be destructive to write to (/proc/sysrq-trigger would allow an attacker to trigger a kernel panic, shutting down the machine, or causing the machine to freeze without rebooting).
Container Escape: By bind-mounting /proc/sys/kernel/core_pattern, the attacker can reconfigure a coredump helper – as kernel upcalls are not namespaced, the configured binary (which could be a container binary or a host binary with a malicious command-line) will run with full privileges on the host system. Thus, the attacker can simply trigger a coredump and gain complete root privileges over the host.

Note that while config.json allows users to bind-mount arbitrary paths (and thus an attacker that can modify config.json arbitrarily could gain the same access as this exploit), because maskedPaths is applied by almost all higher-level container runtimes (and thus provides a guaranteed mount source) this flaw effectively allows any attacker that can spawn containers (with some degree of control over what kinds of containers are being spawned) to achieve the above goals.

Attack 2: Bypassing maskedPaths

While investigating Attack 1, runc discovered that the runc validation mechanism when bind-mounting /dev/null for maskedPaths would ignore ENOENT errors – meaning that if an attacker deleted /dev/null before runc did the bind-mount, runc would silently skip applying maskedPaths for the container. (The original purpose of this ENOENT-ignore behaviour was to permit configurations where maskedPaths references non-existent files, but runc did not consider that the source path could also not exist in this kind of race-attack scenario.)
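The difference between a missing mount destination (harmless) and a missing mount source (a sign of tampering) can be sketched like this. It is a simplified illustration of the error classification only, not the patched runc logic; plan_file_mask and MaskSourceMissing are hypothetical names, and no mount is actually performed.

```python
import os


class MaskSourceMissing(RuntimeError):
    """Raised when the file used as the mask source does not exist."""


def plan_file_mask(dest: str, source: str = "/dev/null"):
    """Return (source, dest) to bind-mount, or None if dest is absent."""
    # A missing source must be fatal: ignoring it would silently
    # disable masking for the whole container.
    try:
        os.lstat(source)
    except FileNotFoundError as exc:
        raise MaskSourceMissing(f"mask source {source} is missing") from exc

    # A missing destination is fine: the configuration may list
    # paths that do not exist on this system.
    try:
        os.lstat(dest)
    except FileNotFoundError:
        return None
    return (source, dest)
```

Treating the two ENOENT cases separately is the point of the sketch: only the destination check is allowed to swallow the error.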

With maskedPaths rendered inoperative, an attacker would be able to access sensitive host information from files in /proc that would usually be masked (such as /proc/kcore). However, note that /proc/sys and /proc/sysrq-trigger are mounted read-only rather than being masked with files, so this attack variant will not allow the same breakout or host denial of service attacks as in Attack 1.

Patches

This advisory is being published as part of a set of three advisories:

CVE-2025-31133
CVE-2025-52881
CVE-2025-52565

The patches fixing this issue have accordingly been combined into a single patchset. The following patches from that patchset resolve the issues in this advisory:

db19bbed5348 (“internal/sys: add VerifyInode helper”)
8476df83b534 (“libct: add/use isDevNull, verifyDevNull”)
1a30a8f3d921 (“libct: maskPaths: only ignore ENOENT on mount dest”)
5d7b24240724 (“libct: maskPaths: don’t rely on ENOTDIR for mount”)

runc 1.2.8, 1.3.3, and 1.4.0-rc.3 have been released and all contain fixes for these issues. As per runc’s new release model (see https://github.com/opencontainers/runc/blob/v1.4.0-rc.2/RELEASES.md), runc 1.1.x and earlier are no longer supported and thus have not been patched.

Mitigations

Use containers with user namespaces (with the host root user not mapped into the container’s user namespace). This will block most of the serious aspects of these attacks, as the procfs files used for the container breakout use Unix DAC permissions and user-namespaced users will not have access to the relevant files.

runc would also like to take this opportunity to reiterate that it strongly recommends all users use user-namespaced containers. They have proven to be one of the best security hardening mechanisms against container breakouts, and the kernel applies additional restrictions to user-namespaced containers above and beyond the user remapping functionality provided. With the advent of id-mapped mounts (Linux 5.12), there is very little reason to not use user namespaces for most applications. Note that using user namespaces to configure your container does not mean you have to enable unprivileged user namespace creation inside the container – most container runtimes apply a seccomp-bpf profile which blocks unshare(CLONE_NEWUSER) inside containers regardless of whether the container itself uses user namespaces.

Rootless containers can provide even more protection if your configuration can use them – by having runc itself be an unprivileged process, in general you would expect the impact scope of a runc bug to be less severe as it would only have the privileges afforded to the host user which spawned runc.
For non-user-namespaced containers, configure all containers you spawn to not permit processes to run with root privileges. In most cases this would require configuring the container to use a non-root user and enabling noNewPrivileges to disable any setuid or set-capability binaries. (Note that this is runc’s general recommendation for a secure container setup – it is very difficult, if not impossible, to run an untrusted program with root privileges safely.) If you need to use ping in your containers, there is a net.ipv4.ping_group_range sysctl that can be used to allow unprivileged users to ping without requiring setuid or set-capability binaries.
Do not run untrusted container images from unknown or unverified sources.
Depending on the configuration of maskedPaths, an AppArmor profile (such as the default one applied by higher-level runtimes including Docker and Podman) can block write attempts to most of /proc and /sys. This means that even with a procfs file maliciously bind-mounted to a maskedPaths target, all of the targets of maskedPaths in the default configuration of runtimes such as Docker or Podman will still not permit write access to said files. However, if a container is configured with a maskedPaths entry that is not protected by AppArmor, then the same attack can be carried out. Please note that CVE-2025-52881 allows an attacker to bypass LSM labels, and so this mitigation is of limited help when considered in combination with CVE-2025-52881.
Based on runc’s analysis, SELinux policies have a limited effect when trying to protect against this attack. The reason is that the /dev/null bind-mount gets implicitly relabelled with context=... set to the container’s SELinux context, and thus the container process will have access to the source of the bind-mount even if it otherwise wouldn’t.
https://github.com/opencontainers/runc/security/advisories/GHSA-cgrx-mc8f-2prm
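
As a rough aid for reviewing the configuration-related mitigations above, the following sketch inspects a parsed config.json for a user namespace, a mapped host root user, a root process user, and noNewPrivileges. The field names (linux.namespaces, linux.uidMappings, process.user.uid, process.noNewPrivileges) come from the OCI runtime specification; the function audit_config and its finding messages are hypothetical, and the checks are deliberately incomplete (for example, gidMappings and joined namespaces are not examined).

```python
def audit_config(config: dict) -> list[str]:
    """Return human-readable findings for a parsed OCI config.json."""
    findings = []
    linux = config.get("linux", {})
    process = config.get("process", {})

    namespaces = linux.get("namespaces", [])
    has_userns = any(ns.get("type") == "user" for ns in namespaces)

    if has_userns:
        # The host root user (uid 0) should not appear in any mapped range.
        for mapping in linux.get("uidMappings", []):
            start = mapping["hostID"]
            if start <= 0 < start + mapping["size"]:
                findings.append("host root (uid 0) is mapped into the container")
    else:
        findings.append("no user namespace configured")
        # Without a user namespace, fall back to the non-root recommendations.
        if process.get("user", {}).get("uid", 0) == 0:
            findings.append("process runs as root without a user namespace")
        if not process.get("noNewPrivileges", False):
            findings.append("noNewPrivileges is not enabled")
    return findings
```

An empty result does not mean a container is safe; it only means none of these particular settings were flagged.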

Other Runtimes

As this vulnerability boils down to a fairly easy-to-make logic bug, runc has provided information to other OCI (crun, youki) and non-OCI (LXC) container runtimes about this vulnerability. Based on discussions with other runtimes, it seems that crun and youki may have similar security issues and will publish a coordinated security release along with runc. LXC appears to also be vulnerable in some aspects, but their security stance is (understandably) that non-user-namespaced containers are fundamentally insecure by design.
https://linuxcontainers.org/lxc/security/

Credits

Thanks to Lei Wang (@ssst0n3 from Huawei) for finding and reporting the original vulnerability (Attack 1), and Li Fubang (@lifubang from acmcoder.com, CIIC) for discovering another attack vector (Attack 2) based on @ssst0n3’s initial findings.

References

GHSA-9493-h29p-rfm2
opencontainers/runc@1a30a8f
opencontainers/runc@5d7b242
opencontainers/runc@8476df8
opencontainers/runc@db19bbe
