Hardening the Linux Kernel: Defense in Depth Against Privilege Escalation

Copy Fail demonstrated that Linux kernel privilege escalation flaws can sit undetected for nearly a decade. The Kernel Self Protection Project provides a systematic hardening baseline that raises the cost of exploitation across entire vulnerability classes — not just individual CVEs.

Why Kernel Hardening Matters

CVE-2026-31431 ("Copy Fail") is a reminder that kernel privilege escalation vulnerabilities can hide for nearly a decade in code that passed review. A 2017 in-place processing shortcut in algif_aead went undetected until an AI-assisted scanner found it in 2026. By then every major Linux distribution was vulnerable, a working 732-byte exploit existed on the day of disclosure, and defenders had essentially no window before any threat actor could exploit it reliably. It is the third in a lineage of page-cache overwrite exploits: CVE-2016-5195 (Dirty COW) used a race condition, CVE-2022-0847 (Dirty Pipe) eliminated the race, and Copy Fail reached the same primitive four years later through an entirely different kernel subsystem, the crypto API.

The standard response — blacklist the module, apply the vendor patch — is correct but reactive. It addresses one vulnerability at a time. The Kernel Self Protection Project (KSPP) takes a different approach: make the kernel harder to exploit by default, reducing the impact of the bugs that have not been found yet.

What Is the Kernel Self Protection Project?

Founded in 2015, KSPP's premise is that kernel bugs have long lifetimes and that the kernel must be designed with built-in defensive technologies to protect against flaws before they are discovered. Its goal is not to find individual bugs but to eliminate entire classes of vulnerabilities and block the exploitation techniques that attackers rely on across many different CVEs.

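KSPP's features are merged upstream into the mainline kernel. When enabled at build time, they apply to all kernel code built against that configuration, including out-of-tree vendor modules compiled against the same kernel tree and toolchain. The project covers five broad areas: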

  • Attack surface reduction — disabling or restricting interfaces that provide unnecessary exposure
  • Memory integrity — detecting corruption of stack, heap, and data structures at runtime
  • Address space randomization — making kernel layout unpredictable across boots and syscalls
  • Memory permission enforcement — preventing code modification and data execution
  • Information leak prevention — masking kernel addresses from userspace

Attack Surface Reduction

Copy Fail required AF_ALG AEAD sockets. If that interface is unavailable, the exploit has no path to the vulnerable code. Attack surface reduction is the most direct mitigation class for this category of vulnerability and should be the first area addressed.

Why Userspace Detection Fails

The immediate response to Copy Fail from many security vendors was signature detection: scanning for the known Python proof-of-concept code. This catches unsophisticated attackers copy-pasting the published PoC, but it addresses the delivery wrapper, not the vulnerability. The exploit's mechanism is a compressed binary payload written into page-cache pages through an AF_ALG socket. That payload can be delivered from any language, compiled into a standalone binary, or staged in pieces, and a Python signature would catch none of these. More importantly, if the vulnerable kernel interface is reachable, the calling language is irrelevant. Short of patching, disabling the interface is the only mitigation that holds regardless of how the exploit is packaged.
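Reachability can also be tested directly, which checks the interface rather than any particular payload. The sketch below assumes a Linux build of Python 3, which exposes socket.AF_ALG; the function name, the status strings, and the gcm(aes) algorithm choice are illustrative, not part of any tool. Binding the socket is the step that needs algif_aead, so on a host where the module is loadable the probe may autoload it as a side effect.

```python
import socket

# Possible results of the probe (illustrative names, not a standard).
STATUSES = {"reachable", "no-af-alg", "no-aead", "unsupported-platform"}


def aead_interface_status(alg: str = "gcm(aes)") -> str:
    """Try to bind an AF_ALG AEAD socket as the current user."""
    if not hasattr(socket, "AF_ALG"):
        return "unsupported-platform"  # Python built for a non-Linux OS
    try:
        sock = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
    except OSError:
        return "no-af-alg"  # AF_ALG compiled out or blocked (e.g. by seccomp)
    with sock:
        try:
            # Selecting the "aead" socket type requires algif_aead and
            # may cause the kernel to autoload it.
            sock.bind(("aead", alg))
        except OSError:
            return "no-aead"  # module unavailable or blocked, or alg missing
    return "reachable"


if __name__ == "__main__":
    print(aead_interface_status())
```

Only "reachable" is conclusive: "no-aead" also appears if the kernel simply lacks the chosen AEAD algorithm.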

Compile-Time: Remove Unnecessary Crypto API Interfaces

# CONFIG_CRYPTO_USER_API_AEAD is not set

Most general-purpose servers have no application that uses the kernel's userspace AEAD crypto interface. Disabling it at compile time eliminates the attack surface entirely — no module to blacklist, no runtime check required. Audit your workloads before disabling; legitimate users are rare but include some HSM and crypto benchmarking tools.

If you cannot rebuild the kernel, the next best posture is a kernel that ships the option as =m (module): the module can then be blocked at runtime (see the Copy Fail workaround). The worst posture is =y (built-in), which cannot be removed without a kernel rebuild.
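On an =m kernel it is worth confirming which rule is actually in effect. modprobe -c prints the merged modprobe configuration; the sketch below (a hypothetical helper, not part of any tool) parses that output and distinguishes an install rule that runs /bin/false, which also stops an explicit modprobe algif_aead, from a bare blacklist entry, which only suppresses alias-based autoloading.

```python
import os
import subprocess


def module_block_status(showconfig: str, module: str) -> str:
    """Classify how modprobe configuration treats `module`.

    Returns "install-blocked", "blacklisted-only", or "not-blocked".
    """
    name = module.replace("-", "_")  # modprobe treats - and _ as equivalent
    blacklisted = False
    for line in showconfig.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[1].replace("-", "_") != name:
            continue
        if fields[0] == "install" and len(fields) >= 3:
            # A command that just exits replaces the real module load.
            if os.path.basename(fields[2]) in {"false", "true"}:
                return "install-blocked"
        elif fields[0] == "blacklist":
            blacklisted = True
    return "blacklisted-only" if blacklisted else "not-blocked"


if __name__ == "__main__":
    text = subprocess.run(["modprobe", "-c"], capture_output=True,
                          text=True, check=True).stdout
    print(module_block_status(text, "algif_aead"))
```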

Compile-Time: Disable Other High-Risk Interfaces

KSPP identifies several interfaces that are over-represented in privilege escalation exploit chains:

# CONFIG_LDISC_AUTOLOAD is not set            # tty line discipline autoloading
# CONFIG_LEGACY_TIOCSTI is not set            # keystroke injection via ioctl
# CONFIG_MODIFY_LDT_SYSCALL is not set        # x86 LDT manipulation
# CONFIG_X86_VSYSCALL_EMULATION is not set    # legacy vsyscall
# CONFIG_BINFMT_MISC is not set               # miscellaneous binary format support
# CONFIG_KEXEC is not set                     # kernel replacement
# CONFIG_HIBERNATION is not set               # resume-based attack vector
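A small parser makes it easy to audit a kernel's .config against this list. The sketch below is a hypothetical helper: it reads "# CONFIG_X is not set" as n, reports options that are built in (y) or modular (m), and treats options absent from the file as not enabled. Options can be absent because they are x86-only and you are on another architecture, or because a disabled dependency hides them.

```python
import os
from pathlib import Path

# Options the text recommends leaving unset (includes x86-only names).
HIGH_RISK = [
    "CONFIG_CRYPTO_USER_API_AEAD",
    "CONFIG_LDISC_AUTOLOAD",
    "CONFIG_LEGACY_TIOCSTI",
    "CONFIG_MODIFY_LDT_SYSCALL",
    "CONFIG_X86_VSYSCALL_EMULATION",
    "CONFIG_BINFMT_MISC",
    "CONFIG_KEXEC",
    "CONFIG_HIBERNATION",
]


def parse_kconfig(text: str) -> dict[str, str]:
    """Map CONFIG_* names to values; '# X is not set' maps to 'n'."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            values[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            values[name] = value.strip('"')
    return values


def enabled_high_risk(text: str, options=HIGH_RISK) -> dict[str, str]:
    """Return the listed options that are built in (y) or modular (m)."""
    values = parse_kconfig(text)
    return {opt: values[opt] for opt in options if values.get(opt) in {"y", "m"}}


if __name__ == "__main__":
    config = Path(f"/boot/config-{os.uname().release}")
    for option, value in enabled_high_risk(config.read_text()).items():
        print(f"{option}={value}")
```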

Runtime: Disable Module Loading After Boot

Once a system has finished initializing, no new modules should be needed in steady state:

kernel.modules_disabled = 1

This is a one-way door — once set it cannot be reversed without a reboot. Apply it in a late-boot service after all required modules have loaded. With this sysctl active, even a module blacklist misconfiguration is moot: the kernel refuses to load anything.
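A late-boot step can guard against locking in the wrong state. modules_disabled stops future loads and also blocks unloading, so if algif_aead is already resident when it is set, the vulnerable code stays in memory until reboot. The sketch below uses hypothetical names, defaults to a dry run, and reads module names from /proc/modules. Built-in (=y) code never appears there, so this check says nothing about =y kernels.

```python
from pathlib import Path

MODULES_DISABLED = Path("/proc/sys/kernel/modules_disabled")


def loaded_modules(proc_modules: str) -> set[str]:
    """The first field of each /proc/modules line is the module name."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def lock_module_loading(proc_modules: str, forbidden: set[str],
                        sysctl_path: Path = MODULES_DISABLED,
                        dry_run: bool = True) -> bool:
    """Set kernel.modules_disabled=1 unless a forbidden module is resident.

    Returns True if the sysctl was written (irreversible until reboot).
    """
    resident = loaded_modules(proc_modules) & forbidden
    if resident:
        # Locking now would pin the unwanted module until reboot.
        raise RuntimeError(f"refusing to lock: loaded {sorted(resident)}")
    if dry_run:
        return False
    sysctl_path.write_text("1\n")
    return True


if __name__ == "__main__":
    mods = Path("/proc/modules").read_text()
    print(lock_module_loading(mods, {"algif_aead"}, dry_run=True))
```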

Runtime: Restrict Unprivileged Interfaces

kernel.unprivileged_bpf_disabled = 1    # bpf() requires CAP_BPF or CAP_SYS_ADMIN
kernel.perf_event_paranoid = 3          # blocks unprivileged perf_event_open (distro-patched kernels)
vm.unprivileged_userfaultfd = 0         # userfaultfd requires CAP_SYS_PTRACE
user.max_user_namespaces = 0            # disable user namespaces if not needed

Each of these restricts an interface that has appeared repeatedly in local privilege escalation exploit chains. kernel.perf_event_paranoid = 3 targets bugs such as CVE-2013-2094, where a 64-bit event ID truncated to a signed int slipped past a bounds check and gave unprivileged users an out-of-bounds write primitive in kernel memory. Level 3 comes from a Debian/Ubuntu/Android patch; mainline kernels treat any value above 2 as 2, which still permits unprivileged user-space-only profiling, so this mitigation depends on a kernel that carries the patch.

user.max_user_namespaces = 0 blocks the escalation path used by CVE-2021-3493 (overlayfs capability bypass) and CVE-2022-0185 (fsconfig heap overflow), both of which required only the ability to create a user namespace to reach privileged kernel code. This setting is particularly impactful, since user namespaces feature in many container escapes and LPE chains, but it will break rootless containers (Podman, Docker rootless mode). Evaluate it against your workload before applying.
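To verify that these values are actually in effect, each sysctl can be read from its /proc/sys path, where dots in the name become directory separators. The helper below is a hypothetical sketch that compares exact strings. Note that unprivileged_bpf_disabled = 2 is also restrictive but will be reported as a mismatch, and a missing file usually means the option does not exist on this kernel.

```python
from pathlib import Path

# Desired values from the text; adjust per workload (e.g. rootless containers).
DESIRED = {
    "kernel.unprivileged_bpf_disabled": "1",
    "kernel.perf_event_paranoid": "3",
    "vm.unprivileged_userfaultfd": "0",
    "user.max_user_namespaces": "0",
}


def sysctl_path(name: str, root: Path = Path("/proc/sys")) -> Path:
    """kernel.kptr_restrict -> /proc/sys/kernel/kptr_restrict"""
    return root.joinpath(*name.split("."))


def audit_sysctls(desired: dict[str, str], root: Path = Path("/proc/sys")) -> dict:
    """Return {name: (actual, wanted)} for every value that does not match.

    A missing file (option absent on this kernel) is reported as actual=None.
    """
    mismatches = {}
    for name, wanted in desired.items():
        path = sysctl_path(name, root)
        actual = path.read_text().strip() if path.exists() else None
        if actual != wanted:
            mismatches[name] = (actual, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (actual, wanted) in audit_sysctls(DESIRED).items():
        print(f"{name}: {actual} (want {wanted})")
```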

Memory Integrity

Copy Fail's root cause was a scatterlist operation that wrote into page-cache pages it should not have touched. Several KSPP memory integrity features target this broad class of memory corruption during development and at runtime, although a write that lands in valid, mapped pages can evade checks designed around classic out-of-bounds accesses.

Heap and Stack Initialization

CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y    # zero heap and page allocations
CONFIG_INIT_ON_FREE_DEFAULT_ON=y     # zero memory when it is freed
CONFIG_INIT_STACK_ALL_ZERO=y         # zero-initialize stack variables

Uninitialized memory is a persistent source of information leaks and use-after-free exploitation. These options add modest overhead (typically 1–5% on memory-intensive workloads) but close most of the uninitialized-data information leak class and make use-after-free exploits harder by wiping freed regions before reuse.
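To confirm the options took effect, recent kernels log a one-line summary at boot. The sketch below assumes the "mem auto-init:" line format (for example "mem auto-init: stack:all(zero), heap alloc:on, heap free:on"); the wording has changed between releases, so treat the parser as illustrative. Its name is hypothetical, and reading the log may require CAP_SYSLOG once dmesg_restrict is set.

```python
import re
import subprocess

LINE = re.compile(r"mem auto-init: (.*)")


def parse_mem_auto_init(log: str) -> dict[str, str]:
    """Extract e.g. {'stack': 'all(zero)', 'heap alloc': 'on', 'heap free': 'on'}."""
    match = LINE.search(log)  # "." does not cross newlines, so one line only
    if match is None:
        return {}
    settings = {}
    for item in match.group(1).split(","):
        key, _, value = item.strip().partition(":")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(parse_mem_auto_init(log))
```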

Heap Hardening

CONFIG_SLAB_FREELIST_RANDOM=y        # randomize heap freelist order
CONFIG_SLAB_FREELIST_HARDENED=y      # harden heap metadata pointers
CONFIG_RANDOM_KMALLOC_CACHES=y       # per-boot randomized kmalloc caches
CONFIG_DEBUG_SG=y                    # validate scattergather list integrity
CONFIG_DEBUG_LIST=y                  # validate linked list operations

CONFIG_DEBUG_SG=y is relevant to Copy Fail because it adds integrity checks to scatterlist operations, the mechanism behind the page-cache write. It validates scatterlist structure (entry magic values, chaining, and whether buffers passed in map to valid kernel memory) and triggers a BUG on a violation, at low overhead. It does not check whether a well-formed entry points at a page that should be writable, so whether it flags a given exploit attempt depends on whether the exploit produces a malformed scatterlist. CONFIG_SLAB_FREELIST_RANDOM and CONFIG_SLAB_FREELIST_HARDENED raise the cost of heap spray attacks: CVE-2022-0185 (fsconfig heap overflow) required precise placement of controlled objects adjacent to the overflowed allocation, and freelist randomization makes that placement less predictable.
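The intuition behind freelist randomization fits in a toy model. This is not the SLUB implementation: it assumes one fresh slab of equal-sized slots, where an attacker allocates a vulnerable object and then a target object and needs the target in the very next slot. With an in-order freelist that always happens. With a shuffled freelist the chance per attempt drops to about 1/slots, which is why real exploits fall back on spraying many objects, and why randomization raises cost rather than preventing exploitation.

```python
import random


def adjacency_rate(slots: int, trials: int, randomize: bool, seed: int = 0) -> float:
    """Fraction of trials in which the object allocated right after the
    vulnerable one occupies the next slot (toy model of one fresh slab)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        freelist = list(range(slots))  # slot indices in allocation order
        if randomize:
            rng.shuffle(freelist)  # stand-in for CONFIG_SLAB_FREELIST_RANDOM
        vulnerable, target = freelist[0], freelist[1]
        hits += target == vulnerable + 1
    return hits / trials


if __name__ == "__main__":
    print("in-order:", adjacency_rate(32, 10_000, randomize=False))
    print("shuffled:", adjacency_rate(32, 10_000, randomize=True))
```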

Bounds Checking

CONFIG_FORTIFY_SOURCE=y              # compile-time and runtime string/memory checks
CONFIG_UBSAN_BOUNDS=y                # undefined-behavior bounds checking
CONFIG_KFENCE=y                      # sampling-based heap overflow detector

CONFIG_KFENCE (Kernel Electric Fence) is a low-overhead, sampling-based memory safety tool. At a fixed interval it places a heap allocation in a page flanked by guard pages and reports out-of-bounds accesses and use-after-free on that object. It is probabilistic rather than deterministic, and because it guards only slab allocations it is unlikely to flag a write into page-cache pages such as Copy Fail's. Its value is broader: it surfaces heap memory-safety bugs on production systems that fuzzing missed, and an exploit that corrupts a sampled object produces a kernel report rather than succeeding silently.
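Whether KFENCE is active on a running system can be read from its sample_interval module parameter. The sketch below assumes the parameter is exposed at /sys/module/kfence/parameters/sample_interval (milliseconds, with 0 meaning disabled) and that the script runs as root, since the file is typically not world-readable. The function name is hypothetical.

```python
from pathlib import Path

PARAM = Path("/sys/module/kfence/parameters/sample_interval")


def kfence_status(param: Path = PARAM) -> str:
    """Report KFENCE state from its sample_interval parameter (ms)."""
    if not param.exists():
        return "not built"  # CONFIG_KFENCE off, or parameter not exposed here
    interval = int(param.read_text().strip())
    if interval == 0:
        return "disabled"  # e.g. booted with kfence.sample_interval=0
    return f"active, sampling every {interval} ms"


if __name__ == "__main__":
    print(kfence_status())
```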

Address Space Randomization

Randomization does not prevent exploitation but raises its cost substantially by breaking hardcoded address assumptions.

CONFIG_RANDOMIZE_BASE=y                    # randomize kernel text base (KASLR)
CONFIG_RANDOMIZE_MEMORY=y                  # randomize dynamic memory layout
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y   # randomize stack offset per syscall
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y            # randomize page allocator freelist

CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT (available since kernel 5.13) randomizes the kernel stack offset on every syscall entry. It directly targets stack-layout-dependent exploitation techniques and has near-zero performance impact.

Kernel command line parameters to pair with the above:

randomize_kstack_offset=on
page_alloc.shuffle=1
slab_nomerge

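slab_nomerge prevents the kernel from merging slab caches that have compatible object sizes and flags. Merging places objects of different types in the same slabs, which would otherwise let an attacker position target objects next to attacker-controlled allocations of an unrelated type.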
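These parameters can be checked on a running system by reading /proc/cmdline. The sketch below is a hypothetical helper that also flags nokaslr, which disables kernel base randomization. It assumes the last occurrence of a repeated parameter wins, as it does for most parameters, and it does not handle quoted values containing spaces.

```python
from pathlib import Path

# Parameters recommended in the text; bare flags carry no "=value".
REQUIRED = ["randomize_kstack_offset=on", "page_alloc.shuffle=1", "slab_nomerge"]
# Parameters that would undo the randomization discussed here.
FORBIDDEN = ["nokaslr"]


def check_cmdline(cmdline: str, required=REQUIRED, forbidden=FORBIDDEN) -> dict:
    """Compare a kernel command line against required and forbidden parameters."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None  # later duplicates overwrite earlier
    missing = []
    for want in required:
        key, sep, value = want.partition("=")
        if key not in params or (sep and params[key] != value):
            missing.append(want)
    present = [p for p in forbidden if p in params]
    return {"missing": missing, "forbidden_present": present}


if __name__ == "__main__":
    print(check_cmdline(Path("/proc/cmdline").read_text()))
```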

Memory Permission Enforcement

CONFIG_STRICT_KERNEL_RWX=y      # kernel memory: W XOR X enforced
CONFIG_STRICT_MODULE_RWX=y      # same for loadable modules
CONFIG_DEBUG_WX=y               # report any W+X mappings at boot

W^X (Write XOR Execute) enforcement ensures that no kernel memory page is simultaneously writable and executable. An attacker who can write to kernel memory cannot simply execute what they wrote; they must reuse existing executable code (for example, through return-oriented programming) or pursue a data-only attack. Copy Fail sidesteps this by corrupting the page-cache copy of a setuid binary rather than kernel code pages, but W^X is foundational: it increases the cost of turning arbitrary writes into reliable code execution across the broad landscape of kernel LPEs.
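With CONFIG_DEBUG_WX enabled, the kernel scans its page tables once at boot and logs the outcome. The sketch below assumes the message format used on x86 and several other architectures (for example "x86/mm: Checked W+X mappings: passed, no W+X pages found."). The wording may differ across architectures and releases, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

CHECK = re.compile(r"Checked W\+X mappings: (passed|FAILED), (no|\d+) W\+X pages found")


def wx_check_result(log: str) -> Tuple[str, Optional[int]]:
    """Return (status, W+X page count) parsed from kernel log text."""
    match = CHECK.search(log)
    if match is None:
        # DEBUG_WX off, log already rotated, or different wording.
        return ("not found", None)
    status, count = match.groups()
    return (status, 0 if count == "no" else int(count))
```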

Information Leak Prevention

Kernel pointer leaks frequently appear as a prerequisite step in multi-stage exploits: leak an address, defeat KASLR, then exploit. Locking down address exposure denies attackers that information-gathering step.

CONFIG_SECURITY_DMESG_RESTRICT=y     # restrict dmesg to CAP_SYSLOG

Runtime sysctls:

kernel.kptr_restrict = 2             # mask all kernel pointers in /proc and /sys
kernel.dmesg_restrict = 1            # restrict dmesg to CAP_SYSLOG

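kptr_restrict controls whether kernel addresses exposed through the %pK format and through files such as /proc/kallsyms are shown or replaced with zeros. With kptr_restrict = 1, real addresses are still shown to callers with CAP_SYSLOG, which normally includes root; kptr_restrict = 2 is required to mask them from all callers. Plain %p output has been hashed unconditionally since kernel 4.15, regardless of this setting.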
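Because the result depends on who is reading, a useful check is to read /proc/kallsyms as root: with kptr_restrict = 2, every address column should be zero even for a privileged reader. The helper below is a hypothetical sketch that parses text in the /proc/kallsyms format (address, type, symbol, optional module).

```python
from pathlib import Path


def kallsyms_addresses_hidden(text: str) -> bool:
    """True if every symbol address in /proc/kallsyms-format text is zero."""
    addresses = [line.split()[0] for line in text.splitlines() if line.strip()]
    return bool(addresses) and all(int(addr, 16) == 0 for addr in addresses)


if __name__ == "__main__":
    print(kallsyms_addresses_hidden(Path("/proc/kallsyms").read_text()))
```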

Kernel Hardening Checker

Rather than manually auditing CONFIG_* options, use Alexander Popov's kernel-hardening-checker tool. It checks a kernel configuration file (and, optionally, the kernel command line) against the KSPP recommendations and other hardening guidance, and reports for each option the desired value, the reason it is recommended, and whether the check passed:

git clone https://github.com/a13xp0p0v/kernel-hardening-checker
cd kernel-hardening-checker
./bin/kernel-hardening-checker -c /boot/config-$(uname -r)

Run it against your current kernel config before a hardening project to establish a baseline, and after to verify that target settings took effect. The output distinguishes between settings that are missing, explicitly disabled, or not applicable to your architecture.

Copy Fail Controls at a Glance

The table below maps KSPP controls to their specific relevance for the Copy Fail class of exploit:

Control | Type | Relevance to Copy Fail
CONFIG_CRYPTO_USER_API_AEAD is not set | Compile-time | Eliminates the attack surface entirely
kernel.modules_disabled = 1 | Runtime sysctl | Prevents loading the vulnerable module after boot
install algif_aead /bin/false | modprobe install rule | Blocks module loading on unpatched =m kernels
CONFIG_DEBUG_SG=y | Compile-time | Validates scatterlist structure; may flag a malformed scatterlist
CONFIG_KFENCE=y | Compile-time | Probabilistic detection of heap memory errors (slab objects only)
kernel.unprivileged_bpf_disabled = 1 | Runtime sysctl | Reduces available primitives for exploit chaining
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y | Compile-time | Raises cost of stack-layout-dependent follow-on stages
Seccomp profile blocking socket(AF_ALG, ...) | Container policy | Mitigation where a module blacklist is unavailable

No single control eliminates all risk from all kernel LPEs. The value of the KSPP baseline is that multiple overlapping controls must all fail simultaneously for an attacker to succeed.

Getting Started

  1. Run kernel-hardening-checker against your current kernel to establish a gap baseline.
  2. Identify which settings your workloads cannot tolerate; not all controls are compatible with all use cases.
  3. Apply attack surface reduction settings first — they have the highest impact and fewest compatibility risks.
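  4. Apply runtime sysctls via /etc/sysctl.d/, but set kernel.modules_disabled from a late-boot service: sysctl.d files are applied early in boot, typically before all required modules have loaded.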
  5. Track the KSPP Feature List as you update kernels — new protections ship with each release.
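Step 4 can be sketched as a small generator for a sysctl.d drop-in. The values are the ones discussed in this article; the helper name is hypothetical, and kernel.modules_disabled is deliberately left out because, as a one-way setting, it belongs in a late-boot service rather than a file applied early in boot.

```python
# Values from this article; review each against your workloads first.
HARDENING_SYSCTLS = {
    "kernel.kptr_restrict": "2",
    "kernel.dmesg_restrict": "1",
    "kernel.unprivileged_bpf_disabled": "1",
    "kernel.perf_event_paranoid": "3",
    "vm.unprivileged_userfaultfd": "0",
    "user.max_user_namespaces": "0",
    "kernel.modules_disabled": "1",
}
# One-way settings that belong in a late-boot unit, not an early drop-in.
LATE_BOOT_ONLY = {"kernel.modules_disabled"}


def render_dropin(settings: dict) -> str:
    """Render sysctl.d file contents, skipping settings that must apply late."""
    lines = ["# Kernel hardening baseline (generated)"]
    for name in sorted(settings):
        if name not in LATE_BOOT_ONLY:
            lines.append(f"{name} = {settings[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dropin(HARDENING_SYSCTLS), end="")
```

Redirect the output to a file such as /etc/sysctl.d/90-hardening.conf (a hypothetical name) and load it with sysctl --system.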

Kernel hardening is not a one-time event. The KSPP feature list grows with each kernel release, and the threat landscape shifts as attackers find ways around existing defenses. Treat it as a continuous baseline, not a checkbox.