cve/2025/CVE-2025-37964.md
2025-09-29 21:09:30 +02:00

CVE-2025-37964

Description

In the Linux kernel, the following vulnerability has been resolved:

x86/mm: Eliminate window where TLB flushes may be inadvertently skipped

tl;dr: There is a window in the mm switching code where the new CR3 is set and the CPU should be getting TLB flushes for the new mm. But should_flush_tlb() has a bug and suppresses the flush. Fix it by widening the window where should_flush_tlb() sends an IPI.

Long Version:

=== History ===

There were a few things leading up to this.

First, updating mm_cpumask() was observed to be too expensive, so it was made lazier. But being lazy caused too many unnecessary IPIs to CPUs due to the now-lazy mm_cpumask(). So code was added to cull mm_cpumask() periodically[2]. But that culling was a bit too aggressive and skipped sending TLB flushes to CPUs that need them. So here we are again.

=== Problem ===

The too-aggressive code in should_flush_tlb() strikes in this window:

    // Turn on IPIs for this CPU/mm combination, but only
    // if should_flush_tlb() agrees:
    cpumask_set_cpu(cpu, mm_cpumask(next));

    next_tlb_gen = atomic64_read(&next->context.tlb_gen);
    choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
    load_new_mm_cr3(need_flush);
    // ^ After 'need_flush' is set to false, IPIs MUST
    // be sent to this CPU and not be ignored.

    this_cpu_write(cpu_tlbstate.loaded_mm, next);
    // ^ Not until this point does should_flush_tlb()
    // become true!

should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3() and writing to 'loaded_mm', which is a window where they should not be suppressed. Whoops.

=== Solution ===

Thankfully, the fuzzy "just about to write CR3" window is already marked with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in should_flush_tlb() is sufficient to ensure that the CPU is targeted with an IPI.

This will cause more TLB flush IPIs. But the window is relatively small and I do not expect this to cause any kind of measurable performance impact.

Update the comment where LOADED_MM_SWITCHING is written since it grew yet another user.

Peter Z also raised a concern that should_flush_tlb() might not observe 'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off() writes them. Add a barrier to ensure that they are observed in the order they are written.
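
The window described above can be illustrated with a small user-space model. This is a minimal sketch, not the kernel's code: `struct mm`, `struct cpu_state`, `MODEL_MM_SWITCHING`, `should_flush_old()`, `should_flush_new()` and `model_self_check()` are hypothetical names invented for this example. The model assumes the decision depends only on the lazy flag and the loaded mm, and it leaves out every other condition the real should_flush_tlb() may evaluate, as well as the memory barrier mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm structure. */
struct mm { int id; };

/* Sentinel modelling loaded_mm == LOADED_MM_SWITCHING (assumed encoding). */
static struct mm switching_marker;
#define MODEL_MM_SWITCHING (&switching_marker)

/* Hypothetical per-CPU state: which mm is loaded and whether the CPU is lazy. */
struct cpu_state {
    struct mm *loaded_mm;
    bool is_lazy;
};

/* Model of the too-aggressive check: flush only if the target mm is loaded. */
static bool should_flush_old(const struct cpu_state *cpu, const struct mm *target)
{
    if (cpu->is_lazy)
        return false;
    return cpu->loaded_mm == target;
}

/* Model of the fix: a CPU that is mid-switch is always sent the flush. */
static bool should_flush_new(const struct cpu_state *cpu, const struct mm *target)
{
    if (cpu->is_lazy)
        return false;
    if (cpu->loaded_mm == MODEL_MM_SWITCHING)
        return true;
    return cpu->loaded_mm == target;
}

/* Walks the two interesting states of a CPU switching to 'next'. */
static void model_self_check(void)
{
    struct mm next = { 1 };

    /* CR3 for 'next' is already loaded, but loaded_mm is not yet written. */
    struct cpu_state in_window = { MODEL_MM_SWITCHING, false };
    /* The switch has completed and loaded_mm points at 'next'. */
    struct cpu_state switched = { &next, false };

    assert(!should_flush_old(&in_window, &next)); /* flush wrongly skipped */
    assert(should_flush_new(&in_window, &next));  /* flush now delivered */
    assert(should_flush_old(&switched, &next));
    assert(should_flush_new(&switched, &next));
}
```

Calling `model_self_check()` from a test harness exercises both variants. The only state on which the two models disagree is the mid-switch one, which corresponds to the extra IPIs the description accepts as the cost of the fix.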

POC

Reference

No PoCs from references.

Github