"value":"In the Linux kernel, the following vulnerability has been resolved:\n\nuserfaultfd: fix checks for huge PMDs\n\nPatch series \"userfaultfd: fix races around pmd_trans_huge() check\", v2.\n\nThe pmd_trans_huge() code in mfill_atomic() is wrong in three different\nways depending on kernel version:\n\n1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit\n the right two race windows) - I've tested this in a kernel build with\n some extra mdelay() calls. See the commit message for a description\n of the race scenario.\n On older kernels (before 6.5), I think the same bug can even\n theoretically lead to accessing transhuge page contents as a page table\n if you hit the right 5 narrow race windows (I haven't tested this case).\n2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for\n detecting PMDs that don't point to page tables.\n On older kernels (before 6.5), you'd just have to win a single fairly\n wide race to hit this.\n I've tested this on 6.1 stable by racing migration (with a mdelay()\n patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86\n VM, that causes a kernel oops in ptlock_ptr().\n3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed\n to yank page tables out from under us (though I haven't tested that),\n so I think the BUG_ON() checks in mfill_atomic() are just wrong.\n\nI decided to write two separate fixes for these (one fix for bugs 1+2, one\nfix for bug 3), so that the first fix can be backported to kernels\naffected by bugs 1+2.\n\n\nThis patch (of 2):\n\nThis fixes two issues.\n\nI discovered that the following race can occur:\n\n mfill_atomic other thread\n ============ ============\n <zap PMD>\n pmdp_get_lockless() [reads none pmd]\n <bail if trans_huge>\n <if none:>\n <pagefault creates transhuge zeropage>\n __pte_alloc [no-op]\n <zap PMD>\n <bail if pmd_trans_huge(*dst_pmd)>\n BUG_ON(pmd_none(*dst_pmd))\n\nI have experimentally verified this in a kernel with extra mdelay() calls;\nthe BUG_ON(pmd_none(*dst_pmd)) triggers.\n\nOn kernels newer than commit 0d940a9b270b (\"mm/pgtable: allow\npte_offset_map[_lock]() to fail\"), this can't lead to anything worse than\na BUG_ON(), since the page table access helpers are actually designed to\ndeal with page tables concurrently disappearing; but on older kernels\n(<=6.4), I think we could probably theoretically race past the two\nBUG_ON() checks and end up treating a hugepage as a page table.\n\nThe second issue is that, as Qi Zheng pointed out, there are other types\nof huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs\n(in particular, migration PMDs).\n\nOn <=6.4, this is worse than the first issue: If mfill_atomic() runs on a\nPMD that contains a migration entry (which just requires winning a single,\nfairly wide race), it will pass the PMD to pte_offset_map_lock(), which\nassumes that the PMD points to a page table.\n\nBreakage follows: First, the kernel tries to take the PTE lock (which will\ncrash or maybe worse if there is no \"struct page\" for the address bits in\nthe migration entry PMD - I think at least on X86 there usually is no\ncorresponding \"struct page\" thanks to the PTE inversion mitigation, amd64\nlooks different).\n\nIf that didn't crash, the kernel would next try to write a PTE into what\nit wrongly thinks is a page table.\n\nAs part of fixing these issues, get rid of the check for pmd_trans_huge()\nbefore __pte_alloc() - that's redundant, we're going to have to check for\nthat after the __pte_alloc() anyway.\n\nBackport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels."