29 lines
4.8 KiB
JSON
Raw Normal View History

{
"id": "CVE-2024-40918",
"sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
"published": "2024-07-12T13:15:14.863",
"lastModified": "2024-07-12T13:15:14.863",
"vulnStatus": "Received",
"cveTags": [],
"descriptions": [
{
"lang": "en",
"value": "In the Linux kernel, the following vulnerability has been resolved:\n\nparisc: Try to fix random segmentation faults in package builds\n\nPA-RISC systems with PA8800 and PA8900 processors have had problems\nwith random segmentation faults for many years. Systems with earlier\nprocessors are much more stable.\n\nSystems with PA8800 and PA8900 processors have a large L2 cache which\nneeds per page flushing for decent performance when a large range is\nflushed. The combined cache in these systems is also more sensitive to\nnon-equivalent aliases than the caches in earlier systems.\n\nThe majority of random segmentation faults that I have looked at\nappear to be memory corruption in memory allocated using mmap and\nmalloc.\n\nMy first attempt at fixing the random faults didn't work. On\nreviewing the cache code, I realized that there were two issues\nwhich the existing code didn't handle correctly. Both relate\nto cache move-in. Another issue is that the present bit in PTEs\nis racy.\n\n1) PA-RISC caches have a mind of their own and they can speculatively\nload data and instructions for a page as long as there is a entry in\nthe TLB for the page which allows move-in. TLBs are local to each\nCPU. Thus, the TLB entry for a page must be purged before flushing\nthe page. This is particularly important on SMP systems.\n\nIn some of the flush routines, the flush routine would be called\nand then the TLB entry would be purged. This was because the flush\nroutine needed the TLB entry to do the flush.\n\n2) My initial approach to trying the fix the random faults was to\ntry and use flush_cache_page_if_present for all flush operations.\nThis actually made things worse and led to a couple of hardware\nlockups. It finally dawned on me that some lines weren't being\nflushed because the pte check code was racy. This resulted in\nrandom inequivalent mappings to physical pages.\n\nThe __flush_cache_page tmpalias flush sets up its own TLB entry\nand it doesn't need the existing TLB entry. As long as we can find\nthe pte pointer for the vm page, we can get the pfn and physical\naddress of the page. We can also purge the TLB entry for the page\nbefore doing the flush. Further, __flush_cache_page uses a special\nTLB entry that inhibits cache move-in.\n\nWhen switching page mappings, we need to ensure that lines are\nremoved from the cache. It is not sufficient to just flush the\nlines to memory as they may come back.\n\nThis made it clear that we needed to implement all the required\nflush operations using tmpalias routines. This includes flushes\nfor user and kernel pages.\n\nAfter modifying the code to use tmpalias flushes, it became clear\nthat the random segmentation faults were not fully resolved. The\nfrequency of faults was worse on systems with a 64 MB L2 (PA8900)\nand systems with more CPUs (rp4440).\n\nThe warning that I added to flush_cache_page_if_present to detect\npages that couldn't be flushed triggered frequently on some systems.\n\nHelge and I looked at the pages that couldn't be flushed and found\nthat the PTE was either cleared or for a swap page. Ignoring pages\nthat were swapped out seemed okay but pages with cleared PTEs seemed\nproblematic.\n\nI looked at routines related to pte_clear and noticed ptep_clear_flush.\nThe default implementation just flushes the TLB entry. However, it was\nobvious that on parisc we need to flush the cache page as well. If\nwe don't flush the cache page, stale lines will be left in the cache\nand cause random corruption. Once a PTE is cleared, there is no way\nto find the physical address associated with the PTE and flush the\nassociated page at a later time.\n\nI implemented an updated change with a parisc specific version of\nptep_clear_flush. It fixed the random data corruption on Helge's rp4440\nand rp3440, as well as on my c8000.\n\nAt this point, I realized that I could restore the code where we only\nflush in flush_cache_page_if_present if the page has been accessed.\nHowever, for this, we also need to flush the cache when the accessed\nbit is cleared in\n---truncated
}
],
"metrics": {},
"references": [
{
"url": "https://git.kernel.org/stable/c/5bf196f1936bf93df31112fbdfb78c03537c07b0",
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67"
},
{
"url": "https://git.kernel.org/stable/c/72d95924ee35c8cd16ef52f912483ee938a34d49",
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67"
},
{
"url": "https://git.kernel.org/stable/c/d66f2607d89f760cdffed88b22f309c895a2af20",
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67"
}
]
}