"value":"In the Linux kernel, the following vulnerability has been resolved:\n\ndevice-dax: correct pgoff align in dax_set_mapping()\n\npgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise,\nvmf->address not aligned to fault_size will be aligned to the next\nalignment, that can result in memory failure getting the wrong address.\n\nIt's a subtle situation that only can be observed in\npage_mapped_in_vma() after the page is page fault handled by\ndev_dax_huge_fault. Generally, there is little chance to perform\npage_mapped_in_vma in dev-dax's page unless in specific error injection\nto the dax device to trigger an MCE - memory-failure. In that case,\npage_mapped_in_vma() will be triggered to determine which task is\naccessing the failure address and kill that task in the end.\n\n\nWe used self-developed dax device (which is 2M aligned mapping) , to\nperform error injection to random address. It turned out that error\ninjected to non-2M-aligned address was causing endless MCE until panic.\nBecause page_mapped_in_vma() kept resulting wrong address and the task\naccessing the failure address was never killed properly:\n\n\n[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3784.049006] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3784.448042] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3784.792026] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3785.162502] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3785.461116] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3785.764730] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3786.042128] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3786.464293] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3786.818090] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3787.085297] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n\nIt took us several weeks to pinpoint this problem,\u00a0 but we eventually\nused bpftrace to trace the page fault and mce address and successfully\nidentified the issue.\n\n\nJoao added:\n\n; Likely we never reproduce in production because we always pin\n: device-dax regions in the region align they provide (Qemu does\n: similarly with prealloc in hugetlb/file backed memory). I think this\n: bug requires that we touch *unpinned* device-dax regions unaligned to\n: the device-dax selected alignment (page size i.e. 4K/2M/1G)"