Mirror of https://github.com/fkie-cad/nvd-json-data-feeds.git, synced 2025-06-01 03:01:36 +00:00.
{
"id": "CVE-2024-53169",
"sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
"published": "2024-12-27T14:15:24.057",
"lastModified": "2024-12-27T14:15:24.057",
"vulnStatus": "Awaiting Analysis",
"cveTags": [],
"descriptions": [
{
"lang": "en",
"value": "In the Linux kernel, the following vulnerability has been resolved:\n\nnvme-fabrics: fix kernel crash while shutting down controller\n\nThe nvme keep-alive operation, which executes at a periodic interval,\ncould potentially sneak in while shutting down a fabric controller.\nThis may lead to a race between the fabric controller admin queue\ndestroy code path (invoked while shutting down controller) and hw/hctx\nqueue dispatcher called from the nvme keep-alive async request queuing\noperation. This race could lead to the kernel crash shown below:\n\nCall Trace:\n autoremove_wake_function+0x0/0xbc (unreliable)\n __blk_mq_sched_dispatch_requests+0x114/0x24c\n blk_mq_sched_dispatch_requests+0x44/0x84\n blk_mq_run_hw_queue+0x140/0x220\n nvme_keep_alive_work+0xc8/0x19c [nvme_core]\n process_one_work+0x200/0x4e0\n worker_thread+0x340/0x504\n kthread+0x138/0x140\n start_kernel_thread+0x14/0x18\n\nWhile shutting down fabric controller, if nvme keep-alive request sneaks\nin then it would be flushed off. The nvme_keep_alive_end_io function is\nthen invoked to handle the end of the keep-alive operation which\ndecrements the admin->q_usage_counter and assuming this is the last/only\nrequest in the admin queue then the admin->q_usage_counter becomes zero.\nIf that happens then blk-mq destroy queue operation (blk_mq_destroy_\nqueue()) which could be potentially running simultaneously on another\ncpu (as this is the controller shutdown code path) would forward\nprogress and deletes the admin queue. So, now from this point onward\nwe are not supposed to access the admin queue resources. However the\nissue here's that the nvme keep-alive thread running hw/hctx queue\ndispatch operation hasn't yet finished its work and so it could still\npotentially access the admin queue resource while the admin queue had\nbeen already deleted and that causes the above crash.\n\nThe above kernel crash is regression caused due to changes implemented\nin commit a54a93d0e359 (\"nvme: move stopping keep-alive into\nnvme_uninit_ctrl()\"). Ideally we should stop keep-alive before destroyin\ng the admin queue and freeing the admin tagset so that it wouldn't sneak\nin during the shutdown operation. However we removed the keep alive stop\noperation from the beginning of the controller shutdown code path in commit\na54a93d0e359 (\"nvme: move stopping keep-alive into nvme_uninit_ctrl()\")\nand added it under nvme_uninit_ctrl() which executes very late in the\nshutdown code path after the admin queue is destroyed and its tagset is\nremoved. So this change created the possibility of keep-alive sneaking in\nand interfering with the shutdown operation and causing observed kernel\ncrash.\n\nTo fix the observed crash, we decided to move nvme_stop_keep_alive() from\nnvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This change would ensure\nthat we don't forward progress and delete the admin queue until the keep-\nalive operation is finished (if it's in-flight) or cancelled and that would\nhelp contain the race condition explained above and hence avoid the crash.\n\nMoving nvme_stop_keep_alive() to nvme_remove_admin_tag_set() instead of\nadding nvme_stop_keep_alive() to the beginning of the controller shutdown\ncode path in nvme_stop_ctrl(), as was the case earlier before commit\na54a93d0e359 (\"nvme: move stopping keep-alive into nvme_uninit_ctrl()\"),\nwould help save one callsite of nvme_stop_keep_alive()."
},
{
"lang": "es",
"value": "En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: nvme-fabrics: se corrige el fallo del kernel al apagar el controlador La operaci\u00f3n de mantenimiento de conexi\u00f3n de nvme, que se ejecuta a intervalos peri\u00f3dicos, podr\u00eda colarse mientras se apaga un controlador de red. Esto puede provocar una ejecuci\u00f3n entre la ruta del c\u00f3digo de destrucci\u00f3n de la cola de administraci\u00f3n del controlador de red (invocada mientras se apaga el controlador) y el despachador de cola hw/hctx llamado desde la operaci\u00f3n de puesta en cola de solicitudes asincr\u00f3nicas de mantenimiento de conexi\u00f3n de nvme. Esta ejecuci\u00f3n podr\u00eda provocar el bloqueo del kernel que se muestra a continuaci\u00f3n: Rastreo de llamada: autoremove_wake_function+0x0/0xbc (no confiable) __blk_mq_sched_dispatch_requests+0x114/0x24c blk_mq_sched_dispatch_requests+0x44/0x84 blk_mq_run_hw_queue+0x140/0x220 nvme_keep_alive_work+0xc8/0x19c [nvme_core] process_one_work+0x200/0x4e0 worker_thread+0x340/0x504 kthread+0x138/0x140 start_kernel_thread+0x14/0x18 Al apagar el controlador de estructura, si la solicitud de mantenimiento de conexi\u00f3n de nvme se cuela, se eliminar\u00e1. Luego se invoca la funci\u00f3n nvme_keep_alive_end_io para gestionar el final de la operaci\u00f3n keep-alive que disminuye el admin->q_usage_counter y, asumiendo que esta es la \u00faltima/\u00fanica solicitud en la cola de administraci\u00f3n, entonces el admin->q_usage_counter se convierte en cero. Si eso sucede, entonces la operaci\u00f3n de destrucci\u00f3n de cola blk-mq (blk_mq_destroy_queue()) que podr\u00eda estar ejecut\u00e1ndose simult\u00e1neamente en otra CPU (ya que esta es la ruta del c\u00f3digo de apagado del controlador) reenviar\u00eda el progreso y eliminar\u00eda la cola de administraci\u00f3n. Entonces, ahora a partir de este punto en adelante no se supone que accedamos a los recursos de la cola de administraci\u00f3n. Sin embargo, el problema aqu\u00ed es que el hilo de keep-alive de nvme que ejecuta la operaci\u00f3n de despacho de cola hw/hctx a\u00fan no ha terminado su trabajo y, por lo tanto, a\u00fan podr\u00eda acceder potencialmente al recurso de la cola de administraci\u00f3n mientras que la cola de administraci\u00f3n ya se hab\u00eda eliminado y eso causa el bloqueo anterior. El fallo del kernel anterior es una regresi\u00f3n causada por los cambios implementados en el commit a54a93d0e359 (\"nvme: move stopping keep-alive into nvme_uninit_ctrl()\"). Lo ideal ser\u00eda detener el keep-alive antes de destruir la cola de administraci\u00f3n y liberar el conjunto de etiquetas de administraci\u00f3n para que no se introduzca durante la operaci\u00f3n de apagado. Sin embargo, eliminamos la operaci\u00f3n de detenci\u00f3n de keep-alive del comienzo de la ruta del c\u00f3digo de apagado del controlador en el commit a54a93d0e359 (\"nvme: move stopping keep-alive into nvme_uninit_ctrl()\") y la agregamos bajo nvme_uninit_ctrl() que se ejecuta muy tarde en la ruta del c\u00f3digo de apagado despu\u00e9s de que se destruye la cola de administraci\u00f3n y se elimina su conjunto de etiquetas. Por lo tanto, este cambio cre\u00f3 la posibilidad de que el keep-alive se introduzca e interfiera con la operaci\u00f3n de apagado y cause el fallo del kernel observado. Para solucionar el fallo observado, decidimos mover nvme_stop_keep_alive() de nvme_uninit_ctrl() a nvme_remove_admin_tag_set(). 
Este cambio garantizar\u00eda que no avancemos el progreso ni eliminemos la cola de administraci\u00f3n hasta que la operaci\u00f3n de mantenimiento de la actividad finalice (si est\u00e1 en curso) o se cancele, y eso ayudar\u00eda a contener la condici\u00f3n de ejecuci\u00f3n explicada anteriormente y, por lo tanto, evitar el fallo. Mover nvme_stop_keep_alive() a nvme_remove_admin_tag_set() en lugar de agregar nvme_stop_keep_alive() al comienzo de la ruta del c\u00f3digo de apagado del controlador en nvme_stop_ctrl(), como era el caso antes de el commit a54a93d0e359 (\"nvme: mover la operaci\u00f3n de mantenimiento de la actividad de detenci\u00f3n a nvme_uninit_ctrl()\"), ayudar\u00eda a ahorrar un sitio de llamada de nvme_stop_keep_alive()."
}
],
"metrics": {},
"references": [
{
"url": "https://git.kernel.org/stable/c/30794f4952decb2ec8efa42f704cac5304499a41",
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67"
},
{
"url": "https://git.kernel.org/stable/c/5416b76a8156c1b8491f78f8a728f422104bb919",
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67"
},
{
"url": "https://git.kernel.org/stable/c/e9869c85c81168a1275f909d5972a3fc435304be",
"source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67"
}
]
}
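
The record above is plain NVD data, so what follows is only an illustrative, user-space sketch of the ordering the fix establishes, not the kernel patch itself: a periodic "keep-alive" worker repeatedly touches an admin queue, and the shutdown path stops that worker before the queue memory is torn down, mirroring the move of nvme_stop_keep_alive() into nvme_remove_admin_tag_set() described in the English summary. All identifiers in the sketch (stop_keep_alive, remove_admin_tag_set, q_usage_counter, dummy_work) are stand-ins chosen to echo names in the description; only C11 with pthreads is assumed.

/*
 * User-space analogy of the race described above -- NOT the kernel patch.
 * The "fix" it demonstrates is purely the ordering: the periodic worker is
 * fully stopped before the queue is freed, so it can never dispatch against
 * a destroyed queue.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct admin_queue {
    atomic_int q_usage_counter;   /* stand-in for admin->q_usage_counter */
    int dummy_work;               /* stand-in for hw/hctx dispatch state  */
};

static struct admin_queue *admin_q;
static atomic_bool keep_alive_enabled = true;
static pthread_t keep_alive_thread;

/* Periodic keep-alive work: take a usage reference and "dispatch". */
static void *keep_alive_work(void *arg)
{
    (void)arg;
    while (atomic_load(&keep_alive_enabled)) {
        atomic_fetch_add(&admin_q->q_usage_counter, 1);
        admin_q->dummy_work++;        /* would crash if admin_q were freed */
        atomic_fetch_sub(&admin_q->q_usage_counter, 1);
        usleep(1000);                 /* periodic interval */
    }
    return NULL;
}

/* Analogue of nvme_stop_keep_alive(): guarantee the worker has exited. */
static void stop_keep_alive(void)
{
    atomic_store(&keep_alive_enabled, false);
    pthread_join(keep_alive_thread, NULL);
}

/*
 * Analogue of nvme_remove_admin_tag_set() after the fix: stop keep-alive
 * first, then tear the queue down. Freeing admin_q before stop_keep_alive()
 * would reintroduce the use-after-free window the CVE describes.
 */
static void remove_admin_tag_set(void)
{
    stop_keep_alive();                /* the moved call */
    free(admin_q);                    /* analogue of blk_mq_destroy_queue() */
    admin_q = NULL;
}

int main(void)
{
    admin_q = calloc(1, sizeof(*admin_q));
    if (!admin_q)
        return 1;
    pthread_create(&keep_alive_thread, NULL, keep_alive_work, NULL);
    usleep(10000);                    /* let a few keep-alives run */
    remove_admin_tag_set();           /* controller shutdown path */
    puts("shutdown completed without touching a freed queue");
    return 0;
}

Build with something like cc -pthread race_sketch.c (the file name is arbitrary). If the two calls in remove_admin_tag_set() were swapped, a run under AddressSanitizer would typically flag a heap-use-after-free in keep_alive_work(), which corresponds to the dispatcher-versus-queue-destroy race shown in the call trace above.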