```diff
 strategy:
   fail-fast: false
   matrix:
-    cuda_version: ['cu12.8', cu12, cu11]
+    cuda_version: ['cu12.8', 'cu12']
 env:
   CUDA_VERSION: ${{ matrix.cuda_version }}
   TAG_PREFIX: "openmmlab/lmdeploy"
```
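For illustration, the trimmed matrix above now fans out into two build jobs instead of three. A minimal sketch of that expansion (the job dict shape is an assumption; GitHub Actions performs this expansion itself):

```python
# Sketch: how the cuda_version matrix above expands into per-job env values.
# GitHub Actions does this expansion natively; this only illustrates the result.
matrix = {"cuda_version": ["cu12.8", "cu12"]}  # cu11 is dropped by this change

jobs = [
    {"CUDA_VERSION": v, "TAG_PREFIX": "openmmlab/lmdeploy"}
    for v in matrix["cuda_version"]
]
print(len(jobs))  # 2 jobs after dropping cu11
```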
````diff
@@ -225,7 +225,7 @@ The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
 For the GeForce RTX 50 series, please install the LMDeploy prebuilt package compiled with **CUDA 12.8**

 ```shell
-export LMDEPLOY_VERSION=0.12.0
+export LMDEPLOY_VERSION=0.12.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````
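The wheel URL in the snippet above follows a fixed release naming scheme. A small helper that assembles it (`wheel_url` is a hypothetical illustration, not part of lmdeploy):

```python
# Sketch: assembling the release wheel URL used in the shell snippet above.
# wheel_url is a hypothetical helper for illustration only.
def wheel_url(version: str, py: str, cuda: str) -> str:
    base = "https://github.com/InternLM/lmdeploy/releases/download"
    name = (f"lmdeploy-{version}+{cuda}-cp{py}-cp{py}-"
            "manylinux2014_x86_64.whl")
    return f"{base}/v{version}/{name}"

print(wheel_url("0.12.1", "310", "cu128"))
# https://github.com/InternLM/lmdeploy/releases/download/v0.12.1/lmdeploy-0.12.1+cu128-cp310-cp310-manylinux2014_x86_64.whl
```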
````diff
@@ -227,7 +227,7 @@ pip install lmdeploy
 If you are using a GeForce RTX 50 series GPU, please install the LMDeploy prebuilt package compiled with **CUDA 12.8**.

 ```shell
-export LMDEPLOY_VERSION=0.12.0
+export LMDEPLOY_VERSION=0.12.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````
```diff
@@ -110,4 +110,9 @@ pip install -r /tmp/requirements/serve.txt
 if [[ "${CUDA_VERSION_SHORT}" = "cu118" ]]; then
     rm -rf /opt/py3/lib/python${PYTHON_VERSION}/site-packages/nvidia/nccl
     cp -R /nccl /opt/py3/lib/python${PYTHON_VERSION}/site-packages/nvidia/
+elif [[ "${CUDA_VERSION_SHORT}" = "cu128" ]]; then
+    # As described in https://github.com/InternLM/lmdeploy/pull/4313, window
+    # registration may cause memory leaks in NCCL 2.27; NCCL 2.28+ resolves the issue,
+    # but the turbomind engine will use NCCL GIN for EP in the future, added in 2.29.
+    pip install "nvidia-nccl-cu12>2.29"
 fi
```
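The comment above explains why the pin is strictly above 2.29 even though 2.28 already fixes the leak. A minimal sketch of which versions the `>2.29` specifier accepts, using plain tuple comparison (an approximation of pip's version ordering for simple versions):

```python
# Sketch: which NCCL releases the ">2.29" pin accepts.
# Plain tuple comparison approximates pip's ordering for simple X.Y.Z versions.
def parse(v: str) -> tuple:
    return tuple(int(p) for p in v.split("."))

def satisfies_pin(v: str, pin: str = "2.29") -> bool:
    return parse(v) > parse(pin)

# 2.27 leaks, 2.28 fixes the leak but lacks GIN, 2.29 introduces GIN
for v in ["2.27.5", "2.28.3", "2.29.1"]:
    print(v, satisfies_pin(v))
# 2.27.5 False
# 2.28.3 False
# 2.29.1 True
```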
````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:

 ```shell
-export LMDEPLOY_VERSION=0.12.0
+export LMDEPLOY_VERSION=0.12.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````
````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If you need CUDA 11+ (>=11.3), you can install lmdeploy with:

 ```shell
-export LMDEPLOY_VERSION=0.12.0
+export LMDEPLOY_VERSION=0.12.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````
```diff
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple

-__version__ = '0.12.0'
+__version__ = '0.12.1'
 short_version = __version__
```
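The version file above imports `Tuple`, which suggests it derives a comparable version tuple from `__version__`. A hedged sketch of such a helper (the name `parse_version_info` is an assumption, not confirmed by the diff):

```python
from typing import Tuple

__version__ = '0.12.1'

# Hypothetical helper: split '0.12.1' into a comparable (0, 12, 1) tuple.
def parse_version_info(version_str: str) -> Tuple[int, ...]:
    return tuple(int(x) for x in version_str.split('.'))

print(parse_version_info(__version__))  # (0, 12, 1)
```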