
Commit 9a50f1f

bump version to v0.12.2 (#4378)
* bump version to v0.12.2
* rename
* fix typo
* fix
* fix typo in llm_compressor.md
* use logger.exception
* logger.debug num_outputs
* warning to info
* remove role checker
* update doc
1 parent a30b976 commit 9a50f1f

16 files changed

Lines changed: 19 additions & 17 deletions


File renamed without changes.

README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -176,7 +176,7 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
 <li>Qwen2-VL (2B, 7B, 72B)</li>
 <li>Qwen2.5-VL (3B, 7B, 72B)</li>
 <li>Qwen3-VL (2B - 235B)</li>
-<li>Qwen3.5</li>
+<li>Qwen3.5 (0.8B - 397B)</li>
 <li>DeepSeek-VL (7B)</li>
 <li>DeepSeek-VL2 (3B, 16B, 27B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>
@@ -228,7 +228,7 @@ The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
 For the GeForce RTX 50 series, please install the LMDeploy prebuilt package complied with **CUDA 12.8**
 
 ```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````
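After the bump, a quick sanity check confirms the wheel actually installed the new release. A minimal sketch, assuming the prebuilt wheel above was installed into the current environment and that the version string is exposed as `lmdeploy.__version__`:

```shell
# Confirm the installed LMDeploy release matches the bumped version
python -c "import lmdeploy; print(lmdeploy.__version__)"

# Optional: fail loudly if the environment still holds an older wheel
EXPECTED=0.12.2
INSTALLED=$(python -c "import lmdeploy; print(lmdeploy.__version__)")
if [ "$INSTALLED" != "$EXPECTED" ]; then
  echo "Expected lmdeploy $EXPECTED but found $INSTALLED" >&2
fi
```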

README_ja.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -155,7 +155,7 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
 <li>Qwen2-VL (2B, 7B, 72B)</li>
 <li>Qwen2.5-VL (3B, 7B, 72B)</li>
 <li>Qwen3-VL (2B - 235B)</li>
-<li>Qwen3.5</li>
+<li>Qwen3.5 (0.8B - 397B)</li>
 <li>DeepSeek-VL (7B)</li>
 <li>DeepSeek-VL2 (3B, 16B, 27B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>
````

README_zh-CN.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -178,7 +178,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
 <li>Qwen2-VL (2B, 7B, 72B)</li>
 <li>Qwen2.5-VL (3B, 7B, 72B)</li>
 <li>Qwen3-VL (2B - 235B)</li>
-<li>Qwen3.5</li>
+<li>Qwen3.5 (0.8B - 397B)</li>
 <li>DeepSeek-VL (7B)</li>
 <li>DeepSeek-VL2 (3B, 16B, 27B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>
@@ -230,7 +230,7 @@ pip install lmdeploy
 若使用 GeForce RTX 50 系列显卡,请安装基于 **CUDA 12.8** 编译的 LMDeploy 预编译包。
 
 ```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````

docs/en/get_started/installation.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````
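The cu118 wheel only helps if the local CUDA stack matches it. A minimal before/after check, assuming `nvidia-smi` is on the PATH and that PyTorch was pulled from the cu118 extra index above:

```shell
# Driver version reported by the NVIDIA driver (must support CUDA >= 11.3 for this wheel)
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# After installation, the CUDA build PyTorch was compiled against should report 11.8
python -c "import torch; print(torch.version.cuda)"
```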

docs/en/quantization/llm_compressor.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -44,7 +44,7 @@ conda create -n lmdeploy python=3.10 -y
 conda activate lmdeploy
 
 # Install llm-compressor
-pip install llm-compressor
+pip install llmcompressor
 
 # Clone lmdeploy source code and run the quantization example
 git clone https://github.com/InternLM/lmdeploy
````
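The corrected command uses the package's actual PyPI distribution name, `llmcompressor`. A minimal sketch to verify the install; the `__version__` attribute is an assumption, hence the guarded lookup:

```shell
# Install the correctly named distribution and confirm it imports;
# the __version__ attribute is assumed and guarded with a fallback.
pip install llmcompressor
python -c "import llmcompressor; print(getattr(llmcompressor, '__version__', 'import ok'))"
```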

docs/en/supported_models/supported_models.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -25,6 +25,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
 | Qwen2.5<sup>\[2\]</sup> | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
 | Qwen3 | 0.6B-235B | LLM | Yes | Yes | Yes\* | Yes\* |
+| Qwen3.5<sup>\[3\]</sup> | 0.8B-397B | MLLM | Yes | Yes | No | Yes |
 | Mistral<sup>\[1\]</sup> | 7B | LLM | Yes | Yes | Yes | No |
 | Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
 | DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
@@ -54,6 +55,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 ```{note}
 * [1] The TurboMind engine doesn't support window attention. Therefore, for models that have applied window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral, Qwen1.5 and etc., please choose the PyTorch engine for inference.
 * [2] When the head_dim of a model is not 128, such as llama3.2-1B, qwen2-0.5B and internvl2-1B, turbomind doesn't support its kv cache 4/8 bit quantization and inference
+* [3] TurboMind does not currently support the vision encoder for the Qwen3.5 series.
 ```
 
 ## PyTorchEngine on CUDA Platform
@@ -89,7 +91,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
 | QWen2.5-VL | 3B - 72B | MLLM | Yes | No | No | No | No |
 | QWen3-VL | 2B - 235B | MLLM | Yes | No | No | No | No |
-| QWen3.5 | 27B-397B | MLLM | Yes | No | No | No | No |
+| QWen3.5 | 0.8B-397B | MLLM | Yes | No | No | No | No |
 | DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
 | DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
 | DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
````
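With Qwen3.5 added to the PyTorchEngine table (and note [3] marking its vision encoder as unsupported in TurboMind), one way to try such a checkpoint is the PyTorch backend. A minimal launch sketch; the model repository path is a placeholder, not one confirmed by this commit:

```shell
# Serve a Qwen3.5 checkpoint with the PyTorch engine.
# "Qwen/Qwen3.5-Instruct" is a hypothetical placeholder; substitute a real repository or local path.
lmdeploy serve api_server Qwen/Qwen3.5-Instruct --backend pytorch --server-port 23333
```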

docs/zh_cn/get_started/installation.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 默认的预构建包是在 **CUDA 12** 上编译的。如果需要 CUDA 11+ (>=11.3),你可以使用以下命令安装 lmdeploy:
 
 ```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````

0 commit comments
