Commit 10c6bde

update README
1 parent 2d7d0ee commit 10c6bde

1 file changed

Lines changed: 1 addition & 63 deletions

llm/README.md

Lines changed: 1 addition & 63 deletions
@@ -32,69 +32,7 @@
 Note:
 1. Make sure shm-size >= 5; otherwise the service may fail to start.
 
-For more on using FastDeploy, see the [serving deployment workflow](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/fastdeploy/serving/blob/opensource/docs/FastDeploy_usage_tutorial.md)
-
-# Benchmark
-
-We benchmarked FastDeploy on the `Llama-3-8B-Instruct` model at different precisions. The results are shown in the table below:
-
-<table align="center" border="1" style="text-align: center; vertical-align: middle;">
-<tr>
-<th align="center">Framework</th>
-<th align="center">Precision</th>
-<th align="center">QPS</th>
-<th align="center">tokens/s</th>
-<th align="center">Full-response latency</th>
-</tr>
-<tr>
-<td rowspan="3">FastDeploy</td>
-<td>FP16/BF16</td>
-<td>16.21</td>
-<td>3171.09</td>
-<td>7.15</td>
-</tr>
-<tr>
-<td>WINT8</td>
-<td>14.84</td>
-<td>2906.27</td>
-<td>7.95</td>
-</tr>
-<tr>
-<td>W8A8C8-INT8</td>
-<td>20.60</td>
-<td>4031.75</td>
-<td>5.61</td>
-</tr>
-<tr>
-<td rowspan="3">vLLM</td>
-<td>FP16/BF16</td>
-<td>9.07</td>
-<td>1766.11</td>
-<td>13.32</td>
-</tr>
-<tr>
-<td>WINT8</td>
-<td>8.23</td>
-<td>1602.96</td>
-<td>14.85</td>
-</tr>
-<tr>
-<td>W8A8C8-INT8</td>
-<td>9.41</td>
-<td>1831.81</td>
-<td>12.76</td>
-</tr>
-</table>
-
-- Test environment:
-- GPU: NVIDIA A100-SXM4-80GB
-- CUDA version: 11.6
-- cuDNN version: 8.4.0
-- Batch size: 128
-- Request concurrency: 128
-- vLLM version: v0.5.3
-- TRT-LLM version: v0.11.0
-- Dataset: [ShareGPT_V3_unfiltered_cleaned_split.json](https://huggingface.co/datasets/learnanything/sharegpt_v3_unfiltered_cleaned_split/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json)
+For more on using FastDeploy, see the [serving deployment workflow](https://github.com/PaddlePaddle/FastDeploy/blob/develop/llm/docs/FastDeploy_usage_tutorial.md)
 
 # License
 