Note:
1. Make sure `shm-size >= 5`, otherwise the service may fail to start.
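The shared-memory requirement above is set when launching the container. A minimal sketch, assuming the note means 5 GB and using a placeholder image name and port mapping (substitute your actual FastDeploy image and ports from the deployment tutorial):

```shell
# Launch the serving container with at least 5 GB of shared memory.
# "fastdeploy-llm:latest" and the 8000:8000 port mapping are placeholders,
# not the actual FastDeploy image or default port.
docker run -it --gpus all \
  --shm-size=5g \
  -p 8000:8000 \
  fastdeploy-llm:latest
```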
For more details on how to use FastDeploy, see the [serving deployment tutorial](https://github.com/PaddlePaddle/FastDeploy/blob/develop/llm/docs/FastDeploy_usage_tutorial.md).

# Benchmark

We benchmarked FastDeploy with the `Llama-3-8B-Instruct` model at different precisions. The results are shown in the table below:

<table align="center" border="1" style="text-align: center; vertical-align: middle;">
  <tr>
    <th align="center">Framework</th>
    <th align="center">Precision</th>
    <th align="center">QPS</th>
    <th align="center">tokens/s</th>
    <th align="center">End-to-end latency (s)</th>
  </tr>
  <tr>
    <td rowspan="3">FastDeploy</td>
    <td>FP16/BF16</td>
    <td>16.21</td>
    <td>3171.09</td>
    <td>7.15</td>
  </tr>
  <tr>
    <td>WINT8</td>
    <td>14.84</td>
    <td>2906.27</td>
    <td>7.95</td>
  </tr>
  <tr>
    <td>W8A8C8-INT8</td>
    <td>20.60</td>
    <td>4031.75</td>
    <td>5.61</td>
  </tr>
  <tr>
    <td rowspan="3">vLLM</td>
    <td>FP16/BF16</td>
    <td>9.07</td>
    <td>1766.11</td>
    <td>13.32</td>
  </tr>
  <tr>
    <td>WINT8</td>
    <td>8.23</td>
    <td>1602.96</td>
    <td>14.85</td>
  </tr>
  <tr>
    <td>W8A8C8-INT8</td>
    <td>9.41</td>
    <td>1831.81</td>
    <td>12.76</td>
  </tr>
</table>
- Test environment:
  - GPU: NVIDIA A100-SXM4-80GB
  - CUDA version: 11.6
  - cuDNN version: 8.4.0
  - Batch size: 128
  - Request concurrency: 128
  - vLLM version: v0.5.3
  - TRT-LLM version: v0.11.0
  - Dataset: [ShareGPT_V3_unfiltered_cleaned_split.json](https://huggingface.co/datasets/learnanything/sharegpt_v3_unfiltered_cleaned_split/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json)
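The throughput and latency metrics in the table can be derived from raw benchmark measurements in the usual way. A minimal sketch with a hypothetical `summarize` helper and made-up numbers (this is not FastDeploy's benchmarking API):

```python
def summarize(num_requests, total_tokens, wall_time_s, latencies_s):
    """Compute the three metrics reported in the table above:
    QPS (completed requests per second), token throughput, and
    mean end-to-end latency per request, all from raw measurements."""
    qps = num_requests / wall_time_s
    tokens_per_s = total_tokens / wall_time_s
    avg_latency_s = sum(latencies_s) / len(latencies_s)
    return qps, tokens_per_s, avg_latency_s


# Illustrative numbers only, not from the actual benchmark run:
qps, tps, lat = summarize(
    num_requests=1280,
    total_tokens=250_000,
    wall_time_s=80.0,
    latencies_s=[5.0, 6.0, 7.0],
)
print(qps, tps, round(lat, 2))  # 16.0 3125.0 6.0
```

Note that at a fixed concurrency of 128, QPS and mean end-to-end latency are roughly inverse to each other (128 / 16.21 ≈ 7.9 s), which matches the shape of the table.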

# License