
Commit 929b158

Adjust docker swarm documentation
1 parent bd08126 · commit 929b158

2 files changed

Lines changed: 127 additions & 144 deletions


README-docker_swarm.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# Tensorflow CPU Inference API For Windows and Linux with docker swarm

Please use **docker swarm** only if you need to:

* Provide redundancy in terms of API containers: if a container goes down, incoming requests are redirected to another running instance.

* Coordinate between the containers: swarm orchestrates the APIs and chooses one of them to handle each incoming request.

* Scale up the inference service in order to get faster predictions, especially if there is heavy traffic on the service.

## Run the docker container

Docker swarm can scale the API up to multiple replicas and can be used on one or multiple hosts (Linux users only). In both cases, a docker swarm setup is required on all hosts.

#### Docker swarm setup

1- Initialize Swarm:

```sh
docker swarm init
```

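Optionally, you can confirm that swarm mode is active before continuing (a quick sanity check, not a required step):

```sh
# Should print "active" once the swarm has been initialized
docker info --format '{{.Swarm.LocalNodeState}}'
```
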
2- On the manager host, open the cpu-inference.yaml file and specify the number of replicas needed. If you are using multiple hosts (see the "With multiple hosts" section), the replicas will be distributed across all hosts.

```yaml
version: "3"

services:
  api:
    ports:
      - "4343:4343"
    image: tensorflow_inference_api_cpu
    volumes:
      - "/mnt/models:/models"
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
```

**Notes about cpu-inference.yaml:**

* The part of the volumes field to the left of ":" must be an absolute path; it can be changed by the user and represents the models directory on your operating system.
* The part of the volumes field after the ":" (i.e. ":/models") should never be changed.

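For example, if your models were stored under /home/user/models on the host (a hypothetical path used purely for illustration), the volumes entry would become:

```yaml
    volumes:
      - "/home/user/models:/models"   # host path (changeable) : container path (never change)
```
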
#### With one host

Deploy the API:

```sh
docker stack deploy -c cpu-inference.yaml tensorflow-cpu
```

![onehost](./docs/tcpu.png)

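Optionally, you can check that the stack and its service were created:

```sh
# Lists the services belonging to the "tensorflow-cpu" stack and their replica counts
docker stack services tensorflow-cpu
```
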
#### With multiple hosts (Linux users only)

1- **Make sure the hosts are reachable on the same network**.

2- Choose a host to be the manager and run the following command on it to generate a token so the other hosts can join:

```sh
docker swarm join-token worker
```

A join command will appear in your terminal; copy and paste it on the other hosts, as seen in the image below.

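The generated command has the general form sketched below; the worker token and manager address are placeholders produced by the previous command, not literal values:

```sh
# Run this on each worker host (values come from the manager's join-token output)
docker swarm join --token <worker-token> <manager-ip>:2377
```
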
3- Deploy your application using:

```sh
docker stack deploy -c cpu-inference.yaml tensorflow-cpu
```

![multhost](./docs/tcpu2.png)

#### Useful Commands

1- To scale the service up to 4 replicas, for example, use this command:

```sh
docker service scale tensorflow-cpu_api=4
```

2- To check the available workers:

```sh
docker node ls
```

3- To check on which node the container is running:

```sh
docker service ps tensorflow-cpu_api
```

4- To check the number of replicas:

```sh
docker service ls
```

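If you need to take the deployment down again, the whole stack can be removed at once (assuming the stack name tensorflow-cpu used above):

```sh
# Stops and removes all services created by the "tensorflow-cpu" stack
docker stack rm tensorflow-cpu
```
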
## Benchmarking

Here are two graphs showing the prediction time for different numbers of simultaneous requests.

![CPU 20 req](./docs/TCPU20req.png)

![CPU 40 req](./docs/TCPU40req.png)

Both graphs show the same behaviour regardless of how many requests are received at the same time. When we increase the number of workers (hosts), we are able to speed up inference by at least 2 times. For example, in the last column we were able to process 40 requests in:

- 17.5 seconds with 20 replicas on 1 machine
- 8.8 seconds with 20 replicas on each of the 2 machines

Moreover, if one of the machines goes down, the others are still ready to receive requests.

Finally, since we are predicting on CPU, scaling to more replicas does not necessarily mean faster predictions: 4 containers were faster than 20.

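As a rough illustration of how such a load test could be reproduced, the sketch below fires 40 requests in parallel and times them; the endpoint path, model name, and image file are placeholders, not the API's documented interface (see the API Endpoints section of the main README for the actual routes):

```sh
# Send 40 simultaneous POST requests and measure total wall-clock time.
# <model_name>, the endpoint path, and test.jpg are illustrative placeholders.
time seq 40 | xargs -P 40 -I{} \
  curl -s -o /dev/null -X POST \
  "http://localhost:4343/models/<model_name>/predict" \
  -F "image=@test.jpg"
```
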

README.md

Lines changed: 5 additions & 144 deletions
@@ -21,28 +21,6 @@ If none of the aforementioned requirements are needed, simply use **docker**.

![predict image](./docs/4.gif)

-## Contents
-
-```sh
-Tensorflow CPU Inference API For Windows and Linux/
-├── Prerequisites
-│   ├── Check for prerequisites
-│   └── Install prerequisites
-├── Build The Docker Image
-├── Run the docker container
-│   ├── Docker
-│   └── Docker swarm
-│       ├── Docker swarm setup
-│       ├── With one host
-│       ├── With multiple hosts
-│       └── Useful Commands
-├── API Endpoints
-├── Model structure
-└── Benchmarking
-    ├── Docker
-    └── Docker swarm
-```
-
## Prerequisites

- OS:

@@ -90,7 +68,11 @@ sudo docker build --build-arg http_proxy='' --build-arg https_proxy='' -t tensor

## Run the docker container

-### Docker
+As mentioned before, this container can be deployed using either **docker** or **docker swarm**.
+
+If you wish to deploy this API using **docker**, please issue the following run command.
+
+If you wish to deploy this API using **docker swarm**, please refer to the following link: [docker swarm documentation](https://github.com/BMW-InnovationLab/BMW-TensorFlow-Inference-API-GPU/blob/dev-swarm/README-docker_swarm.md). After deploying the API with docker swarm, please return to this documentation for further information about the API endpoints and the model structure sections.

To run the API, go to the API's directory and run the following:

@@ -110,104 +92,6 @@ The <docker_host_port> can be any unique port of your choice.

The API file will be run automatically, and the service will listen to http requests on the chosen port.

-
-In case you are deploying your API without **docker swarm**, please skip the next section and directly proceed to *API endpoints section*.
-
-### Docker swarm
-
-Docker swarm can scale up the API into multiple replicas and can be used on one or multiple hosts(Linux users only). In both cases, a docker swarm setup is required for all hosts.
-
-#### Docker swarm setup
-
-1- Initialize Swarm:
-
-```sh
-docker swarm init
-```
-
-2- On the manager host, open the cpu-inference.yaml file and specify the number of replicas needed. In case you are using multiple hosts (With multiple hosts section), the number of replicas will be divided across all hosts.
-
-```yaml
-version: "3"
-
-services:
-  api:
-    ports:
-      - "4343:4343"
-    image: tensorflow_inference_api_cpu
-    volumes:
-      - "/mnt/models:/models"
-    deploy:
-      replicas: 1
-      update_config:
-        parallelism: 2
-        delay: 10s
-      restart_policy:
-        condition: on-failure
-```
-
-**Notes about cpu-inference.yaml:**
-
-* the volumes field on the left of ":" should be an absolute path, can be changeable by the user, and represents the models directory on your Operating System
-* the following volume's field ":/models" should never be changed
-
-#### With one host
-
-Deploy the API:
-
-```sh
-docker stack deploy -c cpu-inference.yaml tensorflow-cpu
-```
-
-![onehost](./docs/tcpu.png)
-
-#### With multiple hosts (Linux users only)
-
-1- **Make sure hosts are reachable on the same network**.
-
-2- Choose a host to be the manager and run the following command on the chosen host to generate a token so the other hosts can join:
-
-```sh
-docker swarm join-token worker
-```
-
-A command will appear on your terminal, copy and paste it on the other hosts, as seen in the below image
-
-3- Deploy your application using:
-
-```sh
-docker stack deploy -c cpu-inference.yaml tensorflow-cpu
-```
-
-![multhost](./docs/tcpu2.png)
-
-#### Useful Commands
-
-1- In order to scale up the service to 4 replicas for example use this command:
-
-```sh
-docker service scale tensorflow-cpu_api=4
-```
-
-2- To check the available workers:
-
-```sh
-docker node ls
-```
-
-3- To check on which node the container is running:
-
-```sh
-docker service ps tensorflow-cpu_api
-```
-
-4- To check the number of replicas:
-
-```sh
-docker service ls
-```
-
## API Endpoints

To see all available endpoints, open your favorite browser and navigate to:

@@ -301,8 +185,6 @@ Inside each subfolder there should be a:

## Benchmarking

-### Docker
-
<table>
<thead align="center">
<tr>

@@ -365,27 +247,6 @@ Inside each subfolder there should be a:
</tr>
</tbody>
</table>
-
-### Docker swarm
-
-Here are two graphs showing time of prediction for different number of requests at the same time.
-
-![CPU 20 req](./docs/TCPU20req.png)
-
-![CPU 40 req](./docs/TCPU40req.png)
-
-We can see that both graphs got the same result no matter what is the number of received requests at the same time. When we increase the number of workers (hosts) we are able to speed up the inference by at least 2 times. For example we can see in the last column we were able to process 40 requests in:
-
-- 17.5 seconds with 20 replicas in 1 machine
-- 8.8 seconds with 20 replicas in each of the 2 machines
-
-Moreover, in case one of the machines is down the others are always ready to receive requests.
-
-Finally since we are predicting on CPU scaling more replicas doesn't mean a faster prediction, 4 containers was faster than 20.
-
## Acknowledgment

[inmind.ai](https://inmind.ai)
