!!! tip "Streaming responses with Server-Sent Events (SSE)"
    This example waits for the full completion before responding. To stream tokens back to the client as they are generated, you can use Server-Sent Events (SSE) with the `python3-flask` template. See the next example for details, or refer to [Stream OpenAI responses from functions using Server Sent Events](https://www.openfaas.com/blog/openai-streaming-responses/) on the OpenFaaS blog.
## Example: Stream Server-Sent Events (SSE)
This example shows how to stream a response from a Python function using [Server-Sent Events (SSE)](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events). SSE allows a function to push data to a client as it becomes available, rather than waiting for the entire response to complete. This is useful for long-running tasks like LLM completions, progress updates, or real-time log tailing.
Streaming requires the `python3-flask` template, which gives direct access to Flask and lets the handler return a Flask `Response` object with a generator.
Clients must include an `Accept: text/event-stream` header in their request. This header tells the OpenFaaS gateway to stream the response through to the client as chunks arrive. Without it, the gateway will buffer the entire response before sending it back.
!!! info "About the python3-flask template"
    The `python3-flask` template exposes a simpler handler interface than the `python3-http` template. The handler receives the raw request body as a string, and can return a string, a tuple of `(body, status_code)`, a tuple of `(body, status_code, headers)`, or a Flask `Response` object.
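To illustrate those return shapes, a minimal hypothetical handler (not part of the streaming example that follows) might look like this:

```python
def handle(req):
    # req is the raw request body as a string
    if not req:
        # A (body, status_code) tuple
        return "Empty body", 400
    # A (body, status_code, headers) tuple
    return req.upper(), 200, {"Content-Type": "text/plain"}
```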
**1. Create the function**
Pull the `python3-flask` template and scaffold a new function:
```bash
faas-cli template store pull python3-flask
faas-cli new --lang python3-flask sse-example \
  --prefix ttl.sh/openfaas-examples
```
The example uses the public [ttl.sh](https://ttl.sh) registry. Replace the prefix with your own registry for production use.
**2. Write the handler**
The handler uses a Python generator to yield events in the [SSE format](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format). Each event is prefixed with `data: ` and terminated by two newlines. The generator is wrapped in a Flask `Response` with the `text/event-stream` content type.
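A minimal handler along these lines might look like the following sketch, which emits five numbered messages one second apart, followed by a `[DONE]` marker to signal the end of the stream:

```python
import time

from flask import Response

def handle(req):
    def generate():
        for i in range(1, 6):
            # Each SSE event is prefixed with "data: " and
            # terminated by a blank line (two newlines)
            yield f"data: Message {i} of 5\n\n"
            time.sleep(1)
        # Tell the client the stream is complete
        yield "data: [DONE]\n\n"

    # Wrap the generator in a Flask Response with the SSE content type
    return Response(generate(), mimetype="text/event-stream")
```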
**3. Deploy the function**

Build, push, and deploy the function with `faas-cli up`. The `--filter` flag selects a single function from the stack file, and `--tag digest` uses the image's content hash as the tag instead of `latest`, so that Kubernetes always pulls an updated image:
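For example, assuming the `sse-example` function created in step 1:

```bash
faas-cli up --filter sse-example --tag digest
```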
**4. Invoke the function**

Invoke the function with an `Accept: text/event-stream` header. This header tells the OpenFaaS gateway to stream the response to the client as chunks arrive, rather than buffering the entire response.
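For example, with `curl` (assuming the gateway is reachable at `127.0.0.1:8080`; adjust the URL for your installation). The `-N` flag disables curl's own output buffering so events print as they arrive:

```bash
curl -i -N http://127.0.0.1:8080/function/sse-example \
  -H "Accept: text/event-stream"
```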
You should see each message appear one second apart:
```
data: Message 1 of 5
data: Message 2 of 5
data: Message 3 of 5
data: Message 4 of 5
data: Message 5 of 5
data: [DONE]
```
!!! note "Timeouts"
    Streaming responses can run for longer than the default function timeout. Make sure your OpenFaaS [timeout values](/tutorials/expanded-timeouts/) are configured appropriately for your streaming workloads.
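As a sketch, the of-watchdog timeout environment variables can be raised in the function's `stack.yml`. The values below are illustrative; pick durations that suit your workload:

```yaml
functions:
  sse-example:
    lang: python3-flask
    handler: ./sse-example
    image: ttl.sh/openfaas-examples/sse-example:latest
    environment:
      read_timeout: "5m"
      write_timeout: "5m"
      exec_timeout: "5m"
```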