feat: streaming support in m serve OpenAI API server#823
feat: streaming support in m serve OpenAI API server#823markstur wants to merge 18 commits intogenerative-computing:mainfrom
Conversation
Fixes: generative-computing#822 Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Added streaming support w/ setting system_fingerprint. Make it consistent. We are currently just setting it to None but now it is consistent for future use. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
planetf1
left a comment
There was a problem hiding this comment.
noticed a few things to tighten up.
Minor, but looking at the tests - The TestStreamingEndpoint tests (e.g. line 166) are marked @pytest.mark.asyncio and declared async def but contain no await — TestClient is synchronous and doesn't need the marker. TestStreamingHelpers is fine.
Suspect it will still work but it implies the wrong behaviour?
Doesn't belong in library. This is for cli. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
astream() returns deltas (only new fragments), not accumulated text. Update docstring. Fix unused previous_length in streaming.py. Rename vars for clarity. Fix streaming tests to be consistent with the non-accumulating behavoir. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Adds missing yield of the [DONE] that clients will expect. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Since stream defaults to False, a regression was introduced where stream=False is now passed to backends where it used to be default. Fix is to only forward stream=True and not add stream=False. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Stuff was marked asyncio and async def when it didn't need to be. Moved the usage tests a little for readability. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
In m serve, usage was included for backward compatibility but this is a new feature so that's not an issue. Instead the OpenAI spec is what we want to follow. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Use content=None instead of content="" to be more correct with OpenAI API. Remove unneeded check for "" in the test. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
fixed |
|
thanks for the review! Addressed all the comments. See replies if there was some question, but I think they are covered. |
psschwei
left a comment
There was a problem hiding this comment.
couple of things flagged by claude
Test-first for... Bug where we get into a loop without checking for this first. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Fix bug where we get into loop despite already having computed chunk. Update test which was written to test existing bug -- use wording that makes it current and not referencing the non-existent bug. Make the asserts more precise with expected chunks. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…vior The default is now to only include usage when it is asked for when streaming. This is consistent with the OpenAI API spec. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Misc PR
Type of PR
Description
Add OpenAI API compatible support for streaming in
m serveapp.Testing