Enable prompt lookup decoding #3234
Conversation
src/llm/language_model/continuous_batching/servable_initializer.cpp (resolved review threads)
static void SetUpTestSuite() {
    std::string port = "9173";
    ovms::Server& server = ovms::Server::instance();
    ::SetUpServer(t, server, port, getGenericFullPathForSrcTest("/ovms/src/test/llm/assisted_decoding_config.json").c_str(), 60);
do we really wait 60 seconds to load the model?
Is it a unit test anymore?
This part was copied from the LLM node test fixture. I suppose for this one we could reduce it to, say, 15 seconds, since there are no VLM models involved; see the sketch below.
}
}
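A minimal sketch of the adjusted fixture under the reviewer's suggestion, assuming the last argument of ::SetUpServer is the model-load timeout in seconds:

static void SetUpTestSuite() {
    std::string port = "9173";
    ovms::Server& server = ovms::Server::instance();
    // Reduced from 60 to 15 seconds: this suite loads no VLM models,
    // so the text-only model should come up well within this window.
    ::SetUpServer(t, server, port, getGenericFullPathForSrcTest("/ovms/src/test/llm/assisted_decoding_config.json").c_str(), 15);
}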
int generateExpectedText(std::string prompt, bool addSpecialTokens) {
This is repeated; I have already seen this code in other _test.cpp files. @michalkulakowski has probably already built this verification mechanism before.
That's true, I'll see if I can reuse more parts of the code.
It has to be repeated. This method is closely tied to the class and uses many of its members. Extracting it would make its usage less comfortable and readable. Inheriting the whole class is not viable due to the use of static members and methods, so reusing this code would require additional effort that was not planned to be in the scope of this PR.
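For context, a minimal sketch of what a reference-generation helper of this kind might look like, assuming OpenVINO GenAI's LLMPipeline API; the name generateReferenceText is a stand-in, and the addSpecialTokens handling from the original signature is omitted here:

// Hedged sketch (not the fixture's actual code): produce a deterministic
// reference completion by running the base model with greedy decoding,
// so assisted-generation output can be compared against it.
#include <string>
#include "openvino/genai/llm_pipeline.hpp"

static std::string generateReferenceText(ov::genai::LLMPipeline& pipeline,
                                         const std::string& prompt) {
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 64;  // assumed limit; align with the test's value
    config.do_sample = false;    // greedy decoding keeps the reference deterministic
    return pipeline.generate(prompt, config);  // DecodedResults converts to std::string
}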
// Dynamic number of candidates
/*
requestBody = R"(
Why is it commented out rather than made another test?
Debugging leftover, I'll uncomment it.
Note that this test is skipped anyway for now.
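For reference, a hedged sketch of what such a dynamic-candidates request body might look like; assistant_confidence_threshold is the GenAI setting for a dynamic number of candidates in assisted generation, and the model name and concrete values here are assumptions, not the test's actual contents:

// Hypothetical request body for a dynamic-candidates test (values are illustrative).
std::string requestBody = R"(
    {
        "model": "lm_cb_regular",
        "stream": false,
        "max_tokens": 20,
        "assistant_confidence_threshold": 0.4,
        "messages": [
            { "role": "user", "content": "What is OpenVINO?" }
        ]
    }
)";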
    ASSERT_EQ(choice["message"]["content"].GetString(), expectedMessages[0]);
}
// Consider parametrization of negative tests with request body and endpoint as parameters |
Is there a Jira ticket for that?
Well, I was thinking about doing that in the scope of a bigger task for a general refactor of the LLM tests.
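A minimal sketch of the parametrization discussed above, using GoogleTest's value-parameterized tests; the fixture name, the example endpoints, and the sendRequest helper are hypothetical:

// Hypothetical sketch of parameterized negative tests (names are illustrative).
#include <gtest/gtest.h>
#include <string>
#include <tuple>

// Each case pairs an endpoint with an invalid request body.
using NegativeCase = std::tuple<std::string /*endpoint*/, std::string /*body*/>;

class AssistedDecodingNegativeTest : public ::testing::TestWithParam<NegativeCase> {};

TEST_P(AssistedDecodingNegativeTest, rejectsInvalidRequest) {
    const auto& [endpoint, requestBody] = GetParam();
    EXPECT_FALSE(endpoint.empty());
    EXPECT_FALSE(requestBody.empty());
    // sendRequest is a hypothetical helper wrapping the fixture's HTTP client;
    // in the real test the invalid body would be expected to be rejected:
    // EXPECT_NE(sendRequest(endpoint, requestBody), ovms::StatusCode::OK);
}

INSTANTIATE_TEST_SUITE_P(
    AssistedDecoding,
    AssistedDecodingNegativeTest,
    ::testing::Values(
        NegativeCase{"/v3/chat/completions", R"({"max_ngram_size": -1})"},
        NegativeCase{"/v3/completions", R"({"num_assistant_tokens": "not a number"})"}));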
Do not merge until the fix is merged to GenAI.
Agreed to merge with limited testing for assisted generation.
🛠 Summary
- prompt_lookup property: if set to true, the servable will create a prompt lookup decoding pipeline
- max_ngram_size parameter in chat/completions and completions APIs to support this type of pipeline

🧪 Checklist
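A hedged example of exercising the new parameter through the chat/completions API, written in the style of the PR's test request bodies; the model name and concrete values are assumptions (num_assistant_tokens is the companion setting GenAI's prompt lookup pipeline expects alongside max_ngram_size):

// Hypothetical chat/completions request enabling prompt lookup decoding
// (model name and values are illustrative, not taken from the PR).
std::string requestBody = R"(
    {
        "model": "lm_prompt_lookup",
        "stream": false,
        "max_tokens": 40,
        "max_ngram_size": 3,
        "num_assistant_tokens": 5,
        "messages": [
            { "role": "user", "content": "Summarize prompt lookup decoding in one sentence." }
        ]
    }
)";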