
Enable prompt lookup decoding #3234


Merged: 13 commits into main on Apr 26, 2025

Conversation

mzegla
Copy link
Collaborator

@mzegla mzegla commented Apr 11, 2025

🛠 Summary

  • Add support for the prompt_lookup property (if set to true, the servable creates a prompt lookup decoding pipeline).
  • Add the max_ngram_size parameter to the chat/completions and completions APIs to support this type of pipeline.
  • Add tests and move all assisted decoding tests to a separate file.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

@mzegla mzegla marked this pull request as ready for review April 14, 2025 15:13
static void SetUpTestSuite() {
    std::string port = "9173";
    ovms::Server& server = ovms::Server::instance();
    ::SetUpServer(t, server, port, getGenericFullPathForSrcTest("/ovms/src/test/llm/assisted_decoding_config.json").c_str(), 60);
Collaborator

Do we really wait 60 seconds to load the model?

Collaborator

Is it a unit test anymore?

Collaborator Author

This part was copied from the LLM node test fixture. I suppose for this one we could reduce it to, say, 15 seconds, since there are no VLM models involved.

}
}

int generateExpectedText(std::string prompt, bool addSpecialTokens) {
Collaborator

@dkalinowski dkalinowski Apr 16, 2025

This is repeated; I have already seen this code in another _test.cpp. @michalkulakowski has probably already built this verification mechanism before.

Collaborator Author

That's true, I'll see if I can reuse more parts of the code.

Collaborator Author

It has to be repeated. This method is closely tied to the class and uses many of its members, so extracting it would make its usage less convenient and readable. Inheriting from the whole class is not viable due to the use of static members and methods. Reusing this code would therefore require additional effort that was not planned to be in the scope of this PR.


// Dynamic number of candidates
/*
requestBody = R"(
Collaborator

Why is this commented out instead of being another test?

Collaborator Author

Debugging leftover, I'll uncomment it. Note that this test is skipped anyway for now.

ASSERT_EQ(choice["message"]["content"].GetString(), expectedMessages[0]);
}

// Consider parametrization of negative tests with request body and endpoint as parameters
Collaborator

Is there a Jira ticket for that?

Collaborator Author

Well, I was thinking about doing that in the scope of a bigger task for a general refactor of the LLM tests.

@mzegla
Collaborator Author

mzegla commented Apr 17, 2025

Do not merge until the fix is merged to GenAI.

@mzegla
Collaborator Author

mzegla commented Apr 25, 2025

Agreed to merge with limited testing for assisted generation
@dtrawins

@dtrawins dtrawins merged commit 04e4909 into main Apr 26, 2025
12 checks passed

4 participants