Enable prompt lookup decoding #3234
Conversation
src/llm/language_model/continuous_batching/servable_initializer.cpp (resolved review threads)
static void SetUpTestSuite() {
    std::string port = "9173";
    ovms::Server& server = ovms::Server::instance();
    ::SetUpServer(t, server, port, getGenericFullPathForSrcTest("/ovms/src/test/llm/assisted_decoding_config.json").c_str(), 60);
do we really wait 60 seconds to load the model?
Is it a unit test anymore?
This part was copied from the LLM node test fixture. I suppose for this one we could reduce it to, say, 15 seconds, since there are no VLM models involved; see the sketch below.
}
}
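A minimal sketch of the adjusted fixture under the reviewer's suggestion, assuming the last argument of ::SetUpServer is the model-load timeout in seconds:

static void SetUpTestSuite() {
    std::string port = "9173";
    ovms::Server& server = ovms::Server::instance();
    // Reduced from 60 to 15 seconds: this suite loads no VLM models,
    // so the text-only model should come up well within this window.
    ::SetUpServer(t, server, port, getGenericFullPathForSrcTest("/ovms/src/test/llm/assisted_decoding_config.json").c_str(), 15);
}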
int generateExpectedText(std::string prompt, bool addSpecialTokens) {
This is repeated; I have already seen this code in other _test.cpp files. @michalkulakowski has probably already built this verification mechanism before.
That's true, I'll see if I can reuse more parts of the code.
It has to be repeated. This method is closely tied to the class and uses many of its members. Extracting it would make its usage less comfortable and readable. Inheriting the whole class is not viable due to the use of static members and methods, so reusing this code would require additional effort that was not planned to be in the scope of this PR.
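For context, a minimal sketch of what a reference-generation helper of this kind might look like, assuming OpenVINO GenAI's LLMPipeline API; the name generateReferenceText is a stand-in, and the addSpecialTokens handling from the original signature is omitted here:

// Hedged sketch (not the fixture's actual code): produce a deterministic
// reference completion by running the base model with greedy decoding,
// so assisted-generation output can be compared against it.
#include <string>
#include "openvino/genai/llm_pipeline.hpp"

static std::string generateReferenceText(ov::genai::LLMPipeline& pipeline,
                                         const std::string& prompt) {
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 64;  // assumed limit; align with the test's value
    config.do_sample = false;    // greedy decoding keeps the reference deterministic
    return pipeline.generate(prompt, config);  // DecodedResults converts to std::string
}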
// Dynamic number of candidates
/*
requestBody = R"(
Why is it commented out rather than made another test?
Debugging leftover, I'll uncomment it.
Note that this test is skipped anyway for now.
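For reference, a hedged sketch of what such a dynamic-candidates request body might look like; assistant_confidence_threshold is the GenAI setting for a dynamic number of candidates in assisted generation, and the model name and concrete values here are assumptions, not the test's actual contents:

// Hypothetical request body for a dynamic-candidates test (values are illustrative).
std::string requestBody = R"(
    {
        "model": "lm_cb_regular",
        "stream": false,
        "max_tokens": 20,
        "assistant_confidence_threshold": 0.4,
        "messages": [
            { "role": "user", "content": "What is OpenVINO?" }
        ]
    }
)";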
    ASSERT_EQ(choice["message"]["content"].GetString(), expectedMessages[0]);
}
// Consider parametrization of negative tests with request body and endpoint as parameters |
Is there a Jira ticket for that?
Well, I was thinking about doing that in the scope of a bigger task for a general refactor of the LLM tests.
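A minimal sketch of the parametrization discussed above, using GoogleTest's value-parameterized tests; the fixture name, the example endpoints, and the sendRequest helper are hypothetical:

// Hypothetical sketch of parameterized negative tests (names are illustrative).
#include <gtest/gtest.h>
#include <string>
#include <tuple>

// Each case pairs an endpoint with an invalid request body.
using NegativeCase = std::tuple<std::string /*endpoint*/, std::string /*body*/>;

class AssistedDecodingNegativeTest : public ::testing::TestWithParam<NegativeCase> {};

TEST_P(AssistedDecodingNegativeTest, rejectsInvalidRequest) {
    const auto& [endpoint, requestBody] = GetParam();
    EXPECT_FALSE(endpoint.empty());
    EXPECT_FALSE(requestBody.empty());
    // sendRequest is a hypothetical helper wrapping the fixture's HTTP client;
    // in the real test the invalid body would be expected to be rejected:
    // EXPECT_NE(sendRequest(endpoint, requestBody), ovms::StatusCode::OK);
}

INSTANTIATE_TEST_SUITE_P(
    AssistedDecoding,
    AssistedDecodingNegativeTest,
    ::testing::Values(
        NegativeCase{"/v3/chat/completions", R"({"max_ngram_size": -1})"},
        NegativeCase{"/v3/completions", R"({"num_assistant_tokens": "not a number"})"}));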
Do not merge until the fix is merged to GenAI.
Agreed to merge with limited testing for assisted generation.
🛠 Summary
- prompt_lookup property: if set to true, the servable will create a prompt lookup decoding pipeline
- max_ngram_size parameter in chat/completions and completions APIs to support this type of pipeline

🧪 Checklist
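A hedged example of exercising the new parameter through the chat/completions API, written in the style of the PR's test request bodies; the model name and concrete values are assumptions (num_assistant_tokens is the companion setting GenAI's prompt lookup pipeline expects alongside max_ngram_size):

// Hypothetical chat/completions request enabling prompt lookup decoding
// (model name and values are illustrative, not taken from the PR).
std::string requestBody = R"(
    {
        "model": "lm_prompt_lookup",
        "stream": false,
        "max_tokens": 40,
        "max_ngram_size": 3,
        "num_assistant_tokens": 5,
        "messages": [
            { "role": "user", "content": "Summarize prompt lookup decoding in one sentence." }
        ]
    }
)";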