Were there any major changes after Milvus v2.4.15 that could have affected performance? #41169

Peter-YoungUk · 2025-04-09T02:43:21Z

I've tested various versions of Milvus using the same index options and approximately the same amount of data (although not exactly the same dataset).
The index was built using HNSW with the following parameters: m=48, efConstruction=800, efSearch=500, and limit=1.

However, I observed a significant latency gap compared to Milvus v2.4.15.
I reviewed the changes in versions v2.4.16 and v2.4.17 but couldn’t find any major modifications, except for a Knowhere version update (from v2.3.12 to v2.3.13):
v2.4.15...v2.4.17

Interestingly, all versions after v2.4.15 consistently show higher latency compared to v2.4.15 under the same indexing conditions.

Could you help me identify the cause of this performance regression?
I performed the tests after flushing and compacting the data, following the insertion of 4 million random vectors with 384 dimensions.
I also noticed that the newer versions consume more CPU compared to v2.4.15, even though all versions were tested under the same resource conditions.
I ran the tests with Locust using 10 users and random search vectors.

[Images: test results for v2.4.15, v2.4.17, and v2.5.8, plus table info]

yhmo (Collaborator) · 2025-04-09T03:20:54Z

Let me verify this in my local environment; I'll let you know later.

yhmo (Collaborator) · 2025-04-09T08:35:01Z

I tested with the script below, and the performance does not seem to differ much between v2.4.15 and v2.4.17.

import random
import time

from pymilvus import (
    MilvusClient, DataType,
)

collection_name = "AAA"
dim = 384


def gen_embedding():
    # Random vector of `dim` floats, each drawn uniformly from [0, 1).
    return [random.random() for _ in range(dim)]


client = MilvusClient(uri="http://localhost:19530")
print(client.get_server_version())


schema = MilvusClient.create_schema(enable_dynamic_field=False)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="pkey", datatype=DataType.INT64)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=dim)

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 48, "efConstruction": 800},
)

client.drop_collection(collection_name=collection_name)
client.create_collection(
    collection_name=collection_name,
    schema=schema,
    index_params=index_params,
)
print(collection_name, "created")

batch = 10000
for i in range(400):
    print("insert", i)
    data = [{"id": i * batch + k, "pkey": i * batch + k, "vector": gen_embedding()} for k in
            range(batch)]
    client.insert(collection_name=collection_name, data=data)

print("flushing...")
client.flush(collection_name=collection_name)

print("loading...")
client.load_collection(collection_name=collection_name)

result = client.query(collection_name, "", output_fields=["count(*)"])
print(f"final entities in {collection_name} is {result[0]['count(*)']}")

for i in range(100):
    vector = gen_embedding()
    start = time.time()
    results = client.search(collection_name=collection_name,
                            data=[vector],
                            anns_field="vector",
                            limit=1,
                            consistency_level="Bounded",
                            search_params={"ef": 500})
    end = time.time()
    print("search time cost {:.1f} ms".format((end - start) * 1000))

Peter-YoungUk (Author) · 2025-04-10T01:35:12Z

@yhmo
Thanks for sharing your test code.
In my test results above, v2.4.15 showed a p99 of 5 ms and an average of 1.5 ms, while the other versions showed a p99 of over 30 ms and an average of around 15 ms.
Below is my Milvus configuration. I think it may have affected the test results, since a larger segment size tends to improve read performance (especially after compaction).
Do you mind sharing your test results?

  configuration:
    dataCoord:
      segment:
        maxSize: 4096
    queryNode:
      mmap:
        mmapEnabled: false
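
As a rough illustration of why the segment size cap matters, the sketch below estimates the smallest possible number of segments for the raw vector data alone. It assumes float32 vectors (4 bytes per value), the 4 million 384-dimension vectors mentioned in the question, and a default cap of 1024 for comparison; that default is an assumption here, not something stated in this thread. Real segment counts also depend on other fields, sealing thresholds, and compaction, so treat the result as a lower bound only.

```python
import math


def raw_vector_mib(num_vectors, dim, bytes_per_value=4):
    """Size in MiB of the raw float32 vector data alone."""
    return num_vectors * dim * bytes_per_value / 2**20


def min_segments(total_mib, max_segment_mib):
    """Lower bound on the segment count if every segment were filled to the cap."""
    return math.ceil(total_mib / max_segment_mib)


total = raw_vector_mib(4_000_000, 384)
print(f"raw vector data: {total:.0f} MiB")
for max_size in (1024, 4096):  # 1024 is assumed here as the default cap
    print(f"maxSize={max_size}: at least {min_segments(total, max_size)} segments")
```

Fewer, larger segments mean fewer separate HNSW graphs to search and merge per query, which is the effect the configuration above is aiming for.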

Here is the segment info. (All versions had almost the same document ratio.)

[Images: segment info for v2.4.15, v2.4.17, and v2.5.8]

As for my test code, I didn't use PyMilvus because I had some issues using gRPC with Locust.

class SearchTest(FastHttpUser):
    @task
    def search_test(self):
        data = {
            "data": [np.random.uniform(low=-128, high=127, size=dimension).tolist()],
            "annsField": "vector",
            "limit": 1,
            "collectionName": table,
            "searchParams": {
                "params": { "ef" : 500 }
            },
            "consistencyLevel": consistency_level,
            "outputFields": [ "pkey" ]
        }
        with self.client.post(url="/v2/vectordb/entities/search", json=data, catch_response=True) as response:
            resp = msgspec.json.decode(response.content, type=Response)
            if resp.code != 0:
                logging.error(resp.message)
                response.failure(resp.code)

yhmo (Collaborator) · 2025-04-11T02:56:24Z

Two million vectors (dim=384), HNSW index (m=48, efConstruction=800, efSearch=500), and limit=1.
The test script:

for i in range(100):
    vector = gen_embedding()
    start = time.time()
    results = client.search(collection_name=collection_name,
                            data=[vector],
                            anns_field="vector",
                            limit=1,
                            consistency_level="Bounded",
                            search_params={"ef": 500})
    end = time.time()
    print("RPC search time cost {:.1f} ms".format((end - start) * 1000))

import numpy as np
import requests
for i in range(100):
    try:
        data = {
            "data": [np.random.uniform(low=-128, high=127, size=dim).tolist()],
            "annsField": "vector",
            "limit": 1,
            "collectionName": collection_name,
            "searchParams": {
                "params": {"ef": 500}
            },
            "consistencyLevel": "Bounded",
            "outputFields": ["pkey"]
        }
        start = time.time()
        resp = requests.post(url="http://localhost:19530/v2/vectordb/entities/search", json=data, timeout=5)
        if resp.status_code != 200:
            raise Exception(f"failed to post url, status code: {resp.status_code}")
        end = time.time()
        print("RESTFUL search time cost {:.1f} ms".format((end - start) * 1000))
        # print(resp.json())
    except Exception as err:
        raise Exception(f"failed to post url, error: {err}")

On v2.4.15, the average RPC search latency is about 13 ms, and the average RESTful search latency is about 4 ms.

v2.4.15
RPC search time cost 66.4 ms
RPC search time cost 16.3 ms
RPC search time cost 16.3 ms
RPC search time cost 13.3 ms
RPC search time cost 13.3 ms
RPC search time cost 12.9 ms
RPC search time cost 12.8 ms
RPC search time cost 12.9 ms
RPC search time cost 13.2 ms
......

v2.4.15
RESTFUL search time cost 11.3 ms
RESTFUL search time cost 3.9 ms
RESTFUL search time cost 3.9 ms
RESTFUL search time cost 3.6 ms
RESTFUL search time cost 3.6 ms
RESTFUL search time cost 3.4 ms
RESTFUL search time cost 3.5 ms
RESTFUL search time cost 3.5 ms
RESTFUL search time cost 3.1 ms
RESTFUL search time cost 3.8 ms
......

On v2.4.17, the average RPC search latency is 13ms, and the average RESTful search latency is 13ms.

v2.4.17
RPC search time cost 86.4 ms
RPC search time cost 16.4 ms
RPC search time cost 16.6 ms
RPC search time cost 14.4 ms
RPC search time cost 13.5 ms
RPC search time cost 13.7 ms
RPC search time cost 13.4 ms
RPC search time cost 12.8 ms
RPC search time cost 12.4 ms
......

v2.4.17
RESTFUL search time cost 16.1 ms
RESTFUL search time cost 15.1 ms
RESTFUL search time cost 12.6 ms
RESTFUL search time cost 13.4 ms
RESTFUL search time cost 12.4 ms
RESTFUL search time cost 12.5 ms
RESTFUL search time cost 12.4 ms
RESTFUL search time cost 11.8 ms
RESTFUL search time cost 11.9 ms
......

RPC search performance does not differ between v2.4.15 and v2.4.17, which indicates that the RPC latency reflects the actual search performance.
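
To make the comparison concrete, here is a minimal sketch that summarizes the samples printed above. Dropping the first sample of each run as warm-up is an assumption of this sketch, and the logs are truncated ("......"), so the figures cover only the lines shown.

```python
from statistics import mean, median


def summarize(latencies_ms, warmup=1):
    """Return (mean, median) in ms after dropping the first `warmup` samples."""
    steady = latencies_ms[warmup:]
    if not steady:
        raise ValueError("no samples left after dropping warm-up")
    return mean(steady), median(steady)


# Samples copied from the logs above (truncated there with "......").
samples = {
    ("v2.4.15", "RPC"): [66.4, 16.3, 16.3, 13.3, 13.3, 12.9, 12.8, 12.9, 13.2],
    ("v2.4.15", "RESTful"): [11.3, 3.9, 3.9, 3.6, 3.6, 3.4, 3.5, 3.5, 3.1, 3.8],
    ("v2.4.17", "RPC"): [86.4, 16.4, 16.6, 14.4, 13.5, 13.7, 13.4, 12.8, 12.4],
    ("v2.4.17", "RESTful"): [16.1, 15.1, 12.6, 13.4, 12.4, 12.5, 12.4, 11.8, 11.9],
}

for (version, api), values in samples.items():
    avg, med = summarize(values)
    print(f"{version} {api}: mean {avg:.1f} ms, median {med:.1f} ms")
```

With the warm-up call excluded, the two RPC runs land within a fraction of a millisecond of each other, while the RESTful runs differ by roughly a factor of three to four.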

RESTful search latency on v2.4.15 is much lower than on v2.4.17. The reason is that on v2.4.15 the RESTful search interface did not pass "searchParams" through correctly, so the search engine used a default "ef" value. That default is much smaller than 500, which makes the search much faster than on v2.4.17.
The release notes for v2.4.17 list this improvement:

Added search parameters to search requests in RESTful API (#37673).
This PR implements the improvement: https://github.com/milvus-io/milvus/pull/37673/files#diff-dfdbe422c5ec11a2fdf7310b358e43f016fd714ab439d5d7f92f466298f7f408
In this PR, a "searchParams" field is added to SearchReqV2 to handle the search parameters.

So in v2.4.15, although you passed "ef=500", the parameter never reached the search engine, and the latency you saw was not the real latency for ef=500. The latency on v2.4.17 is the real latency for ef=500.
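
One way to notice this kind of silently dropped parameter is to time the same search with a small and a large `ef` and check that the large value is clearly slower. The sketch below is a heuristic under stated assumptions: `search` is a hypothetical callable that issues one search with the given `ef` (for example, a wrapper around the `requests.post` call shown earlier), and `low_ef=16` and `min_ratio=1.5` are arbitrary illustrative choices, not Milvus defaults.

```python
import time
from statistics import median


def median_latency_ms(search, ef, runs=20, timer=time.perf_counter):
    """Median wall-clock time in ms of `runs` calls to search(ef)."""
    durations = []
    for _ in range(runs):
        start = timer()
        search(ef)
        durations.append((timer() - start) * 1000)
    return median(durations)


def ef_is_honored(search, low_ef=16, high_ef=500, min_ratio=1.5,
                  runs=20, timer=time.perf_counter):
    """Heuristic: True if searching with high_ef is clearly slower than with low_ef."""
    low = median_latency_ms(search, low_ef, runs, timer)
    high = median_latency_ms(search, high_ef, runs, timer)
    if low == 0:
        return high > 0
    return high / low >= min_ratio
```

If both settings produce about the same latency, the parameter is probably not reaching the search engine. On a small collection, or when network overhead dominates, the difference may be too small for this check to detect.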

1 reply

Peter-YoungUk Apr 11, 2025
Author

Ah, now my question is finally resolved.
I had assumed there were two possible causes:

1. ANN search wasn't working properly.
2. The build parameters weren't set correctly.

Turns out it was the first one.
Thanks for the detailed explanation!
