[Bug]: In standalone mode, during a rolling upgrade, there was a service interruption in search/query lasting 11 seconds. #41393

Open
1 task done
zhuwenxing opened this issue Apr 18, 2025 · 2 comments
Labels: kind/bug, test/rolling upgrade, triage/accepted

Comments

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.5.6 --> 2.5-20250417-f2a55429-amd64
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior


[2025-04-17T13:29:50.657Z] testcases/test_single_request_operation_for_rolling_update.py:153: AssumptionFailure

[2025-04-17T13:29:50.657Z] >>	pytest.assume(rto <= rto_threshold,  f"{self.health_checkers[k].c_name} {k} rto expect {rto_threshold}s but get {rto}s, {self.health_checkers[k].fail_records}")

[2025-04-17T13:29:50.657Z] AssertionError: QueryChecker__CYwmEpU3 Op.query rto expect 10s but get 11.004984140396118s, [('failure', 354, '2025-04-17 13:08:22.776067', 1744895302.776109), ('success', 355, '2025-04-17 13:08:33.781059', 1744895313.7810931)]

[2025-04-17T13:29:50.657Z] assert False

[2025-04-17T13:29:50.657Z] FAILED testcases/test_single_request_operation_for_rolling_update.py::TestOperations::test_operations[9-10] - pytest_assume.plugin.FailedAssumption: 

[2025-04-17T13:29:50.657Z] 1 Failed Assumptions:

[2025-04-17T13:29:50.657Z] 

[2025-04-17T13:29:50.657Z] testcases/test_single_request_operation_for_rolling_update.py:153: AssumptionFailure

[2025-04-17T13:29:50.657Z] >>	pytest.assume(rto <= rto_threshold,  f"{self.health_checkers[k].c_name} {k} rto expect {rto_threshold}s but get {rto}s, {self.health_checkers[k].fail_records}")

[2025-04-17T13:29:50.657Z] AssertionError: SearchChecker__ae0ZuPj1 Op.search rto expect 10s but get 11.005936622619629s, [('failure', 345, '2025-04-17 13:08:23.183048', 1744895303.1831105), ('success', 346, '2025-04-17 13:08:34.189022', 1744895314.189047)]

Expected Behavior

No response

Steps To Reproduce

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/rolling_update_for_operator_test_simple/detail/rolling_update_for_operator_test_simple/6273/pipeline

log:

artifacts-kafka-standalone-6273-server-logs.tar.gz

cluster: 4am
ns: chaos-testing
pods

[2025-04-17T13:30:03.660Z] + kubectl get pods -o wide

[2025-04-17T13:30:03.662Z] + grep kafka-standalone-6273

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-etcd-0                                      1/1     Running             0                  42m     10.104.20.11    4am-node22   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-etcd-1                                      1/1     Running             0                  42m     10.104.19.131   4am-node28   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-etcd-2                                      1/1     Running             0                  42m     10.104.21.51    4am-node24   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-kafka-0                                     2/2     Running             2 (42m ago)        42m     10.104.15.211   4am-node20   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-kafka-1                                     2/2     Running             2 (42m ago)        42m     10.104.20.12    4am-node22   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-kafka-2                                     2/2     Running             2 (42m ago)        42m     10.104.19.134   4am-node28   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-kafka-exporter-5988cb8468-6kljx             1/1     Running             4 (42m ago)        42m     10.104.26.42    4am-node32   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-kafka-zookeeper-0                           1/1     Running             0                  42m     10.104.23.147   4am-node27   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-kafka-zookeeper-1                           1/1     Running             0                  42m     10.104.21.53    4am-node24   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-kafka-zookeeper-2                           1/1     Running             0                  42m     10.104.15.212   4am-node20   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-milvus-standalone-d4b8d49d4-p9tsc           1/1     Running             0                  33m     10.104.32.134   4am-node39   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-minio-0                                     1/1     Running             0                  42m     10.104.15.208   4am-node20   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-minio-1                                     1/1     Running             0                  42m     10.104.26.44    4am-node32   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-minio-2                                     1/1     Running             0                  42m     10.104.23.146   4am-node27   <none>           <none>

[2025-04-17T13:30:03.917Z] kafka-standalone-6273-minio-3                                     1/1     Running             0                  42m     10.104.19.133   4am-node28   <none>           <none>

Anything else?

No response

@zhuwenxing added the kind/bug and needs-triage labels Apr 18, 2025
@zhuwenxing added this to the 2.5.10 milestone Apr 18, 2025
@yanliang567
Contributor

/assign @weiliu1031
/unassign

@yanliang567 added the triage/accepted label and removed the needs-triage label Apr 18, 2025
@zhuwenxing
Contributor Author

zhuwenxing commented Apr 18, 2025

search_timeout = 10
query_timeout = 10

According to the logs, a single request failure caused the interruption. With both timeouts set to 10 seconds, the gap between the failed request and the next successful one is about 11 seconds:
('failure', 345, '2025-04-17 13:08:23.183048', 1744895303.1831105), ('success', 346, '2025-04-17 13:08:34.189022', 1744895314.189047)
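
For reference, here is a minimal sketch of how the reported RTO follows from these two records. It assumes RTO is measured from the first failure to the next success. `compute_rto` is a hypothetical helper written for illustration, not the test suite's actual implementation.

```python
# Records copied from the search checker's fail_records above:
# (status, request index, wall-clock time, epoch seconds)
fail_records = [
    ("failure", 345, "2025-04-17 13:08:23.183048", 1744895303.1831105),
    ("success", 346, "2025-04-17 13:08:34.189022", 1744895314.189047),
]


def compute_rto(records):
    """Hypothetical helper: longest gap from a failure to the next success."""
    rto = 0.0
    failure_start = None
    for status, _index, _timestamp, epoch in records:
        if status == "failure" and failure_start is None:
            # Start of an outage window.
            failure_start = epoch
        elif status == "success" and failure_start is not None:
            # Outage ends at the first success after the failure.
            rto = max(rto, epoch - failure_start)
            failure_start = None
    return rto


rto = compute_rto(fail_records)
rto_threshold = 10  # seconds, as in the failed assumption above
print(f"rto={rto:.3f}s, within threshold: {rto <= rto_threshold}")
```

With a 10-second request timeout, a single timed-out request already uses up the whole 10-second threshold, so any extra delay before the next success pushes the measured RTO over the limit.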
