
[Bug]: [json-inverted] The query returns unexpected results when using like % to match infix or suffix in the json value #41386


Open
ThreadDao opened this issue Apr 17, 2025 · 3 comments
Labels
kind/bug Issues or changes related a bug severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ThreadDao
Contributor

ThreadDao commented Apr 17, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5-20250416-a89b611b-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server

config:
    common:
      enabledGrowingSegmentJSONKeyStats: true
      enabledJsonKeyStats: true
    dataCoord:
      enableActiveStandby: true
      enabledJSONKeyStatsInSort: false
      jsonStatsTriggerCount: 10
      jsonStatsTriggerInterval: 10

client

        json_4:
          other_params:
            dataset: random_algorithm
            algorithm_params:
              algorithm_name: mixed_values_json
              json_key: 'text.value'
              json_value_types: ["varchar"]
              specify_range: [0, 100]
              varchar_prefix: "z"
              varchar_filled_length: 100
              max_capacity: 5
  1. insert 5m entities
    • id: range [0, 5m)
    • json_4: each row's json_4 object stores a string at text.value. The string is produced by converting a number in the range [0, 100) to a string and left-padding it with the character 'z' to a total length of 100.
      for example:
c.query('id == 12 ', output_fields=["json_4"])
data: ["{'json_4': {'text': {'value': 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz12'}}, 'id': 12}"]
  2. flush -> index -> load
  3. concurrent upsert + flush + search
    • upsert data starting from pk 5000000; the json_4 object has 100 keys, each set to the pk value as a string.
      example:
data: ["{'json_4': {'key_3': '5000000', 'key_8': '5000000', 'key_5': '5000000', 'extra_80': '5000000', 'key_9': '5000000', 'extra_60': '5000000', 'key_2': '5000000', 'key_0': '5000000', 'key_4': '5000000', 'key_6': '5000000', 'key_7': '5000000', 'key_1': '5000000'}, 'id': 5000000}"]
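The inserted and upserted json_4 values described above can be sketched as follows. This is a minimal sketch assuming the generator converts a number to a string and left-pads it with 'z'; the helper names are hypothetical, and the upsert key names (key_0 ... key_99) are an assumption, since the example row shows a mix of key_* and extra_* keys.

```python
def make_insert_value(n: int, prefix: str = "z", filled_length: int = 100) -> dict:
    """Hypothetical helper: json_4 object for one inserted row.

    Assumes str(n) is left-padded with `prefix` to `filled_length` characters.
    """
    return {"text": {"value": str(n).rjust(filled_length, prefix)}}


def make_upsert_value(pk: int, num_keys: int = 100) -> dict:
    """Hypothetical helper: json_4 object for one upserted row.

    Every key maps to the pk rendered as a string; the key names are assumed.
    """
    return {f"key_{i}": str(pk) for i in range(num_keys)}


row = make_insert_value(12)
print(row["text"]["value"][-4:], len(row["text"]["value"]))  # zz12 100

upsert = make_upsert_value(5_000_000)
# Upserted objects have no "text" key, so they cannot match json_4["text"]["value"] filters.
print("text" in upsert, len(upsert))  # False 100
```

Because the upserts start at pk 5000000, they add new rows rather than overwrite the original 5m, and (under the assumed shape) none of them carries text.value, so they should not change the expected LIKE counts.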

result

query with like z1%, %z1, %z1% (all three return 0):

  • z1%: expected 0, actual 0
  • %z1: expected 50000, actual 0
  • %z1%: expected 550000, actual 0
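The expected counts follow from the data layout. A minimal sketch, assuming each number in [0, 100) appears equally often among the 5,000,000 rows (50,000 rows each) and that LIKE's % means "any sequence of characters" (the _ wildcard and escaping are ignored here): only 1 ends in "z1", while 1 and 10 to 19 contain "z1".

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern that uses only '%' into an anchored regex."""
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$", re.DOTALL)


def expected_count(pattern: str, total_rows: int = 5_000_000, value_range: int = 100,
                   prefix: str = "z", filled_length: int = 100) -> int:
    """Count matching rows, assuming every value in [0, value_range) occurs equally often."""
    rows_per_value = total_rows // value_range
    rx = like_to_regex(pattern)
    matches = sum(1 for n in range(value_range)
                  if rx.match(str(n).rjust(filled_length, prefix)))
    return matches * rows_per_value


for p in ["z1%", "%z1", "%z1%"]:
    print(p, expected_count(p))
# z1% 0
# %z1 50000
# %z1% 550000
```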
c.query('json_4["text"]["value"] like "z1%"', output_fields=["count(*)"])
data: ["{'count(*)': 0}"]
c.query('json_4["text"]["value"] like "%z1"', output_fields=["count(*)"])
data: ["{'count(*)': 0}"]
c.query('json_4["text"]["value"] like "%z1%"', output_fields=["count(*)"])
data: ["{'count(*)': 0}"]
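To rerun the three count queries and compare them with the expected values, something like the following could be used. It is a sketch: it assumes pymilvus's MilvusClient.query(collection_name, filter=..., output_fields=[...]) returning a list of dicts, and the URI and collection name in the usage comment are hypothetical. The function accepts any object with that query method, so it can also be exercised with a stub.

```python
from typing import Dict, Tuple


def find_like_mismatches(client, collection_name: str, field_expr: str,
                         expected: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Run a count(*) query per LIKE pattern; return {pattern: (expected, actual)} for mismatches.

    `client` is anything with a MilvusClient-style
    query(collection_name, filter=..., output_fields=...) method.
    """
    mismatches = {}
    for pattern, want in expected.items():
        expr = f'{field_expr} like "{pattern}"'
        rows = client.query(collection_name, filter=expr, output_fields=["count(*)"])
        got = rows[0]["count(*)"]
        if got != want:
            mismatches[pattern] = (want, got)
    return mismatches


# Example usage against a live cluster (URI and collection name are hypothetical):
# from pymilvus import MilvusClient
# client = MilvusClient(uri="http://localhost:19530")
# print(find_like_mismatches(
#     client, "json_like_test", 'json_4["text"]["value"]',
#     {"z1%": 0, "%z1": 50_000, "%z1%": 550_000},
# ))
```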

Expected Behavior

No response

Steps To Reproduce

https://argo-workflows.zilliz.cc/archived-workflows/qa/dd7b4df8-4dea-41f1-96ae-90db1e7bdcbc?nodeId=zong-json-key-cron-1744819200-2223764376

Milvus Log

pods:

zong-json-multi-op-54-6884-milvus-datanode-7558b9d6c-r4m2x        1/1     Running                  0               15h     10.104.17.5     4am-node23   <none>           <none>
zong-json-multi-op-54-6884-milvus-indexnode-dc7696cff-8dhfj       1/1     Running                  0               15h     10.104.21.151   4am-node24   <none>           <none>
zong-json-multi-op-54-6884-milvus-indexnode-dc7696cff-q4r7j       1/1     Running                  0               15h     10.104.27.111   4am-node31   <none>           <none>
zong-json-multi-op-54-6884-milvus-indexnode-dc7696cff-t8wd4       1/1     Running                  0               15h     10.104.25.148   4am-node30   <none>           <none>
zong-json-multi-op-54-6884-milvus-indexnode-dc7696cff-zbd68       1/1     Running                  0               15h     10.104.20.7     4am-node22   <none>           <none>
zong-json-multi-op-54-6884-milvus-mixcoord-66f99776cc-5vnsj       1/1     Running                  0               15h     10.104.24.250   4am-node29   <none>           <none>
zong-json-multi-op-54-6884-milvus-proxy-64fd69f965-qwdst          1/1     Running                  0               15h     10.104.24.251   4am-node29   <none>           <none>
zong-json-multi-op-54-6884-milvus-querynode-0-6f9d45887c-bk2tj    1/1     Running                  0               15h     10.104.21.152   4am-node24   <none>           <none>
zong-json-multi-op-54-6884-milvus-querynode-0-6f9d45887c-fdbhx    1/1     Running                  0               15h     10.104.25.149   4am-node30   <none>           <none>
zong-json-multi-op-54-6884-milvus-querynode-0-6f9d45887c-g8tzd    1/1     Running                  0               15h     10.104.6.199    4am-node13   <none>           <none>
zong-json-multi-op-54-6884-milvus-querynode-0-6f9d45887c-qptl6    1/1     Running                  0               15h     10.104.17.6     4am-node23   <none>           <none>

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 17, 2025
@ThreadDao ThreadDao added the severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. label Apr 17, 2025
@ThreadDao ThreadDao added this to the 2.5.10 milestone Apr 17, 2025
@ThreadDao
Contributor Author

The same issue occurs when matching string elements of a JSON array.
JSON array data generation:

        json_5:
          other_params:
            dataset: random_algorithm
            algorithm_params:
              algorithm_name: mixed_values_json
              json_key: 'array.value'
              json_value_types: ["array_int64", "array_varchar"]
              specify_range: [0, 100]
              varchar_prefix: "z"
              varchar_filled_length: 10
              max_capacity: 20
  • "z1%" expected 0 actual 0
  • "%z1" expected 25000 actual 0
  • "%z1%" expected 275000 actual 0
c.query('json_5["array"]["value"][0] like "z1%" ', output_fields=["count(*)"])
data: ["{'count(*)': 0}"]
c.query('json_5["array"]["value"][0] like "%z1" ', output_fields=["count(*)"])
data: ["{'count(*)': 0}"]
c.query('json_5["array"]["value"][0] like "%z1%" ', output_fields=["count(*)"])
data: ["{'count(*)': 0}"]
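The json_5 expectations are consistent with the same reasoning at varchar_filled_length 10. A minimal sketch, assuming (an inference from the expected numbers, not something the issue states) that 2,500,000 rows hold a varchar array whose element 0 is uniform over [0, 100), i.e. 25,000 rows per value; the function names are hypothetical.

```python
import re
from typing import Dict, Iterable


def like_matches(pattern: str, value: str) -> bool:
    """Match `value` against a LIKE pattern that uses only the '%' wildcard."""
    regex = "^" + ".*".join(re.escape(part) for part in pattern.split("%")) + "$"
    return re.match(regex, value, re.DOTALL) is not None


def expected_array_counts(patterns: Iterable[str], varchar_rows: int = 2_500_000,
                          value_range: int = 100, prefix: str = "z",
                          filled_length: int = 10) -> Dict[str, int]:
    """Expected count(*) per pattern for element [0] of a varchar array.

    Assumes `varchar_rows` rows hold a varchar array whose element 0 is uniform over
    [0, value_range), each number left-padded with `prefix` to `filled_length` chars.
    """
    rows_per_value = varchar_rows // value_range
    values = [str(n).rjust(filled_length, prefix) for n in range(value_range)]
    return {p: rows_per_value * sum(like_matches(p, v) for v in values) for p in patterns}


print(expected_array_counts(["z1%", "%z1", "%z1%"]))
# {'z1%': 0, '%z1': 25000, '%z1%': 275000}
```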

@JsDove
Contributor

JsDove commented Apr 18, 2025

It has been fixed: support infix and suffix match types in JsonStats.
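Conceptually, a LIKE pattern falls into one of a few match types depending on where % appears, and the fix adds the infix and suffix cases to JsonStats. The sketch below is illustrative only: MatchType, classify_like, and evaluate are hypothetical names that do not reproduce Milvus's internal planner, and only the % wildcard is handled. In the issue, the suffix (%z1) and infix (%z1%) filters returned 0 rows with JSON key stats enabled.

```python
from enum import Enum
from typing import Tuple


class MatchType(Enum):
    EQUAL = "equal"    # no wildcard: "abc"
    PREFIX = "prefix"  # "abc%"
    SUFFIX = "suffix"  # "%abc"
    INFIX = "infix"    # "%abc%"
    OTHER = "other"    # wildcard in the middle, e.g. "a%c"


def classify_like(pattern: str) -> Tuple[MatchType, str]:
    """Classify a '%'-only LIKE pattern and return (match type, literal operand)."""
    starts = pattern.startswith("%")
    ends = pattern.endswith("%") and len(pattern) > 1
    begin = 1 if starts else 0
    end = len(pattern) - 1 if ends else len(pattern)
    core = pattern[begin:end]
    if "%" in core:
        return MatchType.OTHER, pattern
    if starts and ends:
        return MatchType.INFIX, core
    if starts:
        return MatchType.SUFFIX, core
    if ends:
        return MatchType.PREFIX, core
    return MatchType.EQUAL, core


def evaluate(match_type: MatchType, operand: str, value: str) -> bool:
    """Evaluate one of the simple match types against a string value."""
    if match_type is MatchType.EQUAL:
        return value == operand
    if match_type is MatchType.PREFIX:
        return value.startswith(operand)
    if match_type is MatchType.SUFFIX:
        return value.endswith(operand)
    if match_type is MatchType.INFIX:
        return operand in value
    raise ValueError("OTHER patterns need a general matcher")


value = "z" * 98 + "12"
for p in ["z1%", "%z1", "%z1%"]:
    kind, operand = classify_like(p)
    print(p, kind.value, evaluate(kind, operand, value))
# z1% prefix False
# %z1 suffix False
# %z1% infix True
```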

@JsDove
Contributor

JsDove commented Apr 18, 2025

/assign @ThreadDao

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 18, 2025
sre-ci-robot pushed a commit that referenced this issue Apr 18, 2025
fix: [2.5]support infix and suffix match types in JsonStats
issue:#41386
pr:#38039

Signed-off-by: Xianhui.Lin <[email protected]>