[Bug]: Handling text without any entities and relationships #1881

Open
basitanees opened this issue Apr 15, 2025 · 1 comment
Labels
backlog: We've confirmed some action is needed on this and will plan it

Comments

basitanees commented Apr 15, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

When the text is as simple as "Hello, world", we may not be able to extract any entities or relationships. This currently throws an error because the relevant keys are never extracted. Could we set a default value in such cases?

Steps to reproduce

Use a simple sentence like "Hello world"
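
A minimal, pandas-only sketch of the failure mode (a hypothetical reconstruction: it assumes the merged relationship frame ends up with only the source/target key columns when nothing is extracted, and the named aggregations only mirror the traceback below, not the exact GraphRAG code):

import pandas as pd

# Hypothetical reconstruction: when no relationships are extracted, the merged
# frame carries only the grouping keys and none of the attribute columns.
relationships = pd.DataFrame(columns=["source", "target"])

# Mirrors the named aggregation that fails inside _merge_relationships; pandas
# raises KeyError: "Column(s) ['description', 'source_id', 'weight'] do not exist"
relationships.groupby(["source", "target"]).agg(
    description=("description", list),
    source_id=("source_id", list),
    weight=("weight", "sum"),
)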

Expected Behavior

2025-04-15 07:57:54,494|ERROR|graphrag.index.run.run_pipeline:156:error running workflow extract_graph
Traceback (most recent call last):
  File "/home/abdul/data/GraphRAG/src/graphrag/index/run/run_pipeline.py", line 143, in _run_pipeline
    result = await workflow_function(config, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/workflows/extract_graph.py", line 46, in run_workflow
    entities, relationships = await extract_graph(
                              ^^^^^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/workflows/extract_graph.py", line 82, in extract_graph
    extracted_entities, extracted_relationships = await extractor(
                                                  ^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/operations/extract_graph/extract_graph.py", line 133, in extract_graph
    relationships = _merge_relationships(relationship_dfs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/operations/extract_graph/extract_graph.py", line 170, in _merge_relationships
    .agg(
     ^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 1432, in aggregate
    result = op.agg()
             ^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 190, in agg
    return self.agg_dict_like()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 423, in agg_dict_like
    return self.agg_or_apply_dict_like(op_name="agg")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 1608, in agg_or_apply_dict_like
    result_index, result_data = self.compute_dict_like(
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 462, in compute_dict_like
    func = self.normalize_dictlike_arg(op_name, selected_obj, func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 663, in normalize_dictlike_arg
    raise KeyError(f"Column(s) {list(cols)} do not exist")
KeyError: "Column(s) ['description', 'source_id', 'weight'] do not exist"
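
One possible shape for the requested default (a sketch only: merge_relationships_safe and EXPECTED_RELATIONSHIP_COLUMNS are hypothetical names, and the real _merge_relationships in extract_graph.py may aggregate differently): short-circuit with an empty, correctly-typed frame when nothing was extracted.

import pandas as pd

# Hypothetical column set; the real schema is defined by GraphRAG's extractor.
EXPECTED_RELATIONSHIP_COLUMNS = ["source", "target", "description", "source_id", "weight"]

def merge_relationships_safe(relationship_dfs: list[pd.DataFrame]) -> pd.DataFrame:
    """Guard around the merge step: if nothing was extracted, return an empty
    frame with the expected schema instead of running the aggregation."""
    merged = pd.concat(relationship_dfs, ignore_index=True) if relationship_dfs else pd.DataFrame()
    if merged.empty or not set(EXPECTED_RELATIONSHIP_COLUMNS).issubset(merged.columns):
        return pd.DataFrame(columns=EXPECTED_RELATIONSHIP_COLUMNS)
    return (
        merged.groupby(["source", "target"], sort=False)
        .agg(
            description=("description", list),
            source_id=("source_id", list),
            weight=("weight", "sum"),
        )
        .reset_index()
    )

The same kind of guard would presumably be needed for the entity frame, which fails in the same way when nothing is extracted.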

GraphRAG Config Used

LLM: gpt-4o
Embedding model: text-embedding-ada-002

models:
  default_chat_model:
    type: openai_chat
    auth_type: api_key
    api_key: <REDACTED>
    model: ${LLM_ID}
    model_supports_json: false
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
    api_base: ${BASE_URL}
    encoding_model: o200k_base
  default_embedding_model:
    type: openai_embedding
    auth_type: api_key
    api_key: <REDACTED>
    model: ${BATCH_EMBEDDING_MODEL_ID}
    model_supports_json: false
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
    api_base: ${BASE_URL}
    encoding_model: cl100k_base
input:
  type: blob
  file_type: csv
  base_dir: input
  container_name: test002
  connection_string: <REDACTED>;
  metadata: []
chunks:
  size: 1200
  overlap: 100
  group_by_columns:
  - id
  encoding_model: o200k_base
  prepend_metadata: true
  chunk_size_includes_metadata: true
output:
  type: blob
  base_dir: output
  container_name: test002
  connection_string: <REDACTED>
cache:
  type: blob
  base_dir: cache
  container_name: test002
  connection_string: <REDACTED>
reporting:
  type: blob
  base_dir: logs
  container_name: test002
  connection_string: <REDACTED>
vector_store:
  default_vector_store:
    type: cosmosdb
    connection_string: <REDACTED>
    url: <REDACTED>
    api_key: <REDACTED>
    database_name: graphrag-evaluation
    vector_size: 1536
    collection_name: test002
    container_name: test002
    overwrite: true
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
extract_graph:
  model_id: default_chat_model
  prompt: prompts/extract_graph.txt
  entity_types:
  - organization
  - person
  - geo
  - event
  max_gleanings: 1
summarize_descriptions:
  model_id: default_chat_model
  prompt: prompts/summarize_descriptions.txt
  max_length: 500
extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english
cluster_graph:
  max_cluster_size: 10
extract_claims:
  enabled: true
  model_id: default_chat_model
  prompt: prompts/extract_claims.txt
  description: Any claims or facts that could be relevant to information discovery.
  max_gleanings: 1
community_reports:
  model_id: default_chat_model
  graph_prompt: prompts/community_report_graph.txt
  text_prompt: prompts/community_report_text.txt
  max_length: 2000
  max_input_length: 8000
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: false
  embeddings: false
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/local_search_system_prompt.txt
global_search:
  chat_model_id: default_chat_model
  map_prompt: prompts/global_search_map_system_prompt.txt
  reduce_prompt: prompts/global_search_reduce_system_prompt.txt
  knowledge_prompt: prompts/global_search_knowledge_system_prompt.txt
drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/drift_search_system_prompt.txt
  reduce_prompt: prompts/drift_search_reduce_prompt.txt
basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/basic_search_system_prompt.txt

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 2.1.0
  • Operating System: Linux
  • Python Version: 3.12
  • Related Issues:
basitanees added the bug (Something isn't working) and triage (Default label assignment, indicates new issue needs reviewed by a maintainer) labels on Apr 15, 2025
natoverse added the backlog (We've confirmed some action is needed on this and will plan it) label and removed the bug and triage labels on Apr 18, 2025
natoverse (Collaborator) commented

I'll see if we can put a fallback in for these scenarios.
