[Bug]: Handling text without any entities and relationships #1881

Open
basitanees opened this issue Apr 15, 2025 · 1 comment
Labels
backlog: We've confirmed some action is needed on this and will plan it

Comments

basitanees commented Apr 15, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

When the text is as simple as "Hello, world", we may not be able to extract any entities or relationships. This currently throws an error because the relevant keys are never extracted. Could we set a default value in such cases?

Steps to reproduce

Use a simple sentence like "Hello world"
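
A minimal, pandas-only sketch of the failure mode (a hypothetical reconstruction: it assumes the merged relationship frame ends up with only the source/target key columns when nothing is extracted, and the named aggregations only mirror the traceback below, not the exact GraphRAG code):

import pandas as pd

# Hypothetical reconstruction: when no relationships are extracted, the merged
# frame carries only the grouping keys and none of the attribute columns.
relationships = pd.DataFrame(columns=["source", "target"])

# Mirrors the named aggregation that fails inside _merge_relationships; pandas
# raises KeyError: "Column(s) ['description', 'source_id', 'weight'] do not exist"
relationships.groupby(["source", "target"]).agg(
    description=("description", list),
    source_id=("source_id", list),
    weight=("weight", "sum"),
)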

Expected Behavior

2025-04-15 07:57:54,494|ERROR|graphrag.index.run.run_pipeline:156:error running workflow extract_graph
Traceback (most recent call last):
  File "/home/abdul/data/GraphRAG/src/graphrag/index/run/run_pipeline.py", line 143, in _run_pipeline
    result = await workflow_function(config, context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/workflows/extract_graph.py", line 46, in run_workflow
    entities, relationships = await extract_graph(
                              ^^^^^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/workflows/extract_graph.py", line 82, in extract_graph
    extracted_entities, extracted_relationships = await extractor(
                                                  ^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/operations/extract_graph/extract_graph.py", line 133, in extract_graph
    relationships = _merge_relationships(relationship_dfs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abdul/data/GraphRAG/src/graphrag/index/operations/extract_graph/extract_graph.py", line 170, in _merge_relationships
    .agg(
     ^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 1432, in aggregate
    result = op.agg()
             ^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 190, in agg
    return self.agg_dict_like()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 423, in agg_dict_like
    return self.agg_or_apply_dict_like(op_name="agg")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 1608, in agg_or_apply_dict_like
    result_index, result_data = self.compute_dict_like(
                                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 462, in compute_dict_like
    func = self.normalize_dictlike_arg(op_name, selected_obj, func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/graph/lib/python3.12/site-packages/pandas/core/apply.py", line 663, in normalize_dictlike_arg
    raise KeyError(f"Column(s) {list(cols)} do not exist")
KeyError: "Column(s) ['description', 'source_id', 'weight'] do not exist"
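
One possible shape for the requested default (a sketch only: merge_relationships_safe and EXPECTED_RELATIONSHIP_COLUMNS are hypothetical names, and the real _merge_relationships in extract_graph.py may aggregate differently): short-circuit with an empty, correctly-typed frame when nothing was extracted.

import pandas as pd

# Hypothetical column set; the real schema is defined by GraphRAG's extractor.
EXPECTED_RELATIONSHIP_COLUMNS = ["source", "target", "description", "source_id", "weight"]

def merge_relationships_safe(relationship_dfs: list[pd.DataFrame]) -> pd.DataFrame:
    """Guard around the merge step: if nothing was extracted, return an empty
    frame with the expected schema instead of running the aggregation."""
    merged = pd.concat(relationship_dfs, ignore_index=True) if relationship_dfs else pd.DataFrame()
    if merged.empty or not set(EXPECTED_RELATIONSHIP_COLUMNS).issubset(merged.columns):
        return pd.DataFrame(columns=EXPECTED_RELATIONSHIP_COLUMNS)
    return (
        merged.groupby(["source", "target"], sort=False)
        .agg(
            description=("description", list),
            source_id=("source_id", list),
            weight=("weight", "sum"),
        )
        .reset_index()
    )

The same kind of guard would presumably be needed for the entity frame, which fails in the same way when nothing is extracted.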

GraphRAG Config Used

LLM: gpt-4o
Embedding model: text-embedding-ada-002

models:
  default_chat_model:
    type: openai_chat
    auth_type: api_key
    api_key: <REDACTED>
    model: ${LLM_ID}
    model_supports_json: false
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
    api_base: ${BASE_URL}
    encoding_model: o200k_base
  default_embedding_model:
    type: openai_embedding
    auth_type: api_key
    api_key: <REDACTED>
    model: ${BATCH_EMBEDDING_MODEL_ID}
    model_supports_json: false
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
    api_base: ${BASE_URL}
    encoding_model: cl100k_base
input:
  type: blob
  file_type: csv
  base_dir: input
  container_name: test002
  connection_string: <REDACTED>;
  metadata: []
chunks:
  size: 1200
  overlap: 100
  group_by_columns:
  - id
  encoding_model: o200k_base
  prepend_metadata: true
  chunk_size_includes_metadata: true
output:
  type: blob
  base_dir: output
  container_name: test002
  connection_string: <REDACTED>
cache:
  type: blob
  base_dir: cache
  container_name: test002
  connection_string: <REDACTED>
reporting:
  type: blob
  base_dir: logs
  container_name: test002
  connection_string: <REDACTED>
vector_store:
  default_vector_store:
    type: cosmosdb
    connection_string: <REDACTED>
    url: <REDACTED>
    api_key: <REDACTED>
    database_name: graphrag-evaluation
    vector_size: 1536
    collection_name: test002
    container_name: test002
    overwrite: true
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
extract_graph:
  model_id: default_chat_model
  prompt: prompts/extract_graph.txt
  entity_types:
  - organization
  - person
  - geo
  - event
  max_gleanings: 1
summarize_descriptions:
  model_id: default_chat_model
  prompt: prompts/summarize_descriptions.txt
  max_length: 500
extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english
cluster_graph:
  max_cluster_size: 10
extract_claims:
  enabled: true
  model_id: default_chat_model
  prompt: prompts/extract_claims.txt
  description: Any claims or facts that could be relevant to information discovery.
  max_gleanings: 1
community_reports:
  model_id: default_chat_model
  graph_prompt: prompts/community_report_graph.txt
  text_prompt: prompts/community_report_text.txt
  max_length: 2000
  max_input_length: 8000
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: false
  embeddings: false
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/local_search_system_prompt.txt
global_search:
  chat_model_id: default_chat_model
  map_prompt: prompts/global_search_map_system_prompt.txt
  reduce_prompt: prompts/global_search_reduce_system_prompt.txt
  knowledge_prompt: prompts/global_search_knowledge_system_prompt.txt
drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/drift_search_system_prompt.txt
  reduce_prompt: prompts/drift_search_reduce_prompt.txt
basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/basic_search_system_prompt.txt

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 2.1.0
  • Operating System: Linux
  • Python Version: 3.12
  • Related Issues:
basitanees added the bug (Something isn't working) and triage (Default label assignment, indicates new issue needs reviewed by a maintainer) labels on Apr 15, 2025
natoverse added the backlog (We've confirmed some action is needed on this and will plan it) label and removed the bug and triage labels on Apr 18, 2025
natoverse (Collaborator) commented

I'll see if we can put a fallback in for these scenarios.
