Clustering crashes: ValueError("Columns must be same length as key") - too little input text maybe? #362
Comments
I was not able to reproduce the error you faced. The process completed successfully on my end, and I can see the generated communities with their summaries. In my setup I use GPT-4o; otherwise everything is based on the standard library.
OK, I will try again with GPT-4o! (The main idea is to test the capability to combine stories told from different perspectives that are also “about” the central person, hence not always in first person. Thanks for the input about checking the prompt though 👍🏻.)
It seems to work now. Maybe my project was just in some weird state, or the problem was the env variables: comments in .env, other variables in .env, or names that were not exactly right. It works with all the defaults, including the default LLM. Thanks for the help! 💪🏻
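The .env problems mentioned above (comments, extra variables, misspelled names) can be checked mechanically. The following is a minimal sketch, assuming the file should contain only a `GRAPHRAG_API_KEY=` line as described later in this thread; the function name `check_env_text` and the warning messages are hypothetical and not part of graphrag.

```python
EXPECTED_KEY = "GRAPHRAG_API_KEY"  # the only variable this sketch expects


def check_env_text(text):
    """Return a list of warnings for the given .env content (hypothetical helper)."""
    warnings = []
    keys = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are harmless
        if line.startswith("#"):
            warnings.append(f"line {number}: comment line")
            continue
        if "=" not in line:
            warnings.append(f"line {number}: not a KEY=value pair")
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        keys.append(key)
        if key != EXPECTED_KEY:
            warnings.append(f"line {number}: unexpected variable {key!r}")
        if " #" in value:
            warnings.append(f"line {number}: inline comment after value")
        if not value.strip():
            warnings.append(f"line {number}: empty value for {key!r}")
    if EXPECTED_KEY not in keys:
        warnings.append(f"{EXPECTED_KEY} is missing")
    return warnings
```

Passing the contents of `ragtest/.env` to `check_env_text` and getting an empty list back means none of the three suspected problems is present; it does not prove the key itself is valid.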
So the short input was not the problem? Can you please list the changes you made to get from the non-working config to the working one? You mentioned using GPT-4o: was this a change you made? I'm getting this error when trying to run with a locally hosted llama3 model.
I had the same issue, and I was using vLLM as in #357.
Environment: pipeline creation fails on both Windows and macOS. Help me, please!
Hi!
Here is the bash script I use now, in case it is of use to someone:
echo deleting ragtest folder
echo creating ragtest folder
echo copying input text file
echo initializing index / creating project
echo copying .env file with only GRAPHRAG_API_KEY=
echo running pipeline
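Only the echo lines of that script survive here, so the actual commands are not shown. As a rough Python sketch of the same reset steps: the run command `python -m graphrag.index --root ./ragtest` is the one quoted in this issue, the `--init` variant is assumed from the get-started guide, and the function name `reset_project` and its parameters are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def reset_project(root, input_file, api_key, run=True):
    """Recreate a clean project folder and optionally run the pipeline (sketch)."""
    root = Path(root)
    # Delete any previous project folder so no stale state survives.
    shutil.rmtree(root, ignore_errors=True)
    # Recreate the folder and copy in the input text file.
    input_dir = root / "input"
    input_dir.mkdir(parents=True)
    shutil.copy(input_file, input_dir / Path(input_file).name)

    # Assumed commands: init as in the get-started guide, run as quoted in this issue.
    init_cmd = [sys.executable, "-m", "graphrag.index", "--init", "--root", str(root)]
    run_cmd = [sys.executable, "-m", "graphrag.index", "--root", str(root)]

    if run:
        subprocess.run(init_cmd, check=True)
    # Write .env after init so it holds only the API key line, with no comments.
    (root / ".env").write_text(f"GRAPHRAG_API_KEY={api_key}\n", encoding="utf-8")
    if run:
        subprocess.run(run_cmd, check=True)
    return [init_cmd, run_cmd]
```

Calling it with `run=False` performs only the file operations and returns the two commands, which is a cheap way to inspect what would be executed before spending LLM calls.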
Hello, I also encounter the same error, shown below.
Parameters:
Then file at
How did you solve this problem?
Is that the method?
Yes!
Try lowering the chunk size from 1200 to 300; with that change it works successfully.
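For anyone who wants to script that change, here is a minimal sketch that rewrites the chunk size in the text of settings.yaml. It assumes the file has a top-level `chunks:` block with an indented `size:` entry; that layout, and the helper name `set_chunk_size`, are assumptions, so check your own settings.yaml before relying on it.

```python
import re

# Matches a top-level "chunks:" line, any indented lines after it,
# and then the indented "size:" key; the digits that follow are replaced.
_CHUNK_SIZE = re.compile(
    r"(^chunks:[ \t]*\n(?:[ \t]+.*\n)*?[ \t]+size:[ \t]*)\d+",
    re.MULTILINE,
)


def set_chunk_size(settings_text, new_size):
    """Return settings text with the chunk size replaced (hypothetical helper)."""
    updated, count = _CHUNK_SIZE.subn(
        lambda m: f"{m.group(1)}{new_size}", settings_text, count=1
    )
    if count == 0:
        raise ValueError("no `size:` entry found under a top-level `chunks:` block")
    return updated
```

A plain regex is used instead of a YAML library so that comments and key order in the file are left untouched; the function raises rather than silently returning unchanged text when the assumed layout is not found.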
Has anyone used llama3.2 and faced a similar issue?
Hi!
I was able to reproduce the example at: https://microsoft.github.io/graphrag/posts/get_started/
However, when I use the exact same method with some shorter fictional stories, it crashes during the clustering step.
Text input:
For the input text, I paste the content of this gist into a .txt file: https://gist.github.com/simoncelinder/0fbb9aaebed1e21801ab6c6e11a0dda5
Error:

When I then run python -m graphrag.index --root ./ragtest, I get the following (my added printouts suggest that the cluster_graph function receives an empty list as input):
Maybe less relevant, since it is downstream of this problem: inspecting the log files suggests a shape mismatch:
So to reproduce this problem, one would just put the text example in the input file (input/book.txt) and execute exactly as in the guide. I assumed the problem was the shorter input text, so I tried tweaking some parameters in settings.yaml, such as various lengths, chunk sizes, and the maximum number of clusters, but without any luck so far.
My versions:
Any ideas? :-)
Thanks in advance and this seems like a really nice tool!