Information about multi-hop reasoning
- Multi-hop Datasets
- Interpretability & Latent Multi-hop Reasoning
- Analysis Papers
- Knowledge Editing
- Multilingual
Year | Dataset | Answer style | Domain/Corpus | Question source | Note |
---|---|---|---|---|---|
2018 | WikiHop | MC | Wikipedia | Automated | Question is in triple form |
2018 | MedHop | MC | Medline | Automated | Question is in triple form |
2018 | ComplexWebQuestions | Extr. | Web snippets | Automated & Crowd | |
2018 | HotpotQA | Extr. & Yes/no | Wikipedia | Crowd | Sentence-level explanation |
2020 | R4C | Extr. & Yes/no | Wikipedia | Crowd | Entity-level explanation |
2020 | 2WikiMultiHopQA | Extr. & Yes/no | Wikipedia | Automated | Sentence-level + Entity-level |
2021 | StrategyQA | Yes/no | Wikipedia | Crowd | Decomposed steps |
2022 | MuSiQue | Extr. | Wikipedia | Automated & Crowd | Sentence-level + Entity-level |
2023 | Bamboogle | Extr. | | Crowd | Without context |
2023 | Compositional Celebrities | Extr. | Wikipedia | Automated | Without context |
- Taken out of context: On measuring situational awareness in LLMs
- Physics of language models: Part 3.2, knowledge manipulation
- Do large language models latently perform multi-hop reasoning?
- Towards a theoretical understanding of the 'reversal curse' via training dynamics
- Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization
- Looking inward: Language models can learn about themselves by introspection
- The Two-Hop Curse: LLMs trained on A->B, B->C fail to learn A->C
- Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
- Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
- Compositional Questions Do Not Necessitate Multi-hop Reasoning
- Understanding Dataset Design Choices for Multi-hop Reasoning
- Multi-hop Question Answering via Reasoning Chains
- Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA
- Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning
- Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
- Measuring and narrowing the compositionality gap in language models