GuessingGame: Measuring Informativeness of Open-Ended Questions in Large Language Models

By Elara Voss | 2025-09-26

Open-ended questions are the lifeblood of large language models (LLMs) in real-world use, from creative writing to complex reasoning tasks. Yet not all questions are equally informative. Some elicit broad, generic responses, while others spark concise, content-rich explanations that reveal the model's understanding and the user's intent. The GuessingGame approach offers a principled way to quantify how informative a question is by measuring how much uncertainty the model's answer removes about the underlying content.

What is GuessingGame?

At its core, GuessingGame treats an open-ended prompt as a probe into a topic. Before the prompt is answered, you have a prior belief about the likely content that should emerge. After the model responds, you update that belief based on the output. The informativeness of the question is then measured by the information gain—the degree to which the answer narrows the space of plausible content.

Informativeness is not about right or wrong; it's about how much the answer shifts our understanding of what could be true about the topic.

Operationally, GuessingGame combines a prompt, a model-generated answer, and a secondary mechanism (a “guesser”) that tries to infer the intended content or the key facts the prompt sought to evoke. By comparing the prior and posterior distributions over content, we derive a structured metric that captures the prompt’s diagnostic power.

How to implement GuessingGame
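
There is no single canonical implementation, but the loop is simple enough to sketch. The snippet below is a minimal Python sketch, assuming you supply your own ask_model and guess callables (hypothetical placeholders for whatever LLM calls you use); it only illustrates the prior/posterior bookkeeping and the information-gain computation described above.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping candidate -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def play_round(prompt, candidates, ask_model, guess):
    """One GuessingGame round.

    ask_model(prompt) -> answer text (placeholder for your LLM call).
    guess(prompt, answer, candidates) -> normalized dict of probabilities
        over candidates (placeholder for your guesser model).
    """
    # Prior: before any answer, treat every candidate content item as equally plausible.
    prior = {c: 1.0 / len(candidates) for c in candidates}
    answer = ask_model(prompt)
    posterior = guess(prompt, answer, candidates)
    # Information gain: how much the answer shrank our uncertainty, in bits.
    info_gain = entropy(prior) - entropy(posterior)
    return answer, posterior, info_gain
```

In practice, the candidate set can be a list of key facts, topics, or answer attributes you care about; anything discrete that a guesser can assign probabilities to will work with this scaffolding.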

Metrics you can use
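
Several metrics fit this framework. The helpers below are illustrative Python implementations, assuming prior and posterior are dictionaries over the same candidate set as in the sketch above; which metric you report is a design choice rather than a fixed part of the method.

```python
import math

def information_gain(prior, posterior):
    """Entropy reduction in bits: H(prior) - H(posterior)."""
    h = lambda d: -sum(p * math.log2(p) for p in d.values() if p > 0)
    return h(prior) - h(posterior)

def kl_divergence(posterior, prior):
    """KL(posterior || prior): how far the answer moved the guesser's belief."""
    return sum(q * math.log2(q / prior[c])
               for c, q in posterior.items() if q > 0)

def top_k_hit_rate(posterior, gold_content, k=3):
    """Fraction of gold content items that appear in the guesser's top-k guesses."""
    top_k = sorted(posterior, key=posterior.get, reverse=True)[:k]
    return len(set(top_k) & set(gold_content)) / max(len(gold_content), 1)
```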

Practical considerations and pitfalls

There are several caveats to keep in mind. First, informativeness is contextual: a prompt may be highly informative for a domain expert but less so for a lay audience. Second, leakage or prompt-priming effects can inflate apparent informativeness if the guesser can recover the target content from the prompt itself or from its own training data rather than from the answer. Third, human evaluation remains essential to guard against metrics that reward style or verbosity over substantive content.

Designing better prompts with GuessingGame

When used in prompt design, GuessingGame helps you identify which prompts reliably extract precise, actionable knowledge. If a prompt consistently yields high information gain with concise responses, it’s a strong candidate for deployment in production workflows. Conversely, prompts with low information gain signal a need for rewording, added constraints, or a shift to a more focused prompt type.
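
As a rough illustration of how such a selection loop might look, the sketch below ranks candidate prompts by their average information gain over a few trials, reusing play_round from the implementation sketch; the trial count and the 1.0-bit threshold are arbitrary illustrative choices, not part of any published recipe.

```python
def rank_prompts(prompts, candidates, ask_model, guess,
                 trials=5, min_gain_bits=1.0):
    """Split prompts into keepers (ranked by mean gain) and ones to rework."""
    scores = {}
    for prompt in prompts:
        gains = [play_round(prompt, candidates, ask_model, guess)[2]
                 for _ in range(trials)]
        scores[prompt] = sum(gains) / len(gains)
    keep = sorted((p for p in prompts if scores[p] >= min_gain_bits),
                  key=scores.get, reverse=True)
    rework = [p for p in prompts if scores[p] < min_gain_bits]
    return keep, rework
```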

Example scenario

Consider a prompt: “Explain the major factors contributing to ocean acidification in a way a non-expert can understand.” The LLM returns a structured explanation with several factors and a brief mechanism for how CO2 impacts seawater chemistry. A guesser then attempts to predict the key content the prompt aimed to evoke—factors such as CO2 dissolution, bicarbonate buffering, and ecosystem impacts. If the guesser’s top predictions align with the model’s content and the inferred content significantly narrows the possible set of factors, the information gain is high. If the answer is broad and repetitive, the guesser may still capture some factors, but the overall information gain will be lower, signaling a less informative prompt.
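
A toy version of this scoring step, with the gold factors taken from the example above and the guesser's predictions invented purely for illustration:

```python
# Gold content the prompt was meant to evoke (from the scenario above).
gold_factors = {"CO2 dissolution", "bicarbonate buffering", "ecosystem impacts"}

# Hypothetical ranked predictions from the guesser, for illustration only.
guesser_top_predictions = ["CO2 dissolution", "bicarbonate buffering",
                           "ocean warming", "ecosystem impacts"]

hits = [f for f in guesser_top_predictions if f in gold_factors]
coverage = len(hits) / len(gold_factors)  # 3/3 = 1.0 here
print(f"Guesser recovered {len(hits)}/{len(gold_factors)} key factors "
      f"(coverage = {coverage:.2f})")
```

High coverage of the gold factors, together with a sharply peaked posterior, is what a high-information-gain prompt looks like in practice; a broad, repetitive answer tends to leave the posterior flatter and the gain lower.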

Why GuessingGame matters in practice

For teams building and evaluating LLM-based assistants, GuessingGame provides a transparent, quantitative lens on prompt design. It helps distinguish prompts that drive rich, content-rich outputs from those that merely produce surface-level text. In safety-sensitive or knowledge-critical applications, prioritizing high-informativeness prompts can improve reliability and reduce ambiguity in model behavior.

Final thoughts

In the evolving landscape of large language models, understanding what makes a question informative is as important as the answers themselves. GuessingGame offers a disciplined approach to measuring informativeness, aligning prompt engineering with measurable outcomes. As researchers and practitioners adopt this framework, we’ll gain sharper tools for crafting prompts that consistently unlock useful, trustworthy content from AI systems.