
Define Your Problem Brief or Define Your Failure: The Prerequisite for AI-Powered Iteration

When the Problem Was Never the Problem

Imagine a UK government department that spent £12 million on an AI system to predict welfare fraud. The system never went live. Not because of technical failure: the model worked fine in trials. The problem was never the model. The problem was that no one had bothered to specify what “fraud” actually meant in the context they were working in, how the system’s predictions would be used, what the acceptable false positive rate was, or who would be accountable when the system got it wrong. These weren’t technical questions. They were problem definition questions, exactly the kind a problem brief exists to answer. And they were never answered.

Two years of development produced a technically sophisticated artifact that solved a problem no one had actually defined.

“Two years of development produced a technically sophisticated artifact that solved a problem no one had actually defined.”

This is not an unusual story. It is a predictable one.

The systemic failure wasn’t carelessness. The team was competent, the technology was sound, and the people involved were not stupid. The failure was structural: the incentives at every stage rewarded moving to solution mode quickly, and the cost of a vague problem statement was invisible until it was catastrophic. By the time the question “what are we actually trying to do?” became unavoidable, the budget was spent and the political commitment was already made.

This is the failure mode this piece is about.

————————

The Illusion of the Obvious Problem

Most knowledge workers are not taught to question problem statements. In school, problems come pre-packaged: here is an equation, solve it. In university, the frame is set by your supervisor or your brief. In the workplace, problems arrive in meetings, in strategy documents, in Slack messages from senior people who have already decided what the answer looks like.

The result is a professional culture that treats problem definition as optional, a formality to get through before the real work begins. The energy goes into solutions: the presentation, the prototype, the model, the proposal. Problem definition is paperwork. And paperwork doesn’t get reviewed by a steering committee.

This is the illusion of the obvious problem: the assumption that because a problem is familiar, it is understood. “Improve customer satisfaction” is obvious. “Reduce churn” is obvious. “Make the process more efficient” is obvious. These phrases feel like problems because they reference real discomforts. But they are not problems. They are problem areas – territories on a map that you have not surveyed.

The knowledge worker who jumps straight to solutions is not being lazy. They are responding rationally to the incentives around them. Speed is rewarded. Certainty is rewarded. Articulating what you do not know is punished: it looks like incompetence, or worse, like obstruction. The person who says “I’m not sure we’ve defined this correctly” is often right, but rarely popular.

The asymmetry is structural: the cost of an imprecise problem statement is paid later, by someone else, in a meeting you might not be invited to. The benefit of a quick start is immediate and visible. This is why the failure mode is systemic, not individual.

————————

Ill-Defined Problems and Well-Defined Problems

The distinction matters because it determines what kinds of solutions are even reachable.

An ill-defined problem has the following properties: the goal is vague, the boundaries are unclear, success is not measurable, and there are multiple equally valid interpretations of what “solving” it means. “Improve customer experience” is an ill-defined problem. There is no way to know when you have succeeded, because there is no explicit definition of success.

A well-defined problem has: a specific, bounded goal; measurable criteria for success; clear scope (what is explicitly not included); and known constraints on acceptable solutions. “Reduce cart abandonment rate for new users on the web checkout flow from 68% to below 50% within six months, by identifying and fixing the top three friction points in the payment step, while maintaining current average order value” – that is a well-defined problem.

The critical point is this: an ill-defined problem cannot be solved. It can only be responded to. You can produce a response to “improve customer experience.” You cannot solve it, because the problem does not have edges. When you hand an ill-defined problem to an iterative AI system (to Karpathy’s autoresearch, to an agentic workflow, to a model that generates and tests candidates) you are running a powerful search over the wrong space. You will find the best solution to the wrong problem, efficiently and confidently.

“The AI is a resolver, not a clarifier. It will resolve whatever you give it.”

Examples help:

Consulting: A client says “our culture is holding us back.” The consultant produces a culture audit, workshops, and a report. Twelve months later, nothing has changed. The problem was not cultural, it was that the two senior stakeholders who controlled the budget were in a quiet power struggle, and nothing could move until that was named. The problem presented and the problem actual were different problems.

Product: A product manager writes a ticket: “users can’t find the settings page.” The engineering team builds a new settings UI with better navigation. Usage of settings increases slightly. The actual problem, that users didn’t know they needed to change settings to get the product to work for them, was never addressed. The new UI didn’t fix the onboarding failure; it made the settings page easier to find from a broken starting point.

Research: A researcher is asked to “investigate why our grant application success rate has fallen.” They produce an analysis of peer institutions’ success rates, interview some colleagues, and conclude that the problem is the quality of applications. That was the answer the available evidence most easily supported: the researcher had no access to strategic priority documents, and without those, the misalignment was invisible. The actual problem – that the institution’s strategic priorities had shifted without anyone telling researchers which grant programmes now aligned with institutional goals, so researchers were applying to the wrong programmes – was never investigated, because it would have required admitting that institutional strategy had been unclear.

In each case, the problem presented and the problem actual were different. The solution was responsive to the presented problem. The outcome was failure to address the actual problem.

————————

Karpathy’s Autoresearch and the Input Problem

Andrej Karpathy has described a pattern he calls autoresearch: using large language models to autonomously iterate on a problem – generating candidate solutions, evaluating them, refining the approach, and repeating without continuous human intervention. The model effectively runs a search over the space of possible solutions, guided by a feedback signal. Do this enough times, and you converge on something that works.

This is a genuinely powerful idea. It is also, in important ways, a reinvention of something that already exists in engineering: the automated test suite. When you write a test, you are defining what good looks like. The test runs, the code changes, the test runs again. If the test is wrong, you get fast, confident failure. If the test is right, you get reliable progress.

The pattern here maps onto something engineers already understand. In the test suite analogy, the test is the problem definition. When the problem is well-specified, the feedback signal is precise. When the problem is vague, it’s as if your test suite has no assertions: the suite runs green while the code does something no one intended.

Karpathy’s insight is that LLMs can play the role of the test suite for problems that don’t have clean mathematical objective functions. You can describe the outcome you want, have the model generate approaches, evaluate them, and iterate. The model becomes an automated researcher.

Here is the catch, and it is a structural one: the feedback signal is only as good as the problem definition.

“Ill-defined problems don’t become well-defined just because you’ve handed them to an iterating system.”

If you specify the problem well, the autoresearch loop will explore the right solution space. You will get candidates, you will evaluate them, you will converge on something that actually addresses what you specified. The loop is powerful because it is rigorous within its constraints.

If you specify the problem poorly – vague goal, unclear success criteria, no defined boundaries – then the loop will still run. The model will generate candidates, evaluate them (against your vague criteria), and converge. It will converge on the best solution to the wrong problem. It will do so efficiently, confidently, and at scale. Your wrong problem is now solved at machine speed.

This is not a hypothetical. Practitioners who have used LLMs for research assistance widely report a consistent experience: the model produces polished, plausible, well-structured output that is subtly but consequentially wrong. It solves the version of the problem that was easiest to represent in text, not the version that actually needed solving. The failure mode looks like hallucination. Usually, it is actually a poorly specified problem being resolved with high confidence.

The implication is not that AI systems are unreliable. The implication is that AI systems are reliable resolvers, and the question is whether you are giving them something worth resolving.

————————

Failure Modes: What Goes Wrong

Bad problem definition does not produce one kind of failure. It produces several, in different parts of the system.

The AI produces plausible nonsense. When the problem statement is vague, the model will generate responses that are well-structured, confident, and irrelevant. This is not a model failure. The model is doing exactly what it is designed to do: produce text that is coherent and responsive to the prompt. The prompt was not specific enough to constrain the response to the right space. The nonsense is a problem definition failure that has been resolved at speed.

Key insight: This is not hallucination. It is high-speed problem definition failure resolved at machine scale.

Projects stall. When no one has defined the problem, disagreement about what the project is trying to achieve shows up at every stage. The solution that Team A built does not satisfy Team B because they were solving a different version of the problem. The project enters an infinite loop of clarification that never resolves, because the underlying problem has never been named. The stall looks like poor governance. It is actually unresolved problem definition.

Teams converge on the wrong solution. When a problem is ambiguous, teams do not converge on the correct answer. They converge on the answer that the most powerful or most confident person believes. The loudest voice, the most senior title, the most plausible narrative, these determine which interpretation of the vague problem wins. The selected solution is then defended, because admitting it is wrong would require admitting that the problem was never correctly understood.

Resources are wasted on the wrong metrics. An organisation optimises for a proxy metric (say, customer satisfaction score) that was never validated as connected to the actual outcome desired. The metric improves. The outcome does not follow. The metric becomes entrenched because changing it would require acknowledging that the initial problem framing was wrong.

The failure gets normalised. This is the most insidious mode. The initial problem statement is treated as a given, not a choice. When the solution fails, the problem statement is never revisited. The failure is attributed to implementation, to politics, to budget – to anything but the fact that no one ever confirmed they were solving the right thing. Next time, the same pattern runs again.

————————

Failure Modes at a Glance

  • Plausible nonsense: Vague prompts produce confident, well-structured, irrelevant output
  • Project stalls: Ambiguity creates endless re-clarification loops
  • Wrong convergence: Ambiguous problems resolve to whoever is loudest or most senior
  • Wrong metrics: Proxy measures improve while actual outcomes do not follow
  • Normalised failure: Problem statement is never revisited; same pattern repeats

————————

The Constraint Specification Method

The solution is not to be more careful. It is to have a structure that makes carelessness visible.

Constraint specification is a method borrowed from engineering design and formal methods. The idea is simple: before you begin solving, you write down the boundaries of the problem. Not the solution. The problem. Specifically:

  1. Problem domain: What area or territory does this problem occupy?
  2. Boundaries: What is explicitly not part of this problem?
  3. Success criteria: What does a successful outcome look like? Be specific and measurable where possible.
  4. Known unknowns: What do you know that you do not know?
  5. Assumptions: What are you assuming that might be wrong?
  6. Acceptable outputs: What kind of answer or artefact would actually be useful here?
  7. Constraints: What cannot change? What non-negotiables apply?
  8. Stakeholders and impact: Who is affected, and how?

The purpose of this structure is not to produce a perfect problem statement. It is to make your assumptions explicit. When your assumptions are visible, they can be questioned, tested, and corrected. When they are invisible, they operate unchecked and ambush you at the worst moment.

The known unknowns section is the most valuable and the most avoided. Writing “I don’t know what the conversion rate is for the current flow” feels like an admission. It is actually a statement of where you need to do work before the solution is meaningful. The alternative (proceeding without that knowledge) is not more professional. It is just less honest about the gap.
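One way to make the structure enforce itself is to represent the brief as data that refuses to stay vague. A sketch in Python – the field names follow the eight points above, but the class and its validation rule are illustrative, not a prescribed tool:

```python
from dataclasses import dataclass

@dataclass
class ProblemBrief:
    problem_statement: str
    boundaries: list         # what is explicitly NOT part of this problem
    success_criteria: list   # specific, measurable outcomes
    known_unknowns: list     # gaps you know you have
    assumptions: list        # things treated as given that might be wrong
    acceptable_outputs: list
    constraints: list
    stakeholders: list

    def validate(self):
        """Return the names of sections left empty: carelessness made visible."""
        return [name for name, value in vars(self).items() if not value]
```

Handing a half-empty brief to this structure does not stop you, but it names exactly which assumptions you are proceeding without – which is the whole point of the method.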

————————


A Practical Framework: The Problem Brief

Here is a template you can use today. Fill it in before you brief anyone: a colleague, a team, an AI system.

————————

The Problem Brief

Problem statement (who has a problem, doing what, with what outcome?)

Example: New users abandon the checkout flow at the payment step at a rate of 68%, causing an estimated £340k annual revenue loss.

What this is NOT (boundaries – what are you explicitly not trying to solve?)

Example: This brief does not address pricing, product catalog quality, or marketing campaign performance.

Success criteria (specific, measurable, time-bound)

Example: Reduce abandonment rate at payment step from 68% to below 50% within 90 days, without reducing average order value.

Known unknowns (gaps in your knowledge that affect the problem)

Example: I don’t know whether the problem is friction in the payment UI, lack of trust signals, or delivery time expectations.

Assumptions (what you’re treating as given that might be wrong)

Example: I’m assuming the primary friction is the number of form fields; it might be the available payment methods.

Acceptable outputs (what would actually be useful to receive?)

Example: A prioritised list of the three highest-impact changes to the payment flow, with evidence for each.

Constraints (what can’t change?)

Example: Must maintain PCI compliance. Cannot change the payment processor. Mobile experience must remain identical.

Who is affected, and how?

Example: Users who reach payment and leave – potential customers who do not convert. The business loses revenue.

————————


The test of a good problem brief is simple: hand it to someone who was not in the room when you wrote it, and see if they come back with something useful. If they solve a different problem, the brief was not specific enough. That is useful information. Go back and sharpen it.

————————

The AI Integration

Once you have a problem brief, using an AI autoresearch system becomes much more tractable. Here is the workflow:

  1. Write the problem brief. Do this before you touch any AI tool.
  2. Feed the problem brief into the AI system. Use the problem statement, success criteria, and boundaries as the input specification.
  3. Set the evaluation criteria explicitly. Tell the system what good looks like in measurable terms.
  4. Review outputs against the problem brief, not against your intuition. The system will produce things that feel right. Check them against the brief. If they don’t match the brief, the system may have found a plausible alternative interpretation of an imprecise brief. Go back to step 1.
  5. Treat the AI as a candidate generator, not a decision-maker. The AI generates options. You evaluate them against the problem brief. The iteration loop is yours to steer.

The goal is not to use AI less. The goal is to use it in a context where it can succeed, which requires a well-defined problem as input.
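The five steps above can be sketched as a loop. Everything here is hypothetical scaffolding, not a real API: `generate_candidates` stands in for whatever model call you use, and `evaluate` is the human-owned scoring function that checks candidates against the brief, not against intuition.

```python
def run_autoresearch(brief, generate_candidates, evaluate, max_iters=5):
    """brief: dict built from the problem brief (statement, criteria, bounds).
    generate_candidates(brief, history) -> list of candidate dicts (step 5:
    the AI only generates). evaluate(candidate, brief) -> float score against
    the brief's success criteria (step 4: you review against the brief)."""
    history = []
    best = None
    for _ in range(max_iters):
        candidates = generate_candidates(brief, history)
        scored = sorted(((evaluate(c, brief), c) for c in candidates),
                        key=lambda pair: pair[0], reverse=True)
        top_score, top = scored[0]
        history.append({"candidate": top, "score": top_score})
        if best is None or top_score > best[0]:
            best = (top_score, top)
        else:
            # No improvement: either converged, or the brief itself needs
            # sharpening -- go back to step 1.
            break
    return best
```

The design choice worth noticing: `evaluate` is a parameter you supply, not something the loop invents. If you cannot write it, you do not yet have a well-defined problem.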

A Worked Example: The Autoresearch Prompt

Below is a concrete prompt you can adapt for an AI autoresearch pipeline, using the Problem Brief template. Paste this into any capable LLM (Claude, ChatGPT, Gemini) and it will give you a structured output rather than polished nonsense.

————————

Prompt to paste:

You are an autonomous research analyst operating an iterative investigation loop. For each iteration, you will: (1) generate candidate hypotheses or explanations, (2) evaluate them against the evidence and constraints provided, (3) identify the strongest candidates, (4) refine or discard the weaker ones, and (5) repeat until convergence.

Here is my problem brief. Do not deviate from it.

Problem statement: New users abandon the checkout flow at the payment step at a rate of 68%. The business wants to reduce this to below 50% within 90 days, without reducing average order value.

What this is NOT: This investigation does not address pricing, product catalog quality, or marketing campaign performance. It is scoped to the logged-in web checkout flow only.

Success criteria: A reduction in abandonment rate at the payment step from 68% to below 50%, within 90 days, with no statistically significant reduction in average order value.

Known unknowns: I do not know whether the primary friction is in the payment UI, a lack of trust signals, available payment methods, delivery time expectations, or some combination of these.

Assumptions I’m treating as given (that might be wrong): I assume the primary friction is the number of form fields in the payment step. It might not be.

Acceptable outputs: A prioritised list of the three highest-impact changes to the payment flow, with evidence for each. For each change, I want: the specific modification, the evidence suggesting it will reduce abandonment, the estimated difficulty of implementation, and how I would measure whether it worked.

Constraints: PCI compliance must be maintained. The payment processor cannot be changed. The mobile experience must remain functionally identical to the desktop experience.

Iteration 1: Generate five candidate hypotheses for why users abandon at the payment step. For each, state: the specific friction or barrier it identifies, the evidence that supports it, and how you would test it. Rank them by likely impact.

Then, for each ranked hypothesis: What evidence would confirm it? What evidence would disprove it? What is the cheapest, fastest way to get that evidence?

Return your output as a structured table. Do not write a narrative essay.

————————

What to expect from this prompt: The LLM will return a structured table of ranked hypotheses with supporting evidence, test strategies, and disconfirmation criteria. It will look structured and precise because you asked it to be structured and precise. The quality of the output is entirely determined by the quality of the problem brief you fed it. If the prompt produces something vague, your problem statement was too vague.

The iteration loop: After you receive the first output, feed it back into the model with: “Now run iteration 2. For the top two hypotheses, generate specific experimental designs or analysis approaches that would confirm or disconfirm them. State what data you need and where it would come from.” Keep iterating until the output converges on something you can act on.

This is the autoresearch loop. It works because you defined the problem. Not because you trusted the AI.

————————

Tools and Further Reading

Problem definition is not a new discipline. The following frameworks are relevant at different stages of the process:

  • First Principles Thinking: Reason from underlying causes, not from analogy. Useful when you suspect you are reasoning by convention rather than from cause and effect.
  • 5 Whys: Ask why the problem exists, then why that exists, then why that exists. Stop when further answers do not add information. Useful for surfacing root causes beneath surface symptoms.
  • Assumption Mapping: Plot your assumptions on axes of uncertainty and importance. Identify which assumptions are both uncertain and critical. These are where you need evidence before proceeding.
  • Constraint Specification: Formalise the boundaries of your problem before searching for solutions within them. Particularly valuable in engineering, operations research, and AI system design.
  • Wicked Problems — Rittel & Webber (1973): The clearest academic treatment of why some problems resist systematic solving, and why the methods that work for tame problems fail for the messier ones that characterise most organisational challenges.
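Of the frameworks above, assumption mapping is the most mechanical, so a sketch makes the quadrant logic concrete. The 0-to-1 scales and the 0.5 cut-offs are arbitrary illustrations, not part of the method:

```python
def map_assumptions(assumptions):
    """assumptions: list of (text, uncertainty, importance) tuples, with
    both scores in [0, 1]. Returns only the danger quadrant -- assumptions
    that are both highly uncertain AND highly important -- sorted so the
    riskiest (where you need evidence before proceeding) comes first."""
    critical = [a for a in assumptions if a[1] >= 0.5 and a[2] >= 0.5]
    return sorted(critical, key=lambda a: a[1] * a[2], reverse=True)
```

Run against the checkout example, an assumption like “the primary friction is the number of form fields” (uncertain, important) outranks “users trust the brand” (important, but well-evidenced) – telling you where to gather evidence first.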

————————

Closing

The project that failed because the problem was never defined is not a cautionary tale about a bad team. It is a predictable output of a system that rewards fast starts and punishes premature questioning.

The discipline is not to be more careful. The discipline is to have a structure that makes your assumptions visible before you invest resources in solving the wrong problem.

Your next step is not to read more about problem definition. Your next step is to take a problem you are currently working on, one that is currently framed as “improve X” or “address Y”, and write a problem brief against the template above. One page. Thirty minutes.

Hand it to a colleague who was not in the room. See if they solve your problem or a different one.

“That is not a failure. That is the first iteration.”

————————

FailureHackers tracks the patterns that make projects predictably fail. If you’ve seen this pattern before (or if you’ve been the person in the room who quietly suspected the problem was never quite right) we want to hear from you.

Stay ahead of project failure

Join the Failure Hackers mailing list for tools, frameworks and analysis to help you spot the warning signs before it is too late.
