Written by aka
on 2026-03-02

Sprouted: Hypothesis Trees as a Meta-Framework Above Spec-Driven Development

#thoughts #framework #sdd #software-design

Hey, it's aka. I came up with a new framework! So here it is.

Ideas are all about getting them out there first, right? So while there are still parts that need further consideration, I'm publishing it anyway. Starting from the position that "requirements, specs, and everything else are hypotheses," this framework manages decision-making through a recursive tree structure of Why/What/How. It also covers comparisons with SDD tools and relationships with GORE, HDD, and DR.

⚠️ Disclaimer: AI was used in writing this article. It was leveraged for structuring and prose refinement, while the design, judgment, and final review of the content were done by aka.

Introduction: Discomfort with Spec-Driven Development

Since 2025, Spec-Driven Development (SDD) has been gaining attention. Tools that "write specs first, then have AI generate code" are emerging one after another, and there's a growing call to move beyond "vibe coding."

This trend itself is sound. Clarifying intent before implementation produces better quality than writing code from vague prompts. However, when actually using these tools, a certain discomfort lingers.

If requirements change, specs change too.

SDD tools say "treat specs as the source of truth." But if the requirements underlying those specs can change, how much meaning is there in treating specs as "settled"? Aren't requirements, specs, and designs all hypotheses?

This article defines this way of thinking as a framework called "Sprouted." Sprouted does not reject SDD; it is a meta-framework that provides a higher-level structure encompassing SDD.

Structural Problems of SDD Tools

Let's dig deeper into the discomfort with SDD. The major SDD tools as of 2025 all adopt a linear pipeline:

Requirements → Design → Tasks → Implementation
Constitution → Specify → Plan → Tasks → Implement
Spec (persistent source of truth) → Plan → Code generation

When actually using these tools, you run into several structural problems.

The boundaries between "requirements," "specifications," and "design" are ambiguous

This was the first thing that tripped me up when using SDD tools. Few people in practice can draw a clear line between where requirements end and specifications begin, or where design starts.

Is "Users can manage tasks" a requirement or a specification? Is "Users can add, complete, and list tasks" a specification or a design? SDD tools treat "specifications" as a privileged layer, but what counts as a specification causes confusion in practice.

The essence is that what you want to achieve and the means to achieve it can be separated at each layer — what you call each layer isn't essential. Yet SDD tools bind their processes to specific layer names like "requirements," "specifications," and "design."

Change tracking, rationale management, and granularity issues

Beyond naming, using SDD tools reveals several other problems:

You can't trace the impact of changes. When a spec changes, humans are left chasing down which designs and code are affected. There's also the risk that outdated specs mislead AI agents into generating implementations that don't match reality.
The rationale for "why this spec" disappears. Some tools generate user stories, but how those connect to the higher-level "why" is unmanaged. Since "why this spec exists" can't be structurally traced, the rationale for evaluating a spec's validity is easily lost.
Heavy processes run even for small fixes. Even small bug fixes get the same heavyweight process as large feature developments. The mechanism for adjusting process depth based on problem size is weak.

At the root of these problems lies a common cause: the assumption that "specs are settled things."

The Illusion of Certainty

Everything Is a Hypothesis

Traditional development processes have implicitly assumed things like "requirements are settled, specs change" or "if you nail down the specs, the code stabilizes." SDD tools similarly treat specs as the "source of truth" — a settled fact.

But is that really the case?

Is "there are people who forget their tasks and are struggling" actually true? Even if such people exist, can we say with certainty that "a TODO app is the optimal solution"? Requirements, specs, and code are all unverified "bets." In other words, every layer is a hypothesis.

When requirements change or specs change, teams tend to reflect that "we didn't manage things well enough." But if everything is a hypothesis, hypotheses collapsing is unavoidable. No matter how carefully you write them, they change when the underlying assumptions change. Requirements and specs are the same in this regard. The real problem is that when a hypothesis collapses, you can't see what is affected and to what extent.

The Gradient of Changeability

Of course, even if everything is a hypothesis, there are differences in how easily things change.

Abstract hypotheses (closer to Why): Tend to be stable. But the impact when they change is large.
Concrete hypotheses (closer to How): Change easily. But the impact is localized.

The traditional view that "requirements are stable, code is unstable" is actually a direct reflection of this property. However, traditionally, specific layers like "requirements" and "specifications" were given special treatment. If you treat all layers equally as hypotheses, the only difference becomes the gradient of changeability.

The impact when the topmost hypothesis collapses is enormous. That's precisely why you need a mechanism to structurally grasp which hypotheses you're standing on.

Sprouted was designed based on this recognition.

The Core of Sprouted: A Recursive Tree of Why / What / How

Sprouted is a framework that manages all decision-making as hypotheses through a recursive tree structure of Why/What/How. It discards layer names like "requirements," "specifications," and "design," treating all nodes with a unified structure of Why/What/How.

Basic Structure

In Sprouted, every development decision is managed as a node in a tree structure. Each node has three attributes:

Why (the reason): The premise for this node's existence. It has two aspects: motivation and constraints.
- Motivation: Why is this needed? Derived from the parent's What.
- Constraints: Under what assumptions and conditions are we thinking? Derived from the parent's How.
What (what to achieve): What you want to accomplish. An angle of resolution for the Why's motivation.
How (the means): The choice of method. A concrete approach to realize the What under the Why's constraints.

As a user, you only need to think about three things: Why / What / How. However, being aware that Why can contain both motivation and constraints improves the quality of your Why.
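To make the three attributes concrete, here is a minimal sketch of a node as a data structure. The class and field names (`Why`, `Node`, `motivation`, `constraints`) are my own illustrative choices, not an official Sprouted schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Why:
    """The premise for a node's existence, split into its two aspects."""
    motivation: str                                       # derived from the parent's What
    constraints: List[str] = field(default_factory=list)  # derived from the parent's How


@dataclass
class Node:
    """One hypothesis in the tree (illustrative names, not an official schema)."""
    why: Why
    what: str   # what to achieve
    how: str    # the chosen means
    children: List["Node"] = field(default_factory=list)


# The Layer 1 node of the TODO app example used in this article.
root = Node(
    why=Why(
        motivation="People forget tasks and struggle",
        constraints=["Targeting smartphone users", "Solo dev scale"],
    ),
    what="Enable recording and managing tasks",
    how="Build a TODO app",
)
```

Keeping motivation and constraints as separate fields inside Why is one way to stay aware of both aspects without adding a fourth top-level attribute.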

Inter-Layer Connections: Two Lines Descend Through the Layers

The key to understanding Sprouted's recursive structure is that the inter-layer connections consist of two lines:

Parent's What → Child's Why motivation (chain of motivation): Digging deeper into "what to achieve" gives rise to the next layer's "why this is needed."
Parent's How → Child's Why constraints (chain of constraints): "Which means was chosen" determines the next layer's assumptions and constraints.

And within each node, motivation generates the What, and constraints narrow down the How options.

Let's look at this with a TODO app example.

graph TD
    subgraph "Layer 1"
        WM1["Why (motivation): People forget tasks and struggle"]
        WC1["Why (constraints): Targeting smartphone users / Solo dev scale"]
        What1["What: Enable recording and managing tasks"]
        How1["How: Build a TODO app"]
        WM1 --> What1
        WC1 --> How1
        What1 --> How1
    end

    subgraph "Layer 2"
        WM2["Why (motivation): Users regularly create, complete, and modify tasks"]
        WC2["Why (constraints): TODO app (Web/mobile) is the premise"]
        What2["What: Enable adding, completing, and listing tasks"]
        How2["How: Implement with React + SQLite"]
        WM2 --> What2
        WC2 --> How2
        What2 --> How2
    end

    subgraph "Layer 3"
        WM3["Why (motivation): List view becomes hard to read as items increase"]
        WC3["Why (constraints): React + SQLite is the premise. SQL ORDER BY/WHERE available"]
        What3["What: Enable sorting and filtering tasks by deadline and priority"]
        How3["How: Build a sort/filter API via query parameters"]
        WM3 --> What3
        WC3 --> How3
        What3 --> How3
    end

    What1 -.->|"Chain of motivation"| WM2
    How1 -.->|"Chain of constraints"| WC2
    What2 -.->|"Chain of motivation"| WM3
    How2 -.->|"Chain of constraints"| WC3

    style WM1 fill:#4a9e4a,color:#fff
    style WM2 fill:#4a9e4a,color:#fff
    style WM3 fill:#4a9e4a,color:#fff
    style WC1 fill:#8B4513,color:#fff
    style WC2 fill:#8B4513,color:#fff
    style WC3 fill:#8B4513,color:#fff
    style What1 fill:#3a7bd5,color:#fff
    style What2 fill:#3a7bd5,color:#fff
    style What3 fill:#3a7bd5,color:#fff
    style How1 fill:#e67e22,color:#fff
    style How2 fill:#e67e22,color:#fff
    style How3 fill:#e67e22,color:#fff

This chain continues as deep as needed. And every layer has the same structure. Whether you call Layer 1 "requirements," Layer 2 "specifications," or Layer 3 "design" is entirely up to you — the structure is identical.
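The two chains can also be sketched in code. In this minimal example every child keeps a link to its parent, so the origin of its motivation (the parent's What) and of its constraints (the parent's How) can be traced back. The `Node` class, `add_child`, and `trace` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    motivation: str
    constraints: List[str]
    what: str
    how: str
    # The parent link is excluded from repr/eq to keep the dataclass helpers simple.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, motivation: str, constraints: List[str],
                  what: str, how: str) -> "Node":
        """Attach a child whose Why is meant to come from this node's What and How."""
        child = Node(motivation, constraints, what, how, parent=self)
        self.children.append(child)
        return child

    def trace(self) -> List[str]:
        """Walk up to the root, listing both chains for each layer."""
        lines = []
        node = self
        while node.parent is not None:
            lines.append(f"motivation '{node.motivation}' <- What '{node.parent.what}'")
            lines.append(f"constraints {node.constraints} <- How '{node.parent.how}'")
            node = node.parent
        return lines


layer1 = Node(
    motivation="People forget tasks and struggle",
    constraints=["Targeting smartphone users", "Solo dev scale"],
    what="Enable recording and managing tasks",
    how="Build a TODO app",
)
layer2 = layer1.add_child(
    motivation="Users regularly create, complete, and modify tasks",
    constraints=["TODO app (Web/mobile) is the premise"],
    what="Enable adding, completing, and listing tasks",
    how="Implement with React + SQLite",
)

for line in layer2.trace():
    print(line)
```

The code only records the links. Whether a child's motivation really follows from the parent's What is still a judgment call for a human (or an LLM).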

Nodes as Hypothesis-Verification Cycles

Each node corresponds to a small cycle of "premise → hypothesis → verification → result."

Why → Premise (given this problem and these constraints)
What → What to verify (if we can achieve this, it should solve the problem)
How → Verification method (this is how we'll test it / build it)
Result → What actually happened

In other words, each node is a small hypothesis-verification cycle. With this perspective, development is no longer "implementing settled specs" but rather "a process of verifying hypotheses."

The Structure of Choices: 1-to-Many-to-Many

The relationships between Why and What, and between What and How, are both one-to-many. Schematically: Why (1) : What (many) : How (many).

One Why can have multiple Whats, and each What can have multiple Hows.

graph TD
    W["🌱 Why (motivation): People forget tasks and struggle"]

    W --> WA["What 1: Enable recording and managing tasks"]
    W --> WB["What 2: Notify when task deadlines approach"]
    W --> WC["What 3: Automatically extract tasks from daily activities"]

    WA --> HA1["How: TODO app"]
    WA --> HA2["How: Sticky notes app"]
    WA --> HA3["How: Slack bot"]

    WB --> HB1["How: Push notifications"]
    WB --> HB2["How: Email reminders"]

    WC --> HC1["How: AI-based auto-extraction"]

    style W fill:#4a9e4a,color:#fff
    style WA fill:#3a7bd5,color:#fff
    style WB fill:#3a7bd5,color:#fff
    style WC fill:#3a7bd5,color:#fff
    style HA1 fill:#e67e22,color:#fff
    style HA2 fill:#e67e22,color:#fff
    style HA3 fill:#e67e22,color:#fff
    style HB1 fill:#e67e22,color:#fff
    style HB2 fill:#e67e22,color:#fff
    style HC1 fill:#e67e22,color:#fff

When the What changes, the entire set of How options changes. The How options for What 1 "record and manage" (TODO app, sticky notes app, Slack bot) and the How options for What 2 "notify" (push notifications, email reminders) are completely different sets.

In actual development, you choose one option from the candidates at each layer. The important thing is to record the options that were not chosen and the reasons why.
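One possible way to record the options that were not chosen is a small decision record per What, sketched below. The `Option` and `Decision` classes are hypothetical, and the reasons in the example are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    description: str
    chosen: bool = False
    reason: str = ""  # why this option was chosen or rejected


@dataclass
class Decision:
    what: str
    options: List[Option]

    def chosen(self) -> Option:
        """Return the single chosen option; anything else is treated as an error."""
        picked = [o for o in self.options if o.chosen]
        if len(picked) != 1:
            raise ValueError("exactly one option must be chosen")
        return picked[0]

    def rejected(self) -> List[Option]:
        return [o for o in self.options if not o.chosen]

    def undocumented(self) -> List[Option]:
        """Rejected options that have no recorded reason yet."""
        return [o for o in self.rejected() if not o.reason.strip()]


# The reasons below are made up for this example; they are not from a real project.
decision = Decision(
    what="Enable recording and managing tasks",
    options=[
        Option("TODO app", chosen=True, reason="Fits smartphone users"),
        Option("Sticky notes app", reason="Hard to track completion"),
        Option("Slack bot"),  # rejected, but nobody wrote down why
    ],
)

print(decision.chosen().description)
print([o.description for o in decision.undocumented()])
```

A check like `undocumented()` makes it visible when a rejected candidate has no rationale attached, which is exactly the information that tends to get lost.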

How to Write Why / What / How

Here are key points when writing each attribute:

For Why's motivation, write the "user or real-world premise" that emerges when you dig deeper into the parent's What. Don't make it a rephrasing of the parent's How. Not "to make it work as a TODO app," but rather "users regularly create, complete, and modify tasks." The former merely repeats the parent's How, while the latter is a real-world premise that emerges from digging deeper into the parent's What of "enable recording and managing."
For Why's constraints, write the assumptions and conditions that arise from choosing the parent's How. Choosing "TODO app" as the How means Web/mobile becomes the premise. Choosing "React + SQLite" means SQL ORDER BY is available. Multiple constraints can be combined into a single node.
Write constraints even for the first layer's Why. You might think constraints are empty since there's no parent node, but implicit assumptions exist above the root node. Business constraints, market conditions, resource constraints — they're in your head but just not made explicit. If these collapse, the entire tree is affected.
Derive What naturally from Why's motivation. Check for logical leaps. Since there can be multiple angles for What given a single Why, consider at least once whether "there's another angle." When there are multiple Whats, be conscious of whether they're all necessary (AND) or alternatives (OR).
For How, list multiple candidates before choosing. Rather than jumping on the first method that comes to mind, lay out options and then choose. Recording candidates that weren't chosen and their reasons makes re-evaluation easier when assumptions change later.

Change Propagation

If any of Why / What / How changes, there may be a need to review the subtree below it.

When Why's motivation changes. If the motivation "people forget tasks and struggle" itself collapses (e.g., it turns out users aren't actually forgetting tasks — they just can't prioritize), you may need to review the What, How, and all layers below.
When Why's constraints change. If the constraint "solo dev scale" changes to "team development," the How options and subsequent designs may change.
When What changes. If, with the same Why, you switch from What 1 "record and manage" to What 2 "notify," the set of How options changes as well.
When How changes. If, with the same Why and What, you switch from How "TODO app" to How "Slack bot," the constraints for the next layer change, and the child node considerations change accordingly.

graph TD
    subgraph "Before change"
        B_W["Why (motivation): Forgetting tasks is a problem"]
        B_W --> B_What["What: Enable recording and managing"]
        B_What --> B_How["How: TODO app ✅"]
        B_How --> B_C1["CRUD API design"]
        B_How --> B_C2["React UI design"]
        B_How --> B_C3["SQLite schema design"]
    end

    subgraph "After switching How"
        A_W["Why (motivation): Forgetting tasks is a problem"]
        A_W --> A_What["What: Enable recording and managing"]
        A_What --> A_How["How: Slack bot ✅"]
        A_How --> A_C1["Messaging API integration"]
        A_How --> A_C2["Rich menu design"]
        A_How --> A_C3["Conversation flow design"]
    end

    style B_W fill:#4a9e4a,color:#fff
    style B_What fill:#3a7bd5,color:#fff
    style B_How fill:#e67e22,color:#fff
    style B_C1 fill:#ddd,color:#333
    style B_C2 fill:#ddd,color:#333
    style B_C3 fill:#ddd,color:#333
    style A_W fill:#4a9e4a,color:#fff
    style A_What fill:#3a7bd5,color:#fff
    style A_How fill:#e67e22,color:#fff
    style A_C1 fill:#f9e79f,color:#333
    style A_C2 fill:#f9e79f,color:#333
    style A_C3 fill:#f9e79f,color:#333

The higher the node, the greater the impact when it changes; the lower the node, the more localized the impact. This is the concrete manifestation of the philosophy that "everything is a hypothesis, and the only difference is the gradient of changeability."
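Because the structure is a tree, the set of nodes to review after a change is simply the subtree below the changed node. Here is a minimal sketch using the "Before change" example; the `Node` class and the `affected_subtree` function are illustrative names of my own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def affected_subtree(node: Node) -> List[Node]:
    """Return every descendant of `node` in depth-first order.

    These are the nodes that may need review when `node` changes.
    """
    found = []
    for child in node.children:
        found.append(child)
        found.extend(affected_subtree(child))
    return found


how = Node("How: TODO app", [
    Node("CRUD API design"),
    Node("React UI design"),
    Node("SQLite schema design"),
])
what = Node("What: Enable recording and managing", [how])
why = Node("Why (motivation): Forgetting tasks is a problem", [what])

# Switching the How touches three design nodes; a collapsing Why touches everything below it.
print([n.label for n in affected_subtree(how)])
print(len(affected_subtree(why)))
```

The result is a list of review candidates, not a verdict: some descendants may survive a change unchanged.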

Confidence and Change Cost

If everything is a hypothesis, you need a mechanism to determine "which hypotheses should be verified first." In Sprouted, this is judged along two axes: confidence and change cost.

Confidence

Each node carries a numerical score from 0.0 to 1.0 (confidence score) representing "how much can we trust this hypothesis?" Confidence is measured by the question: "What is the strongest evidence supporting this hypothesis?"

Confidence propagates from children to parents. If child nodes have high confidence, they don't drag down the parent's confidence. However, even if all children are proven, the parent doesn't automatically become proven — the parent's own hypothesis could still be wrong.

Example: Confidence Scores and Propagation Calculation

The implementation of confidence scores is flexible, but as one example, here's a four-level scale borrowing from the philosophy of NASA TRL (Technology Readiness Level):

Level	Confidence Score	Definition	Example
Proven	0.8–1.0	Demonstrated in production with real data	Running in production for 3 months, KPIs met
Tested	0.5–0.7	Partially verified through PoC, user testing, etc.	Built a PoC and it worked, tested with 5 users
Reasoned	0.2–0.4	Backed by research and external cases. Not yet verified in your own context	Compared through technical research, consulted experts
Gut	0.0–0.1	Based only on intuition/experience. Unverified	"I think we should do it this way"
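If you want to turn a score back into a level name, a sketch like the following works. Note that the bands in the table leave gaps (0.75, for example, belongs to no band), so this sketch assumes the lower bound of each band acts as the threshold. That is my assumption, not part of the scale itself, and `confidence_level` is a hypothetical helper name.

```python
def confidence_level(score: float) -> str:
    """Map a 0.0-1.0 confidence score to one of the four level names.

    Assumption: each band's lower bound (0.8 / 0.5 / 0.2) is used as a threshold,
    so scores that fall between bands are assigned to the lower level.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must be between 0.0 and 1.0")
    if score >= 0.8:
        return "Proven"
    if score >= 0.5:
        return "Tested"
    if score >= 0.2:
        return "Reasoned"
    return "Gut"


for s in (0.9, 0.6, 0.3, 0.05):
    print(s, confidence_level(s))
```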

Propagation from children to parents can also be formalized. Here is one reference example:

C(parent) = C_self × C_children

C_self: The confidence in the parent node's own hypothesis. Independent of child states — how trustworthy is this hypothesis itself?
C_children: Calculated from child nodes. If children are in an AND relationship (all required), use min(child1, child2, ...); if in an OR relationship (any will do), use max(child1, child2, ...).

By using multiplication, if the hypothesis itself is wrong (C_self = 0), the parent becomes 0 no matter how well-proven the children are. For example, if a parent with C_self = 0.8 has two AND-related children (0.9 and 0.5), then 0.8 × min(0.9, 0.5) = 0.40, revealing that the child with confidence 0.5 is the bottleneck.
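The reference formula translates directly into a short recursive function. This is a sketch under one extra assumption the formula does not state: a leaf node (no children) simply uses its own C_self. The `Node` class, the `relation` field, and the `confidence` function are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    c_self: float           # confidence in this node's own hypothesis
    relation: str = "AND"   # how the children combine: "AND" or "OR"
    children: List["Node"] = field(default_factory=list)


def confidence(node: Node) -> float:
    """C(parent) = C_self x C_children, with min for AND and max for OR.

    Assumption: a leaf has no C_children term, so its confidence is C_self.
    """
    if not node.children:
        return node.c_self
    child_scores = [confidence(child) for child in node.children]
    if node.relation == "AND":
        c_children = min(child_scores)
    elif node.relation == "OR":
        c_children = max(child_scores)
    else:
        raise ValueError(f"unknown relation: {node.relation}")
    return node.c_self * c_children


# The example from the text: C_self = 0.8 with two AND-related children (0.9 and 0.5).
parent = Node("parent", 0.8, "AND", [Node("child A", 0.9), Node("child B", 0.5)])
print(round(confidence(parent), 2))  # 0.4
```

Switching the same parent to an OR relationship would use the stronger child instead, giving 0.8 × 0.9.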

Change Cost and Prioritization

The advantage of a tree structure is that you can quantify the cost of change impact. When a node changes, the depth (how many layers down are affected) and breadth (how many nodes at the same layer are affected) of the affected subtree give you a sense of the change cost.

Confidence and change cost are two independent axes. Combining these two axes reveals which hypotheses should be verified first.

	Low Change Cost	High Change Cost
High Confidence	Stable. Leave it alone	Foundation. Hard to break, but devastating if it does
Low Confidence	Easy to experiment with	Dangerous. Verify this first

The most critical case is nodes with low confidence and high change cost. These are unverified hypotheses with a large number of assumptions stacked on top of them. When you find this combination, you should prioritize verifying that node above all other work.
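Here is a rough sketch of how the two axes could be computed and combined. It assumes change cost is approximated as subtree depth × breadth, with breadth taken as the widest layer below the node. The thresholds (0.5 for confidence, 4 for cost) are arbitrary values picked for illustration; tune them to your own tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def subtree_depth(node: Node) -> int:
    """Number of layers below `node` (0 for a leaf)."""
    if not node.children:
        return 0
    return 1 + max(subtree_depth(child) for child in node.children)


def subtree_breadth(node: Node) -> int:
    """Size of the widest layer below `node` (0 for a leaf)."""
    widest = 0
    level = node.children
    while level:
        widest = max(widest, len(level))
        level = [grandchild for child in level for grandchild in child.children]
    return widest


def change_cost(node: Node) -> int:
    return subtree_depth(node) * subtree_breadth(node)


def quadrant(confidence: float, cost: int,
             conf_threshold: float = 0.5, cost_threshold: int = 4) -> str:
    """Place a node in the 2x2 matrix. Both thresholds are illustrative assumptions."""
    high_conf = confidence >= conf_threshold
    high_cost = cost >= cost_threshold
    if high_conf and not high_cost:
        return "Stable: leave it alone"
    if high_conf and high_cost:
        return "Foundation: hard to break, devastating if it does"
    if not high_conf and not high_cost:
        return "Easy to experiment with"
    return "Dangerous: verify this first"


what = Node("What: Enable recording and managing", [
    Node("How: TODO app", [
        Node("CRUD API design"),
        Node("React UI design"),
        Node("SQLite schema design"),
    ]),
])

cost = change_cost(what)   # depth 2 x breadth 3
print(cost, quadrant(0.1, cost))
```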

How to Raise Confidence

The actions to raise confidence depend on the current score. Using the NASA TRL-inspired levels from earlier:

0.0–0.1 → 0.2–0.4 (adding rationale to intuition): Research similar cases, analyze competing products, find supporting evidence in technical blogs or papers. Typical effort: a few hours to a day.
0.2–0.4 → 0.5–0.7 (verifying in your own context): Implement a prototype or spike (time-boxed to 1–3 days), test with a few users, benchmark in a near-production environment. Typical effort: a few days to one sprint.
0.5–0.7 → 0.8–1.0 (proving in production): Deploy to production and measure metrics, confirm trends with real data over at least one week. Typical effort: one sprint to several weeks.

The important thing is that you don't need to get everything to 1.0. Hypotheses with low change cost can stay at low scores. Only intentionally raise the score for hypotheses with high change cost. Focusing verification effort only on the bottom-right quadrant of the 2x2 matrix is the practical approach.

One caveat, however: don't stack low-confidence hypotheses on top of other low-confidence hypotheses. Even if individual node uncertainty is small, when chained together, overall confidence drops multiplicatively. If you see nodes below 0.3 stacked two or more layers deep, that's a red flag. Without a hypothesis tree, you might not even notice this state.
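This red flag can be checked mechanically. The sketch below interprets "stacked two or more layers deep" as a run of directly connected parent-child nodes whose own scores are all below 0.3. That interpretation, plus the `Node` class and the function name, are my assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    c_self: float
    children: List["Node"] = field(default_factory=list)


def low_confidence_stacks(node: Node, threshold: float = 0.3, min_run: int = 2,
                          run: Optional[List[str]] = None) -> List[List[str]]:
    """Find maximal parent-to-child runs of nodes whose own score is below `threshold`.

    A run is reported once it is at least `min_run` layers long and cannot be
    extended further down through another low-confidence child.
    """
    run = list(run or [])
    if node.c_self < threshold:
        run.append(node.name)
    else:
        run = []  # a trustworthy node breaks the stack
    flagged = []
    has_low_child = any(child.c_self < threshold for child in node.children)
    if len(run) >= min_run and not has_low_child:
        flagged.append(run)
    for child in node.children:
        flagged.extend(low_confidence_stacks(child, threshold, min_run, run))
    return flagged


tree = Node("root", 0.8, [
    Node("A", 0.2, [Node("B", 0.25, [Node("C", 0.9)])]),  # A -> B is a low-on-low stack
    Node("D", 0.1, [Node("E", 0.6)]),                     # D alone is not a stack
])

print(low_confidence_stacks(tree))
```

Each reported run is a chain where an unverified guess rests on another unverified guess, so the top of the run is usually the place to start verifying.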

How Sprouted Solves SDD's Problems

That covers the explanation of Sprouted. Here's how Sprouted provides structural solutions to the SDD problems raised at the beginning:

SDD Problem	Sprouted's Approach
Ambiguous boundaries between "requirements," "specifications," and "design"	Discard layer names; unify all nodes with Why/What/How
Specs treated as settled things	All nodes are hypotheses. Gradient management through confidence scores
Why not embedded in the structure	Each node has Why (motivation + constraints) / What / How, with Why as the starting point
Change propagation is manual	Tree structure automatically identifies impact scope as a subtree
No granularity flexibility	Granularity is naturally determined by tree depth. Free to dig deep or stay shallow

The important thing is that Sprouted does not reject SDD. SDD tools excel at "spec → code generation" — the transformation at the leaf-node level. Sprouted structurally manages what sits above that: "why are we writing this spec," "is this spec really correct," and "if the spec changes, what's affected."

When the leaf nodes of Sprouted's tree structure become sufficiently concrete Hows, they become equivalent to the "specs" that existing SDD tools consume. In other words, Sprouted doesn't replace existing SDD tools — it sits on top of them as a meta-framework.

Note that Sprouted's philosophy shares connections with existing academic approaches — the structural nature of Goal-Oriented Requirements Engineering (GORE), the hypothesis management of Hypothesis-Driven Development (HDD), and the decision recording of Design Rationale (DR). A detailed comparison with these is provided in Appendix B.

The Trap of Premature Unification

Having structurally solved SDD's problems, let's address one common trap in practice.

Developers have a bias toward unification. When they spot similar-looking functionality, they think "can't we combine these into one?" However, unifying things that merely look the same in How but differ in Why and What results in something half-baked for both purposes.

There are cases where multiple parents arrive at the same How. For example, "preventing forgotten tasks" and "visualizing team progress" — different Whys — might both arrive at "TODO app" as their How. However, if Why or What differs, the details of How will subtly differ too. Hasty unification produces something that serves neither purpose well. And since the Whys are different, when one Why changes, switching its subtree inadvertently affects the other tree.

As a typical failure pattern, consider a notes app that crammed "notes," "task management," and "document management" into a single app. Structuring this with Sprouted makes it clear that these are three different trees born from three different Whys: Why 1 "I don't want to forget things I think of," Why 2 "I want to organize my work," Why 3 "I want to compile materials." Each has different Whats and Hows. If this had been visualized, the question "should these really be one app?" could have been raised.

In Sprouted, the criteria for whether to unify are structurally determined:

If Why and What are the same, it's fine to unify the How.
If Why is the same but What differs, keep them separate by default.
If Why differs, manage them as separate trees even if the How looks similar.

Conversely, when similar nodes appear in different places, if Why / What / How all match completely, they are the same thing and can be reused. If even one doesn't match, manage them as separate nodes even if they look similar. If any of them changes and you want to switch only one side, unification causes collateral damage.
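The unification criteria above are simple enough to write as a decision function. In this sketch, exact string equality stands in for the real judgment of whether two Whys, Whats, or Hows are "the same", which in practice a human or an LLM has to make. `Hypothesis` and `unification_advice` are hypothetical names.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    why: str
    what: str
    how: str


def unification_advice(a: Hypothesis, b: Hypothesis) -> str:
    """Apply the unification criteria in order: Why first, then What, then How.

    Assumption: string equality is a stand-in for a human judgment of sameness.
    """
    if a.why != b.why:
        return "different Why: manage as separate trees"
    if a.what != b.what:
        return "same Why, different What: keep separate by default"
    if a.how != b.how:
        return "same Why and What: fine to unify the How"
    return "Why / What / How all match: same node, can be reused"


personal = Hypothesis("Preventing forgotten tasks", "Record and manage tasks", "TODO app")
team = Hypothesis("Visualizing team progress", "Share task status", "TODO app")

# Both arrive at "TODO app", but the Whys differ, so they stay in separate trees.
print(unification_advice(personal, team))
```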

Sunk Costs

In theory, "swapping out an entire subtree" may be rational, but in reality it's not that simple. Design may have already progressed, code may have been written, and the team may be moving in that direction. Sunk costs are a real thing.

That's precisely why there's value in making the tree structure visible with Sprouted. When making the decision of whether to switch, being able to see "where we are now and what would be affected" enables structural judgment rather than emotional judgment. Do you swap everything? Can you reuse parts? How far does the impact reach? See all that, and then decide whether to "accept the sunk cost and switch" or "continue as is."

Sprouted is not a framework for "always making the right choice." Whether a choice was right can only be known after the fact. But if you can see what hypotheses you're standing on and what would be affected if something changes, you can do your best. Sprouted is a framework for that.

Outlook: Affinity with LLMs

Having covered Sprouted's structure and practical considerations, let's conclude by discussing its affinity with LLMs — the key to making this framework work in practice.

Sprouted is also a framework that only becomes practical with the advent of LLMs.

The Problem Design Rationale Couldn't Solve for 50 Years

Approaches structurally similar to Sprouted have actually existed before. Design Rationale (see Appendix B for details), proposed in 1970, is an approach to structurally recording "why this design was chosen," and its Question/Option/Criteria (QOC) structure is strikingly close to Sprouted's Why/What/How.

However, Design Rationale has failed to gain practical adoption for 50 years. The causes have been clearly analyzed:

High recording cost. Designers had to manually structure every decision.
It constrains design thinking. Many designers felt that structuring inhibited creativity.
Tools are separated from the development environment. Dedicated tools were required, never integrating into daily workflows.
Nobody reads it. The situations where painstakingly written rationale was actually referenced later were limited.

All of these failure causes stem from "humans manually doing the structuring."

LLMs Eliminate the Cost of Structuring

With the advent of LLMs, a solution to this 50-year-old challenge is coming into view.

Recording cost → LLMs automatically extract structure from natural language. Designers just write their thoughts in natural language, and the LLM handles classification into Why/What/How.
Constraints on design thinking → Write freely in natural language first, then the LLM proposes structuring afterward. Creativity is not inhibited.
Tool integration → LLMs operate within existing development environments and chat tools. No need to switch to dedicated tools.

In other words, LLMs have the potential to solve the "cost of structuring" problem that Design Rationale couldn't solve for 50 years. The claim that Sprouted is "a framework that only becomes practical with the advent of LLMs" is supported by this history.

Other LLM Applications

Structuring assistance. If users describe nodes in natural language, LLMs can provide guidance like "this description leans toward What" or "the Why motivation is vague." They can also detect logical leaps or cases where Why/What/How are mixed in a single node and propose separation.
Ambiguity detection and confidence updates. LLMs can detect that "this What is open to multiple interpretations" and reflect this in the confidence assessment.
Change impact analysis. When a parent node's hypothesis collapses, LLMs can estimate the impact on descendant nodes at the natural language level.
Duplicate detection and unification judgment. When similar nodes appear in different branches, LLMs can suggest "Why/What/How all match — should we unify?" or warn "the How is the same but the Why differs."
Connection to existing SDD tools. When leaf nodes become sufficiently concrete, they can be passed directly to existing SDD tools or AI coding agents.

Conclusion

The essence of Sprouted lies in the philosophy of managing all decision-making in software development as a "hypothesis tree."

Requirements, specs, designs, and code are all hypotheses. The only differences are the gradients of abstraction and changeability. Standing on this recognition, there is no need to give special treatment to any particular layer, and the same operations (decomposition, tracing, change impact analysis, hypothesis verification) can be applied to all layers.

The framework name "Sprouted" comes from the metaphor of the development process as "hypotheses sprouting from the seed of Why and growing into a tree." If the seed is good, the tree grows healthy; if the seed (Why) is wrong, no amount of carefully written specs will matter.

SDD tools made the "spec → code" transformation more efficient. Sprouted structurally manages why that spec exists, whether it's really correct, and what's affected when it changes. The two are not in opposition — combining them achieves a more robust development process.

All development begins with a single seed — a "why."

Appendix A: Sprouted Rules and Checklists

The rules of this framework are defined in three levels following RFC 2119:

MUST: The framework cannot function without this.
SHOULD: Improves quality, but may be omitted depending on circumstances.
MAY: Beneficial if done, but not a problem if not.

Rules on Node Structure

MUST: Each node must have Why / What / How. This is Sprouted's minimal unit. A node missing any one of these is incomplete, making the rationale or means untraceable.
MUST: What must be derived from the node's own Why motivation (which in turn comes from the parent's What). A What without motivation cannot explain "why it's needed."
SHOULD: Write both motivation and constraints in Why. Consciously separating motivation (from the parent's What) and constraints (from the parent's How) improves the quality of your Why. However, strict separation is not mandatory.
SHOULD: Don't make Why's motivation a rephrasing of the parent's How. Not "to make it work as a TODO app" but "users regularly create, complete, and modify tasks."
SHOULD: Explicitly state constraints even for the first layer's Why. Business constraints, market conditions, resource constraints, etc. Unstated assumptions can't even be recognized when they collapse.
MAY: Standardize the description format for Why / What / How. When used by a team, establishing a writing template reduces ambiguity.

Rules on Layers and Decomposition

MUST: The parent's What generates the child's Why motivation, and the parent's How generates the child's Why constraints. This is the backbone of Sprouted's recursive structure.
SHOULD: Don't cram multiple concerns into a single node. If Whys differ, they should be separate nodes.
SHOULD: Match decomposition depth to problem size. A small bug fix doesn't need three layers of depth. Conversely, ending a large feature development at one layer is likely to miss considerations.
MAY: When leaf nodes become sufficiently concrete, delegate to existing SDD tools or coding agents.

Rules on Choices and Decision-Making

MUST: Consider multiple How candidates for each What. Even if only one candidate exists, think at least once about "is there another option?"
SHOULD: Record options not chosen and their reasons. This makes re-evaluation easier when assumptions change later.
SHOULD: When multiple Whats exist in parallel, specify whether all are required (AND) or are alternatives (OR). This affects the judgment of "can we give up this What?" during change impact analysis.
MAY: List multiple What candidates before choosing. Considering "is there another angle?" at the What level can lead to more essential solutions.

Rules on Hypothesis Management

MUST: Treat all nodes as hypotheses. There are no settled facts. Requirements, specs, and code all exist within a gradient of confidence.
SHOULD: Assign a confidence score (0.0–1.0) to each node. Actively verify nodes with low confidence. Prioritize verification especially when nodes with high change cost have low confidence.
SHOULD: Be aware of change cost (subtree depth × breadth). Confidence and change cost are independent axes; look at both to determine verification priority.
SHOULD: When an upper node changes, check the affected subtree. Even the manual habit of "the parent changed, so let's review the children" improves quality.
SHOULD: For important nodes, describe risks that could invalidate the hypothesis. Especially effective for nodes with low confidence or high change cost.
MAY: Record hypothesis verification results on nodes. Having states like "verified," "unverified," and "refuted" makes the health of the entire tree visible.
MAY: Describe non-functional requirements with Why / What / How as well. If achievement criteria are unclear, set the confidence lower.
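
Here is one way the two axes could be combined, as a sketch under stated assumptions. The text only defines change cost as subtree depth × breadth. Counting depth as the number of layers including the node itself, taking breadth as the widest layer, and multiplying (1 − confidence) by change cost to get a single priority number are all assumptions of this example, not rules of the framework.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    confidence: float  # 0.0 (pure guess) to 1.0 (well verified)
    children: list["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of layers in the subtree, counting the node itself (assumed definition)."""
    return 1 + max((depth(c) for c in node.children), default=0)


def breadth(node: Node) -> int:
    """Size of the widest layer in the subtree (assumed definition)."""
    widest, level = 0, [node]
    while level:
        widest = max(widest, len(level))
        level = [c for n in level for c in n.children]
    return widest


def change_cost(node: Node) -> int:
    return depth(node) * breadth(node)


def priority(node: Node) -> float:
    # Assumed heuristic: low confidence and high change cost both raise priority.
    return (1.0 - node.confidence) * change_cost(node)


def walk(node: Node):
    """Yield the node and all of its descendants."""
    yield node
    for c in node.children:
        yield from walk(c)


# Illustrative tree.
root = Node("root", 0.4, [
    Node("a", 0.9, [Node("a1", 0.5), Node("a2", 0.8)]),
    Node("b", 0.3),
])
for n in sorted(walk(root), key=priority, reverse=True):
    print(f"{n.name}: cost={change_cost(n)}, priority={priority(n):.2f}")
```

In this toy tree the root comes out on top: its confidence is not the lowest, but everything else hangs off it. A single product hides the fact that the axes are independent, so treat the number as a sorting aid and still look at both values.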

Rules on Unification

MUST: Reuse is permitted only when Why / What / How all match. If even one differs, manage as separate nodes.
MUST: Do not unify things with different Whys. Even if the How looks the same, different Whys mean different trees.
SHOULD: When tempted to unify, first check the match level of Why / What / How. "Similar" and "the same" are different.
MAY: Even when unification isn't possible, you can decompose the How further and partially unify.
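
The unification rule can be sketched as a simple all-or-nothing check. `Spec`, `match_level`, and `can_unify` are hypothetical names, and plain string equality stands in for the human judgment of whether two Whys are really "the same" rather than merely "similar."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    why: str
    what: str
    how: str


def match_level(a: Spec, b: Spec) -> dict[str, bool]:
    """Report which of Why / What / How match between two nodes."""
    return {"why": a.why == b.why, "what": a.what == b.what, "how": a.how == b.how}


def can_unify(a: Spec, b: Spec) -> bool:
    """Reuse is permitted only when Why, What, and How all match."""
    return all(match_level(a, b).values())


# Illustrative nodes: same How, different Why and What.
signup = Spec("New users must prove they own the address", "Confirm the address", "Send an email")
digest = Spec("Users lose track of overdue tasks", "Summarize overdue tasks weekly", "Send an email")

print(match_level(signup, digest))
print(can_unify(signup, digest))  # False
```

The two nodes share a How, so the MAY rule still applies: the email-sending part could be decomposed further and shared at that lower level, while the nodes themselves stay separate.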

Checklist: When Creating a Node

☐ Is Why written? A node without motivation has no clear reason to exist.
☐ Is Why's motivation derived from the parent's What, not a rephrasing of the parent's How?
☐ Are Why's constraints explicitly stated? Especially easy to forget at the first layer.
☐ Is What naturally derived from the motivation? Any logical leaps?
☐ When there are multiple Whats, is it specified whether they're AND (all required) or OR (alternatives)?
☐ Were multiple How candidates considered? Did you think of at least two?
☐ Are the candidates not chosen and their reasons recorded?
☐ Are there multiple Whys mixed into one node? If so, split them.
☐ Is a confidence score set? Does it reflect the current verification state on a 0.0–1.0 scale?
☐ Has change cost (subtree depth × breadth) been checked? If confidence is low and change cost is high, prioritize verification.
☐ For important nodes, have risks that could invalidate the hypothesis been documented?

Checklist: When Changing a Node

☐ What changed? Why (motivation/constraints) / What / How — which one?
☐ Has the impact on child nodes been checked? If motivation changed, What is affected; if constraints changed, How is affected.
☐ Does the entire subtree need review, or just parts of it?
☐ Has the change cost been estimated? The depth and breadth of the affected subtree.
☐ Has a decision been made on whether to switch, taking sunk costs into account?
☐ If there are unified nodes, what's the impact on other usage points?
☐ Has the reason for the change been recorded? So "why did we change it" can be traced later.
☐ Have confidence scores been updated for affected nodes?
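
The impact check in this checklist can be partly mechanized. The sketch below encodes only the propagation rules stated in this appendix: a changed motivation affects the What, changed constraints affect the How, a parent's What feeds the children's motivation, and a parent's How feeds the children's constraints. `Node` and `review_list` are hypothetical names, and the output is a list of things to re-check, not a verdict that they must change.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


# Inside one node: which aspect to re-check when another one changes.
WITHIN_NODE = {"motivation": "what", "constraints": "how"}
# Across layers: which aspect of each child a parent's aspect feeds.
PARENT_TO_CHILD = {"what": "motivation", "how": "constraints"}


def review_list(node: Node, changed: str) -> list[tuple[str, str]]:
    """Return (node name, aspect) pairs to re-check after `changed` changes on `node`."""
    if changed not in WITHIN_NODE and changed not in PARENT_TO_CHILD:
        raise ValueError(f"unknown aspect: {changed!r}")
    result: list[tuple[str, str]] = []

    def visit(n: Node, aspect: str) -> None:
        if aspect in WITHIN_NODE:
            affected = WITHIN_NODE[aspect]
            result.append((n.name, affected))
            visit(n, affected)
        else:
            affected = PARENT_TO_CHILD[aspect]
            for child in n.children:
                result.append((child.name, affected))
                visit(child, affected)

    visit(node, changed)
    return result


# Illustrative tree: root -> (c1 -> g), c2
root = Node("root", [Node("c1", [Node("g")]), Node("c2")])
for name, aspect in review_list(root, "constraints"):
    print(f"re-check {name}: {aspect}")
```

Changing the root's constraints walks down the How and constraints line of every descendant, while changing its motivation would walk down the What and motivation line instead. Whether the whole list or only part of it really needs rework is still a human decision.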

Appendix B: Academic Comparison with Existing Approaches

Sprouted's philosophy intersects with multiple fields in software engineering. Here we examine the relationships with three particularly relevant approaches — Goal-Oriented Requirements Engineering (GORE), Hypothesis-Driven Development (HDD), and Design Rationale (DR).

To state the conclusion upfront: Sprouted can be positioned as combining GORE's structural nature with HDD's hypothesis management, attempting to use LLMs to overcome the practical adoption barrier that DR couldn't solve for 50 years.

Goal-Oriented Requirements Engineering (GORE)

What Is GORE?

Goal-Oriented Requirements Engineering (GORE) is an approach that has been researched for over 20 years in the field of requirements engineering. It is a collective term for methods that elicit, model, and analyze requirements starting from goals. The representative frameworks are KAOS (Keep All Objectives Satisfied) and i* (iStar).

In KAOS, the top-level goal is progressively refined through AND/OR decomposition until it reaches a level that can be assigned to agents (humans or software). Each goal can be formally defined using temporal logic (LTL), and obstacle analysis — identifying factors that prevent goal achievement — is systematically performed. i* models dependency relationships between actors within an organization (who depends on whom for what), handling the dimensions of "why," "who," and "how."

Sprouted's recursive tree structure, WHY/HOW decomposition, and explicit management of alternatives are superficially similar to GORE's basic ideas. However, the two differ in their underlying epistemology and target different layers.

Epistemological Difference: Are Goals "Correct Things" or "Hypotheses"?

GORE takes the position that "goals are correct things elicited from stakeholders, and the job is to decompose them completely and formally." There is no mechanism to structurally manage the possibility that the goals themselves are wrong.

Sprouted takes the position that "all nodes are hypotheses, existing within a gradient of confidence." Even the topmost Why is a bet until verified, and when it collapses, the entire subtree needs review.

This difference becomes clear when attempting conversion. If you model a TODO app in KAOS, "Tasks are properly managed" becomes the starting goal. When you try to bring this into Sprouted, there is nothing to write in the Why: in KAOS the goal itself is the starting point, whereas in Sprouted the Why "people forget tasks and struggle" is the starting point, and the What (equivalent to the goal) is derived from it.

The reverse is the same: converting Sprouted nodes into KAOS drops the confidence scores, the hypothesis management, and the two links that connect parent to child (motivation and constraints). KAOS has no concept of "goals might be wrong," so the core of Sprouted's hypothesis management structure has nowhere to go.

In other words, a clean bidirectional conversion between the two is not possible.

Layer Difference: Consistency Within the Tree vs. Validity of the Tree Itself

The epistemological difference is also a difference in target layers.

KAOS's formal verification defines goals in temporal logic, describes a system state transition model, and has a model checker explore all paths to report "there is a path where this goal cannot be reached." This is powerful, but what it can verify is only "gaps within the defined state space" — the enumeration of state variables and operations is done by humans. Gaps in the world that were left out of the model are fundamentally undetectable.

Sprouted starts from the premise that "the tree itself might be wrong." Even if KAOS makes the goal tree for "manage tasks with a TODO app" perfectly consistent, if the user's real problem was not "forgetting" but "not being able to prioritize," that entire tree becomes meaningless. Sprouted supports the decision to replace the tree itself.

In other words, GORE verifies consistency within the tree, while Sprouted questions the validity of the tree itself. The two are not in opposition — they operate at different layers.

Incorporating GORE's Insights

Among GORE's strengths, those that naturally connect with Sprouted's philosophy have been incorporated as rules:

Obstacle analysis → Incorporated as a SHOULD rule. KAOS has a mechanism for systematically identifying obstacles to goal achievement. Since this connects naturally with Sprouted's "everything is a hypothesis" philosophy, a rule was added: "For important nodes, describe risks that could invalidate the hypothesis" (see "Rules on Hypothesis Management" in Appendix A).
AND/OR decomposition distinction → Incorporated as a SHOULD rule. In KAOS, subgoal groups are explicitly distinguished as "all required (AND)" or "alternatives (OR)." Since this matters for change impact analysis when judging "can we give up What 2 and still achieve the parent Why?", a rule was added to specify the AND/OR relationship of What groups (see "Rules on Choices and Decision-Making" in Appendix A).
Non-functional requirements → Clarified as handleable within the existing structure. GORE has the concept of softgoals (requirements that can't be fully satisfied but should be met to a sufficient level). In Sprouted, non-functional requirements can also be described with Why/What/How, and when achievement criteria are vague, the confidence should be set lower (see "Rules on Hypothesis Management" in Appendix A).

The following were placed out of scope:

Formal verification. KAOS's formal verification has the strength of completely exploring a defined state space. However, the cost of describing state transition models is high, and the validity of the model itself depends on human judgment. Sprouted's design principles prioritize natural language and affinity with LLMs, which puts it in a trade-off relationship with formal verification. That said, if LLMs can eventually auto-generate state transition models from natural language, there is room for integration, such as partially applying formal verification to high-confidence nodes.
Actor/agent modeling. i* explicitly models "who depends on whom." This is an important perspective, but adding Who to the three attributes of Why/What/How increases the cognitive cost per node, contradicting the design principle of "reducing cognitive cost for practitioners." This is left as room for future extension.

Hypothesis-Driven Development (HDD)

What Is HDD?

Hypothesis-Driven Development (HDD) is an approach that directly applies Lean Startup thinking to the software development process. It replaces "requirements" with "hypotheses" and treats the development of new features and services as "a series of experiments."

A typical HDD process is as follows:

Write out assumptions.
Convert them into hypotheses. A commonly used template is: "We believe that [feature X] will result in [outcome Y]. We will know we have succeeded when [metric Z]."
Design an experiment (A/B test, prototype, user interview, etc.).
Run the experiment and analyze the results.
Decide whether to persevere or pivot.
Move to the next hypothesis.
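
For reference, the template in step 2 maps onto a very small data structure. This is only a sketch: the `Hypothesis` class and its field names are hypothetical, and the example values are made up.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    feature: str   # [feature X]
    outcome: str   # [outcome Y]
    metric: str    # [metric Z]

    def render(self) -> str:
        """Fill the commonly used HDD template."""
        return (
            f"We believe that {self.feature} will result in {self.outcome}. "
            f"We will know we have succeeded when {self.metric}."
        )


# Illustrative values only.
h = Hypothesis(
    feature="a one-line task input",
    outcome="more tasks being recorded",
    metric="tasks added per user per week goes up",
)
print(h.render())
```

Note that nothing in this record points to another hypothesis, so a backlog of these is naturally a flat list.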

Academically, the concept of "Hypotheses Engineering" has also been proposed, arguing that just as requirements engineering handles requirements, there is a need to elicit, document, analyze, and prioritize hypotheses.

Commonalities with Sprouted

HDD is the existing approach closest to Sprouted at the epistemological level.

"Everything is a hypothesis" epistemology. HDD also declares "We don't do projects anymore. Only experiments." This is the same position as Sprouted's "all nodes are hypotheses."
Hypothesis-verification cycle. HDD's "hypothesis → experiment → result → pivot or continue" roughly corresponds to Sprouted's "Why → What → How → result" hypothesis-verification cycle.
Rejection of requirements. HDD argues "requirements should be replaced with hypotheses," and Sprouted argues "specs are hypotheses." The direction is the same.

The Decisive Difference: The Presence or Absence of Structure

While sharing a common epistemology, there is a decisive difference between HDD and Sprouted:

HDD's hypotheses have no hierarchical structure. HDD's hypotheses are basically managed as a flat list. Without parent-child relationships between hypotheses, structural analysis of which other hypotheses are affected when one hypothesis collapses is impossible.
No separation of Why/What/How. HDD's hypothesis template "We believe that [X] will result in [Y]" mixes means (X) and outcomes (Y) in a single sentence. There is no mechanism like Sprouted's to separate Why's motivation and constraints, What, and How, and to track the chains of motivation and constraints.
No concept of change propagation. In HDD, when a hypothesis is refuted, you "pivot," but the analysis of what is affected and to what extent happens in people's heads.
The scope of application differs. HDD specializes in hypothesis verification for product outcome metrics (conversion rate, DAU, etc.). Sprouted focuses on the structuring of the more upstream questions: "what should we build in the first place" and "why should we build it."

Sprouted Can Be Positioned as a Structured Version of HDD

In summary, HDD shares the epistemology of "treating things as hypotheses" but lacks hypothesis structuring and change impact analysis. Sprouted's unique contribution lies in combining HDD's hypothesis management with GORE's structural nature.

Design Rationale (DR)

What Is DR?

Design Rationale is an approach to explicitly recording and managing the reasons behind design decisions. It originated with IBIS (Issue-Based Information System), developed by Werner Kunz and Horst Rittel in 1970. Since then, multiple variants have been proposed, including QOC (Questions, Options, and Criteria) and DRL (Decision Representation Language).

DR's basic structure is strikingly close to Sprouted:

Design Rationale	Sprouted
Question / Issue / Decision	Close to What (what to achieve)
Option / Position / Alternative	Close to How (candidate means)
Criteria / Argument / Goal	Close to Why (motivation and constraints)
Recording of options not chosen	Recording of Hows not chosen and their reasons

Commonalities with Sprouted

Making the reasons for decisions explicit. DR's core is recording "why this design was chosen," which aligns with Sprouted's "Why is the starting point" philosophy.
Recording alternatives. DR emphasizes recording "options not chosen and their reasons." This directly corresponds to Sprouted's rule of "recording candidates not chosen and their reasons."
Hierarchical decomposition. In extended versions of IBIS, Issues are hierarchically decomposed. This corresponds to Sprouted's recursive tree structure.

Decisive Differences

DR has no hypothesis management. DR is an approach to "recording the rationale for decisions made" and has no mechanism to structurally manage the possibility that the decision itself is wrong. There is no concept of confidence, nor the epistemological recognition that "this decision is a hypothesis and a bet until verified."
No structure for change propagation. DR records the rationale for individual decisions but has weak mechanisms for tracking, as a tree structure, which other decisions are affected when one decision changes.
DR has a stronger character of post-hoc recording. In practice, it often ends up with rationale being written after design is finished. Sprouted assumes building the tree before design and proceeding with design based on the tree.

Why DR has failed to gain adoption for 50 years, and how LLMs might overcome this, is discussed in the "Outlook: Affinity with LLMs" section of the main text.

Sprouted's Position

Organizing the relationships with the three existing approaches reveals the position Sprouted fills:

Borrowed from GORE: Structural nature. Goal tree decomposition, AND/OR decomposition, obstacle analysis. But the epistemology differs.
Shared with HDD: Epistemology. The position that "everything is a hypothesis" and "hypotheses, not requirements." But HDD lacks structure.
Shared with DR: Decision-making structure. Separation of Why/What/How ≈ Criteria/Question/Option. But DR lacks hypothesis management and failed to gain practical adoption.

Sprouted can be positioned as integrating GORE's structural nature × HDD's hypothesis management × DR's decision recording, attempting to use LLMs to overcome the practical adoption barrier that DR couldn't solve for 50 years.

Repository (Added 2026/03/20)

The Sprouted framework definition (SKILL.md), system prompt for AI assistants, template for new projects, and Sprouted's own hypothesis tree (a real-world example) are publicly available.

👉 gitlab.com/akapersonal/sprouted
