Skip to the content.

Intervention Scoring Prompt

This is the methodology used to score every intervention in the Worthwhile database. The prompt is fed to a Claude model, which produces a YAML record with PBS / ISR / UAR scores, baseline framing, sources, and content fields. Every intervention page links back to the version of this prompt that produced it via a commit-pinned URL.


You are an expert in evidence-based personal development and quantitative assessment. Your task is to score the following intervention using a systematic methodology.

Intervention to Score

Name: [INSERT INTERVENTION NAME]

Primary Domain(s): [INSERT PRIMARY LIFE AREA(S) - e.g., “fitness”, “sleep”, “goals”]

Scoring Framework

Three Components to Score

You must score this intervention across ALL applicable value domains (primary + cross-domain, identified in the Cross-Domain Discovery step below). For each value domain, provide:

  1. Potential Benefit Score (PBS): Maximum theoretical benefit under ideal conditions (0-10 logarithmic scale)
  2. Intervention Success Rate (ISR): Likelihood the intervention succeeds when followed properly (0-100%)
  3. User Adherence Rate (UAR): Likelihood users will follow the intervention properly (0-100%)

Logarithmic Scale (0-10)

Each point represents roughly 2x the impact:

Calculating PBS from a linear anchor:

For linear anchors (e.g. “25% body weight lost”, “15 years additional lifespan”), use this formula:

PBS = 10 + log₂(effect_size / anchor_max)

Examples with a 25% body weight loss anchor:

Do NOT use linear interpolation (e.g. 15/25 × 10 = 6). This is a common error that systematically underscores PBS.

Baseline framing

PBS represents the magnitude of change the intervention produces, not the absolute level a successful user ends up at. Every value in _data/values.yml declares a baseline_percentile field – the percentile of the typical non-adopter, who is the implicit subject of the score.

Default baseline_percentile is 35: most adults who would consider an intervention in a given area are below the median in that area, otherwise they would not be looking for help. A few areas have higher baselines (e.g. housing.affordability baseline = 50, because most people live in average housing).

When writing PBS reasoning, follow this pattern explicitly:

  1. State the baseline. “Baseline (per values.yml): 35th percentile.”
  2. State the predicted endpoint. “After successful adherence, a user could plausibly reach the 60th percentile.”
  3. Compute the delta. “Delta = 25 percentile points of improvement.”
  4. Map the delta to PBS using the log formula with the value’s anchor.

The PBS represents step 4 – the size of the change, not where you end up. A user who is already at the 80th percentile would see less benefit because there is less room to improve; that’s handled by the personalisation layer at display time, not in the PBS itself.

baseline_percentile vs. Tier 0 – do not conflate

These two concepts describe different axes and are easy to muddle when both appear on the same page.

The two correlate for some interventions but decouple frequently. A marathon runner can be at Tier 0 on walking (zero deliberate walks) while sitting at the 90th percentile on fitness.health (because their running makes them fit). A user who has never done meditation can be at Tier 0 on it while still having an 80th-percentile mental-health.stability score (because they have a steady job, supportive partner, and good sleep).

The system treats them as orthogonal:

When you write a PBS reasoning that says “baseline 35th percentile, endpoint 65th, delta 30 points”, you are speaking about the outcome axis – the typical non-adopter at the 35th percentile on the value. You are also implicitly speaking about a Tier 0 starting point on the intervention (they’re not doing it yet), but that’s not the same statement. The PBS captures the delta achievable by progressing from Tier 0 to Tier 3, for someone who starts at the value’s baseline percentile.

Critical Reasoning Requirements

For each score, you must provide reasoning that directly justifies the specific number. Consider:

PBS Reasoning:

Avoiding PBS/ISR overlap on linear anchors:

For objectively measurable values (% body weight lost, years of lifespan, fitness percentile), both PBS and ISR tend to cite the same RCT data. To keep them distinct:

Example (calorie tracking for fat loss):

ISR Reasoning:

UAR Reasoning:

Reasoning Format

All reasoning fields (pbs_reasoning, isr_reasoning, uar_reasoning) must be formatted as a bulleted list of 3-4 points. Use this format in the YAML:

pbs_reasoning: |
  - Point one with [source](url) where available
  - Point two – common-sense reasoning is fine for subjective values
  - Point three with [source](url)

Each point in ISR and UAR reasoning should cite a specific source. PBS reasoning for subjective values may use unsourced common-sense arguments where no direct evidence exists – do not cite studies that measure a different construct just to have a citation.

Description Requirements

Provide a description as a bulleted list (using | YAML multiline syntax with - prefixed lines). Include 3-5 bullets covering:

When citing evidence in descriptions, embed hyperlinks inline without naming the source – e.g. write [Research](url) shows that... not Research from Thaler and Benartzi (2004) shows that.... The reader can click through for attribution; naming authors adds clutter.

Country-Neutral Framing

The site is intended for users globally, mostly in the Global North. Population data and cost defaults use US-aligned figures (most data available, large population) and currency is USD throughout. The user can override individual time and money costs for their own context via the “Personalise these costs” affordance on each intervention page. The framing of all prose fields, however, must be country-neutral so a reader in any country recognises the guidance as applicable to them.

Term substitution table

Apply these in prose fields (what_it_is, how_to_do_it, success_looks_like, common_pitfalls, description, prerequisites, and any *_reasoning field):

Avoid Use instead
GP, GPs (UK/Australia) primary care doctor, primary-care appointment, primary-care consultation
NHS / NHS-eligible (prescriptive) “your healthcare system”, “national health insurance or insurer”, or drop the qualifier
ISA, 401(k), Roth IRA, RRSP, TFSA, pension (UK) tax-advantaged savings, tax-advantaged retirement contributions, retirement plan
current account (UK), checking account (US) main bank account
standing order, direct debit (UK terms) scheduled bank transfer, automated payment
salary sacrifice (UK) pre-tax payroll deduction
£, GBP, mixed-currency figures like “$32 (~£25)” USD only, per site convention
“in the UK”, “in the US”, “Britons”, “Americans” drop the qualifier; use global stats where possible
Zocdoc, NHS app (prescriptive booking link) “your local appointment-booking system” or drop

Citations stay country-specific

Attributed evidence sources (the NHS, NIMH, Vanguard’s How America Saves, the CDC, the FSA, etc.) are fine to cite even when country-specific. The reader sees them as data sources, not prescriptions. Only change the surrounding claim if it’s framed as country-specific. For example:

Specific instruments

Where an intervention depends on a country-specific tax or savings vehicle (e.g. ISA, 401k), use the generic label in the body and trust the reader to map it to their local equivalent. A separate per-intervention country_resources YAML block will eventually handle explicit per-country specifics (links, instrument names, exact prices) – not in scope for new intervention scoring.

Cost reasoning

Cost figures in upfront_cost / ongoing_cost are USD and default to US-aligned figures. Where a cost varies dramatically by country or healthcare system (typical for medical care, dental work, therapy, etc.), say so in the cost reasoning field and direct the reader to the override:

Costs vary widely by country and healthcare system. Where care is covered by national health insurance, out-of-pocket cost may be near zero. Where care is paid privately or via insurance copay, costs may include [specific US-aligned figures]. Users with private-pay healthcare should adjust via ‘Personalise these costs’.

British spelling stays

British spelling is a project convention and unaffected by this section. Keep “optimisation”, “behaviour”, “coeliac”, “optician”, etc. These are writing-convention choices, not country-specific institutional terms.

Value Domain Key Format

CRITICAL: Use the exact format [life_area].[value_name] for all value domain keys. Examples:

Do NOT create new top-level categories like health.longevity or performance.athletic. Always use the life area as the top-level key.

Valid Value Domain Keys

The following domains and values are defined in _data/values.yml. Use ONLY these keys — do not invent new ones. If the intervention requires a value domain not listed here, flag it rather than creating a new key.

All 53 life areas are now scoreable. Full list in _data/values.yml. Use ONLY these keys — do not invent new ones. If the intervention requires a value domain not listed here, flag it rather than creating a new key.

Domain Values
behaviours freedom, emotional, social, resilience
body-image fat_loss, muscle_gain
career-planning clarity, advancement, security
children wellbeing, relationship, achievement, development
cognitive-skills memory, focus, reasoning, lifestyle
communication influence, connection, performance, conflict
community-contribution impact, belonging, fulfilment
consumptive-leisure restoration, enrichment, enjoyment
current-work competence, engagement, rewards
digital-safety security, convenience
emergency-preparedness self-reliance, community, baseline, catastrophic
ethics philosophical, practical, integrity, community
extended-family harmony, closeness, balance
family-of-origin autonomy, connection, healing, duty
financial-planning-tracking accuracy, simplicity, insight
fitness health, performance, enjoyment
food-management competence, craft, waste_reduction
friendship depth, breadth, growth
global-impact impartiality, passion, sustainability, fulfilment
goals clarity, follow_through, adaptability
habits impact, consistency, enjoyment
health-management longterm, vitality, control, simplicity
housework health, order, aesthetics, environmental
housing comfort, affordability, location
information-management retention, retrieval, insight, simplicity
investing growth, safety, simplicity
learning-methods efficiency, depth, enjoyment
legal-matters protection, simplicity, strategic, access
life-purpose clarity, meaning, integration
life-skills high, systematic, teaching
media-diet quality, relevance, breadth, efficiency
mental-health stability, resilience, flourishing
mindfulness clarity, emotional, knowledge, spiritual
networks depth, breadth, relevance
nutrition physical_wellbeing, pleasure_connection, ethical_environmental
organisation tracking, order, speed
participatory-leisure social, achievement, adventure
personality alignment, growth
physical-safety risk, freedom
possessions functionality, simplicity, quality, meaning
rationality accurate-beliefs, decision-making, intellectual-honesty
relationship-status partner-selection, meeting-new-partners, independence, transition-navigation
relationship-quality connection, harmony, alignment
saving security, growth, lifestyle
self-awareness psychological, contemplative, relational, experiential
sex frequency, variety, pleasure, contentment
sleep daily_functioning, long_term_health, comfort_experience
style attractiveness, status-professional, expression, comfort
systems power, simplicity, reliability
time-management productivity, balance, flexibility
transportation efficiency, comfort, safety
value-system practical, insight, authentic, evolution
worldview breadth, depth, utility, meaning

Cross-Domain Discovery

After identifying the primary domain(s), systematically scan the full domain table above to find secondary benefits. For each value, ask: “Would a successful practitioner of this intervention plausibly see a meaningful effect on this value?”

Inclusion guardrails

A cross-domain value should only be scored if ALL of the following are true:

  1. PBS >= 4: The benefit is at least 1.56% of the anchor maximum – not a trace effect
  2. One-sentence causal pathway: You can state the mechanism in one sentence without hand-waving (e.g., “yoga improves sleep quality via stress reduction and parasympathetic activation” passes; “yoga might somehow help career advancement” fails)

There is no cap on the number of secondary domains. If an intervention genuinely meets both tests for many domains, score all of them – broad-spectrum interventions are exactly what cross-domain scoring is designed to surface.

Cross-domain scoring notes

Flagging unscoreable domains

If the intervention plausibly benefits a life area that is NOT in the checklist above (e.g., mental health, time management, friends, communication), add it to the flagged_domains field in the output. This creates a backlog for when those areas get scoring anchors. Format:

flagged_domains:
  - domain: "mental-health"
    rationale: "Reduces anxiety and rumination via parasympathetic activation"
  - domain: "communication"
    rationale: "Improves active listening through increased present-moment awareness"

Resource Requirements Guidelines

Upfront Costs: Include only one-time purchases and setup expenses. Do NOT include ongoing costs like subscriptions or memberships as upfront costs. Use conservative estimates representing the minimum amount needed to begin the intervention effectively.

Ongoing Costs: Include recurring expenses like subscriptions, memberships, consumables, or regular services. Use conservative estimates representing the lowest-cost viable option.

Time Requirements: Estimate the minimum time investment needed for effective implementation, not optimal or average scenarios.

Prerequisites Guidelines

Provide exactly 3-4 bullet points that represent true prerequisites (things that must exist before starting).

Do NOT include:

DO include:

Confidence

Each scoring triple must include a confidence field reflecting the strength of evidence behind the combined PBS/ISR/UAR estimate for that (intervention, value) pair. Use one of:

Confidence is shrinkage applied to the EBS at display time (high → factor 1.0, medium → 0.75, low → 0.5). It captures epistemic uncertainty (how well the numbers are known) and is distinct from ISR, which captures aleatoric uncertainty (variation between people).

When in doubt, default to medium. Reserve high for cases where you genuinely have strong direct evidence; reserve low for cases where you are extrapolating significantly.

Dose-Response

Every intervention must include a dose_response block defining how the benefit scales with the user’s level of engagement. The block has three parts: the metric and tier criteria (specific to the intervention), and a value_templates map naming a shared curve template for each scored (intervention, value) pair.

Tier criteria

Define four tiers (0–3), each with a level, label, and criterion. The criterion describes a concrete user state in the intervention’s natural unit (steps per day, minutes per session, nights per week, etc.). Tier 0 must represent “not doing the intervention”. Tier 3 must represent the dose the PBS values are scored against (set scored_at_tier: 3). Tiers 1 and 2 are intermediate stages where most users live.

Labels should be intervention-specific and human-readable in context (e.g. “Sedentary / Light / Moderate / Active” for walking, “Never / Occasional / Most nights / Every night” for a sleep mask). Don’t try to impose a shared vocabulary across interventions – the labels carry meaning the multipliers can’t.

Default to graded tiers even for nominally on/off interventions. Most have quality and consistency dimensions that justify grading: a sleep mask can be ill-fitting or worn only some nights; a will can be outdated, incomplete, or never professionally reviewed. Reserve binary for genuinely one-off events.

Choosing a template

For each scored value, pick one of the templates in _data/dose_response_templates.yml:

Authoring rationale

Include a rationale field explaining (i) why the tier cuts fall where they do, anchored to the specific evidence base, and (ii) why each value got the template it did. The template choice is a judgment call about mechanism shape; make the reasoning legible so future audits can interrogate it.

Provenance Fields (REQUIRED)

Every intervention YAML must include three provenance fields at the top level so the rendered page can show when, by which model, and against which version of this prompt the scoring was produced.

evaluated_at: [YYYY-MM-DD]     # Today's UTC date when scoring was completed
evaluated_by: [model-id]       # Your own model ID, e.g. claude-opus-4-7
prompt_sha: [short-sha]        # Short commit SHA of this prompt at scoring time

To obtain evaluated_at, run:

date -u +%Y-%m-%d

Use whatever this command returns. Do not copy the date from a prior version of the YAML if re-evaluating an existing intervention – the field records when this evaluation happened, not when the original was written.

To obtain prompt_sha, run:

git log -1 --format=%h methodology/intervention-scoring-prompt.md

Place these three fields at the very top of the YAML, before name:. They are non-optional.

Output Format

Structure your response as YAML:

evaluated_at: [YYYY-MM-DD]
evaluated_by: [model-id]
prompt_sha: [short SHA from git log -1 --format=%h methodology/intervention-scoring-prompt.md]
name: "[Intervention Name]"
description: |
  - What the person does and how it works
  - Key details: frequency, duration, protocols
  - [Evidence](url) for key claims without naming authors
  - What makes this distinct from similar approaches
applicable_domains: ["[primary]", "[secondary1]", "[secondary2]"] # All scored domains (primary + cross-domain)

values:
  [life_area].[value_name]: # Use exact format - e.g., fitness.health, sleep.daily_functioning
    pbs: [0-10]
    isr: [0-100]
    uar: [0-100]
    confidence: [low|medium|high]
    pbs_reasoning: |
      - Baseline (per values.yml): [N]th percentile. Endpoint: [M]th percentile. Delta: [M-N] points.
      - [Specific finding with source link supporting the delta size]
      - [Specific finding with source link]
    isr_reasoning: |
      - [Specific finding with source link]
      - [Specific finding with source link]
      - [Specific finding with source link]
    uar_reasoning: |
      - [Specific finding with source link]
      - [Specific finding with source link]
      - [Specific finding with source link]

dose_response:
  metric: [snake_case_metric_id] # e.g. steps_per_day_average, minutes_per_session, nights_per_week
  metric_label: "[Human-readable label]"
  metric_help: "[Brief help text shown to user when reporting their tier]"
  scored_at_tier: 3 # The PBS values above assume this tier; almost always 3
  tiers:
    - level: 0
      label: "[Intervention-specific label]"
      criterion: "[Concrete criterion in intervention's natural unit]"
    - level: 1
      label: "[Intervention-specific label]"
      criterion: "[Concrete criterion]"
    - level: 2
      label: "[Intervention-specific label]"
      criterion: "[Concrete criterion]"
    - level: 3
      label: "[Intervention-specific label]"
      criterion: "[Concrete criterion]"
  value_templates:
    [life_area].[value_name]: [steep_diminishing|mild_diminishing|near_linear|threshold|binary]
    # ...one entry per scored value
  rationale: |
    - [Why the tier cuts fall where they do, with citations to the evidence base]
    - [Why each value got the template it did, grouped by template choice]

resources:
  upfront_cost: [USD]
  upfront_cost_reasoning: "[Minimum one-time expenses - basic equipment, essential setup only]"
  ongoing_cost: [USD]
  ongoing_cost_period: "[week/month/year]"
  ongoing_cost_reasoning: "[Minimum recurring expenses - lowest-cost viable options]"
  upfront_time: [hours]
  upfront_time_reasoning: "[Minimum time for basic setup and essential learning]"
  ongoing_time: [hours]
  ongoing_time_period: "[week/month]"
  ongoing_time_reasoning: "[Minimum effective time commitment per period]"

prerequisites:
  - "[Physical/health requirement]"
  - "[Access/equipment requirement]"
  - "[Knowledge/skill requirement]"

flagged_domains: # Life areas that plausibly benefit but lack scoring anchors
  - domain: "[life-area-slug]"
    rationale: "[One-sentence causal pathway]"

Key Reminders