Recommendation confidence is a 0–1 score on each ranked suggestion that answers a narrower question than the rank itself: given what we currently know about the user, how much do we trust that this rank will still be the right one tomorrow? Rank says "this is where the ingredient sits today." Confidence says "this is how firmly it sits there." In Unfair, confidence shows up as both a numeric score and a tier label on every recommendation card, and it decides how aggressively the engine optimizes versus how patiently it waits for more data.
What goes into a confidence score
Confidence is a multiplicative combination of four factors, each normalized to 0–1.
- Input density. Days of logging inside the relevant window; a 3-day log produces a lower number than a 21-day log even if the trend agrees.
- Adherence stability. Variance in day-to-day adherence; wide swings drop the score.
- Signal agreement. Whether subjective proxies and objective proxies point the same direction on the tracked outcome.
- Evidence-tier match. How strong the external evidence tier is for the specific goal the user chose.
No single factor dominates, and because the factors multiply, no factor can push confidence above the ceiling set by the weakest one. Strong trial evidence does not rescue a sparse log, which is the correct behavior: six days of 95% adherence is still only six days.
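The combination rule can be sketched in a few lines. This is a minimal illustration that assumes a plain, unweighted product of the four factors; the source only says "multiplicative combination," so the engine may weight or transform the factors differently. The class and function names (`ConfidenceFactors`, `confidence_score`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceFactors:
    """The four factors listed above, each already normalized to 0-1."""
    input_density: float
    adherence_stability: float
    signal_agreement: float
    evidence_tier_match: float

    def __post_init__(self) -> None:
        # Reject anything outside the normalized range up front.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def confidence_score(f: ConfidenceFactors) -> float:
    """Assumed rule: an unweighted product of the four factors."""
    return (
        f.input_density
        * f.adherence_stability
        * f.signal_agreement
        * f.evidence_tier_match
    )


# Three strong factors cannot rescue one weak one.
sparse_log = ConfidenceFactors(
    input_density=0.3,
    adherence_stability=0.9,
    signal_agreement=0.9,
    evidence_tier_match=0.9,
)
score = confidence_score(sparse_log)
```

Because every factor is at most 1, the product can never exceed the smallest factor, which is the cap described above.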
Confidence-tier bands
Unfair surfaces the score as a three-band label so rank cards stay readable.
| Band | Score | What it signals | What the engine does |
|---|---|---|---|
| High | ≥ 0.70 | Consistent logs, agreeing proxies, evidence tier A or B for the goal | Proposes active dose changes; shortens next review window to 2 weeks |
| Medium | 0.40 to < 0.70 | Partial agreement, mid-density logs, or evidence tier C | Recommends hold; allows form or timing changes only |
| Low | < 0.40 | Sparse logs, conflicting proxies, or a recent medication change | Defers optimization; requests an extra 7 days of logging before the next rerank |
A high-confidence rank is not a high-confidence prediction of benefit — it is a high-confidence prediction that the current rank is the best the current data will produce. Those are different claims, and the distinction matters whenever a user reads a confidence score as an endorsement of the ingredient.
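The band table can be read as a simple threshold lookup. The sketch below uses the thresholds from the table and treats the medium band as running up to, but not including, 0.70; the function and dictionary names are hypothetical, and the action strings are shortened paraphrases of the table rows.

```python
def confidence_band(score: float) -> str:
    """Map a 0-1 confidence score to its band label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.70:
        return "high"
    if score >= 0.40:
        return "medium"
    return "low"


# Shortened paraphrase of the "What the engine does" column.
ENGINE_ACTION = {
    "high": "propose active dose changes; next review in 2 weeks",
    "medium": "recommend hold; form or timing changes only",
    "low": "defer optimization; request 7 more days of logging",
}


def engine_action(score: float) -> str:
    """Return the engine behavior for a given confidence score."""
    return ENGINE_ACTION[confidence_band(score)]
```

Under these thresholds, `confidence_band(0.72)` returns `"high"` and `confidence_band(0.41)` returns `"medium"`.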
Two identical ranks, different confidence
Two users both see magnesium glycinate at position 1 with an identical ranking score of 0.81 on the 0–1 scale. User A has 21 days of logs, 84% adherence, and a subjective sleep score and sleep-efficiency trend that are both rising; confidence lands at 0.72, in the high band. User B has 8 days of logs, 88% adherence, a rising subjective score against a flat objective trend, and a medication change three days ago; confidence lands at 0.41, barely inside the medium band. Same rank, different suggested actions. User A is told to continue the dose through a 2-week review. User B is told to hold, log another 14 days, and surface the medication change in the rationale before any dose change. The rank agrees; the trust level does not.
When confidence drops mid-cycle
Confidence is not sticky. A week of missed logs, a new medication, or a sudden disagreement between subjective and objective proxies drops the score and automatically moves the engine into a more conservative mode: favoring core over optional, no new adds, no dose increases. This shows up on the user side as softer, wait-oriented language in the rationale and a longer next-review horizon.
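As a rough sketch of that mode switch, the snippet below maps the three triggers named above directly to the conservative restrictions. This is a simplification: in the description above the triggers act by lowering the score, and the score change is what moves the engine. The types and field names (`CycleEvents`, `EnginePolicy`, `policy_for`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleEvents:
    """Mid-cycle events that pull confidence down."""
    missed_log_days: int = 0
    new_medication: bool = False
    proxies_disagree: bool = False


@dataclass(frozen=True)
class EnginePolicy:
    allow_new_adds: bool
    allow_dose_increases: bool
    prefer_core_over_optional: bool


def policy_for(events: CycleEvents) -> EnginePolicy:
    """Any single trigger is enough to switch to the conservative mode."""
    conservative = (
        events.missed_log_days >= 7  # "a week of missed logs"
        or events.new_medication
        or events.proxies_disagree
    )
    return EnginePolicy(
        allow_new_adds=not conservative,
        allow_dose_increases=not conservative,
        prefer_core_over_optional=conservative,
    )
```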
Why confidence and rank are reported separately
Collapsing confidence into rank would hide the reason a rank is fragile. A user seeing only "rank 1, score 0.81" cannot tell whether they should start tonight or log another two weeks before starting. Splitting the two numbers preserves that distinction: the rank answers "which," confidence answers "how sure," and the two together answer "what should I do today." Many consumer apps collapse these into a single star rating, which looks simpler but throws away the evidence the user needs to act safely.
Confidence floor behavior
The engine keeps a per-goal confidence floor and will not propose an active change below it. For sleep-quality goals the floor is typically 0.45; for stimulant-adjacent goals (focus, morning energy) it is 0.55, because the downside of a mis-ranked stimulant is higher than the downside of a mis-ranked magnesium supplement. A candidate that would otherwise rank at position 1 but sits below its goal's floor is held in a "candidate" state: it is shown on the recommendations page but not pushed as a starter, so the user has to tap in to start it rather than seeing a default "begin now" button.
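The floor check amounts to one comparison per card. The sketch below hard-codes the floors quoted above as illustrative defaults (the text calls 0.45 "typical," so real values may vary); the goal keys, state names, and `card_state` function are hypothetical.

```python
# Illustrative per-goal floors taken from the figures quoted above.
GOAL_FLOORS = {
    "sleep_quality": 0.45,
    "focus": 0.55,
    "morning_energy": 0.55,
}


def card_state(goal: str, confidence: float) -> str:
    """Return "starter" if the card may be pushed, else "candidate".

    Assumes a score exactly at the floor is allowed, since the engine
    only refuses to propose changes *below* the floor.
    """
    floor = GOAL_FLOORS[goal]  # raises KeyError for an unknown goal
    return "starter" if confidence >= floor else "candidate"
```

With these floors, a sleep-quality card at confidence 0.41 stays in the "candidate" state, while the same card at 0.72 can be offered as a starter.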
How confidence recovers
Recovery from a low-confidence state is mechanical. Seven consecutive days of logging at more than 75% adherence, with no new medications, restores the input-density and adherence-stability contributions. A week in which subjective and objective proxies agree on direction restores signal agreement. The evidence-tier contribution does not change unless the user changes goals, because it is a property of the library, not the log. Most low-confidence states clear inside two weeks of consistent tracking, which is the shortest honest path to a stable rank the user can act on.
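The first recovery rule can be expressed as a check over the most recent days of logs. This sketch assumes the 75% threshold applies to each day individually rather than to a weekly average, which the text does not specify; `DayLog` and `density_and_stability_restored` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DayLog:
    adherence: Optional[float]  # 0-1 fraction; None means nothing was logged
    new_medication: bool = False


def density_and_stability_restored(
    days: Sequence[DayLog],
    window: int = 7,
    min_adherence: float = 0.75,
) -> bool:
    """True if the last `window` days were all logged, all strictly above
    `min_adherence`, and none introduced a new medication."""
    if len(days) < window:
        return False
    recent = days[-window:]
    return all(
        d.adherence is not None
        and d.adherence > min_adherence
        and not d.new_medication
        for d in recent
    )
```

A single missed day or a new medication inside the window resets the check, which matches the "consecutive days" wording above.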
How this appears in Unfair
Each ranked card shows a confidence badge (high, medium, or low) and the score on tap. The rationale snippet names the one or two factors pulling confidence down, not just the final number. Low-confidence cards carry a "hold and log" action instead of a "start now" button, which is the difference between a recommendation ranking that respects its own uncertainty and one that does not.
Clinical safety note
Confidence is an engineering number, not a medical one. A high-confidence recommendation cannot be read as reassurance that a compound is safe for the user's specific medication list or diagnosis, and a low-confidence one cannot be read as a warning. Symptoms that are new, severe, or persistent always take precedence over any confidence label and should prompt a pause and a clinician conversation.