diff --git a/docs/superpowers/specs/2026-04-19-tag-enhancement-design.md b/docs/superpowers/specs/2026-04-19-tag-enhancement-design.md
new file mode 100644
index 0000000..160c7d7
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-19-tag-enhancement-design.md
@@ -0,0 +1,97 @@
+# Tag Enhancement Design
+
+Date: 2026-04-19
+
+## Problem
+
+The tagger is producing wrong tags for personal-narrative content. Concrete example: the essay "Becoming a Morning Person" — a memoir about life stages, family, recovery, aging, and parenting — was tagged `productivity` and `learning`.
+
+Root cause is twofold:
+
+1. **Taxonomy gap.** `tag-taxonomy.yaml` has no categories that cover personal narrative, memoir, family, or reflection. The closest fits the LLM can find are productivity-adjacent tags from the "Knowledge & Learning" cluster.
+2. **Prompt bias.** The system prompt in `request_metadata` (tag-notes.py:127) requires 1-5 taxonomy tags (no zero option) and tells the LLM to be "conservative" about new tag suggestions (0-2 max). Together these force the model to pick taxonomy tags even when none genuinely apply, and discourage it from proposing the new categories that would better describe the content.
+
+## Goals
+
+- The "Becoming a Morning Person" essay should be tagged with memoir/personal-narrative concepts, not productivity/learning.
+- Future notes that fall outside the current taxonomy should surface new tag suggestions rather than force-fit existing ones.
+- Taxonomy can grow over time via the existing `new_tag_accumulator` → end-of-run prompt flow — no change to that mechanism.
+
+## Non-Goals
+
+- No two-pass LLM classification. Single call per note stays.
+- No few-shot examples in the prompt for this iteration. May be added as a follow-up if the 20B local model underperforms on the rewritten prompt.
+- No change to `seo_title_suffix`, `seo_description`, `seo_keywords`, `CONTENT_CHAR_LIMIT`, the SEO-description retry flow, YAML round-tripping, or the slug derivation.
+
+## Changes
+
+### 1. Taxonomy additions (`tag-taxonomy.yaml`)
+
+Add a new cluster at the end of the file:
+
+```yaml
+  # Personal Narrative & Life
+  - memoir
+  - personal-essay
+  - reflection
+  - family
+  - parenting
+  - recovery
+  - mental-health
+  - aging
+  - relationships
+  - childhood
+  - identity
+```
+
+Rationale: these are deliberately broad and reusable. `sobriety` was considered and rejected — not expected to be a recurring theme. `recovery` is retained as a broader concept (recovery from any kind of setback, not only substance-related).
+
+### 2. Prompt rewrite in `request_metadata` (tag-notes.py:127)
+
+Three changes to the system prompt:
+
+**a. Add a tagging philosophy sentence** at the top of the `Rules:` section:
+
+> Tags should describe what the note is substantively about, not topics it merely mentions in passing.
+
+**b. Allow zero taxonomy tags.** Replace:
+
+> `tags_from_taxonomy: 1-5 tags drawn from the existing taxonomy that best fit the content.`
+
+with:
+
+> `tags_from_taxonomy: 0-5 tags drawn from the existing taxonomy, ONLY when they genuinely fit. Do NOT force a taxonomy tag — return an empty list if nothing truly applies.`
+
+**c. Loosen new-tag suggestions.** Replace:
+
+> `new_tag_suggestions: 0-2 NEW tags, only when content truly warrants it (be conservative).`
+
+with:
+
+> `new_tag_suggestions: 0-5 NEW tags when the taxonomy doesn't adequately cover the content. Each must be a reusable category (not hyper-specific to one note). Use lowercase-hyphenated style (e.g., personal-essay).`
+
+### 3. Raise the combined tag cap (tag-notes.py:225)
+
+Change `[:5]` to `[:8]`:
+
+```python
+combined = list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:8]
+```
+
+Memoir and reflection-style notes often legitimately touch 6-8 distinct themes; capping at 5 was causing otherwise-accurate tags to be dropped.
+
+## Verification
+
+After implementation:
+
+1. Open the "Becoming a Morning Person" note in `~/Documents/ejl-zk/40 Public/41 Notes/` and clear its `tags:` frontmatter field (set to empty list).
+2. Run `./tag-notes.py`.
+3. Confirm the new `tags` value includes memoir/personal-essay-style tags (e.g., `memoir`, `personal-essay`, `family`, `recovery`, `aging`, `reflection`) and does NOT include `productivity` or `learning`.
+4. Spot-check 1-2 other notes that already have reasonable tags — confirm the rewrite didn't regress them. (All LLM-backed fields are only touched when empty, so notes with existing tags won't be reprocessed at all.)
+
+If the tags still look off, the follow-up is to add a single few-shot example to the system prompt showing a memoir case with zero taxonomy tags and all new suggestions — not included in this change.
+
+## Files Touched
+
+- `tag-taxonomy.yaml` — append new cluster
+- `tag-notes.py` — `request_metadata` system prompt (~20 lines) and the `[:5]` → `[:8]` cap
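
Reviewer note: the combined-cap change in section 3 can be exercised in isolation with a small sketch like the one below. Only the `dict.fromkeys(...)[:8]` expression comes from the spec. The names `combine_tags`, `is_valid_new_tag`, `TAG_STYLE`, and `MAX_TAGS` are hypothetical, and the style check is an assumption about how the "lowercase-hyphenated" rule from the prompt might be verified after the fact; it is not part of this change.

```python
import re

# Hypothetical constant mirroring the new cap from section 3 of the spec.
MAX_TAGS = 8

# Assumed reading of "lowercase-hyphenated style": lowercase alphanumeric
# words joined by single hyphens, e.g. "personal-essay".
TAG_STYLE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def combine_tags(taxonomy_tags, new_suggestions, cap=MAX_TAGS):
    """Merge both tag lists, drop duplicates while keeping first-seen order,
    then truncate to the cap (same expression as the spec's one-liner)."""
    merged = list(taxonomy_tags) + list(new_suggestions)
    return list(dict.fromkeys(merged))[:cap]


def is_valid_new_tag(tag):
    """Return True if the tag matches the assumed lowercase-hyphenated style."""
    return TAG_STYLE.fullmatch(tag) is not None


if __name__ == "__main__":
    # A memoir-style note with zero taxonomy tags and only new suggestions.
    print(combine_tags([], ["memoir", "family", "memoir", "aging"]))
    print(is_valid_new_tag("personal-essay"), is_valid_new_tag("Personal Essay"))
```

Because taxonomy tags are concatenated first, they take priority when the cap truncates the list, so a note with five taxonomy tags and five suggestions would keep all five taxonomy tags and only the first three suggestions.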