Add design spec for tag enhancement

Covers taxonomy seeding with a personal-narrative cluster, prompt rewrite to stop force-fitting taxonomy tags, and raised combined tag cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Tag Enhancement Design

Date: 2026-04-19
## Problem

The tagger is producing wrong tags for personal-narrative content. Concrete example: the essay "Becoming a Morning Person" (a memoir about life stages, family, recovery, aging, and parenting) was tagged `productivity` and `learning`.

The root cause is twofold:

1. **Taxonomy gap.** `tag-taxonomy.yaml` has no categories covering personal narrative, memoir, family, or reflection. The closest fits the LLM can find are productivity-adjacent tags from the "Knowledge & Learning" cluster.
2. **Prompt bias.** The system prompt in `request_metadata` (tag-notes.py:127) requires 1-5 taxonomy tags, with no zero option, and tells the LLM to be "conservative" about new tag suggestions (0-2 max). Together, these rules force the model to pick taxonomy tags even when none genuinely apply and discourage it from proposing the new categories that would better describe the content.
## Goals

- The "Becoming a Morning Person" essay should be tagged with memoir/personal-narrative concepts, not productivity/learning.
- Future notes that fall outside the current taxonomy should surface new tag suggestions rather than force-fit existing ones.
- The taxonomy can grow over time via the existing `new_tag_accumulator` → end-of-run prompt flow; that mechanism is unchanged.
## Non-Goals

- No two-pass LLM classification; the single call per note stays.
- No few-shot examples in the prompt for this iteration. These may be added as a follow-up if the 20B local model underperforms on the rewritten prompt.
- No change to `seo_title_suffix`, `seo_description`, `seo_keywords`, `CONTENT_CHAR_LIMIT`, the SEO-description retry flow, YAML round-tripping, or the slug derivation.
## Changes

### 1. Taxonomy additions (`tag-taxonomy.yaml`)

Add a new cluster at the end of the file:

```yaml
# Personal Narrative & Life
- memoir
- personal-essay
- reflection
- family
- parenting
- recovery
- mental-health
- aging
- relationships
- childhood
- identity
```

Rationale: these tags are deliberately broad and reusable. `sobriety` was considered and rejected because it is not expected to be a recurring theme. `recovery` is retained as the broader concept (recovery from any kind of setback, not only substance-related).
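Before appending the cluster, a quick sanity check can confirm the new tags follow the lowercase-hyphenated style and don't collide with each other or with existing entries. This is a hypothetical helper, not part of `tag-notes.py`. It assumes the taxonomy file uses the flat `- tag` list with `# Cluster` comment layout shown above, and the `existing` set in the usage stands in for whatever the file already contains.

```python
import re

# Assumed style rule, taken from the prompt's "lowercase-hyphenated style":
# lowercase ASCII words joined by single hyphens, e.g. "personal-essay".
TAG_STYLE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

NEW_CLUSTER = """\
# Personal Narrative & Life
- memoir
- personal-essay
- reflection
- family
- parenting
- recovery
- mental-health
- aging
- relationships
- childhood
- identity
"""


def parse_tags(text: str) -> list[str]:
    """Collect "- tag" entries; assumes the flat list-plus-comment layout (no nested YAML)."""
    tags = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            tags.append(stripped[2:].strip())
    return tags


def check_cluster(new_tags: list[str], existing: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the cluster looks OK."""
    problems = []
    seen = set()
    for tag in new_tags:
        if not TAG_STYLE.match(tag):
            problems.append(f"bad style: {tag!r}")
        if tag in seen:
            problems.append(f"duplicate within cluster: {tag!r}")
        if tag in existing:
            problems.append(f"already in taxonomy: {tag!r}")
        seen.add(tag)
    return problems


if __name__ == "__main__":
    existing = {"productivity", "learning"}  # hypothetical subset of the current file
    print(check_cluster(parse_tags(NEW_CLUSTER), existing))  # → []
```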
### 2. Prompt rewrite in `request_metadata` (tag-notes.py:127)

Three changes to the system prompt:

**a. Add a tagging philosophy sentence** at the top of the `Rules:` section:

> Tags should describe what the note is substantively about, not topics it merely mentions in passing.

**b. Allow zero taxonomy tags.** Replace:

> `tags_from_taxonomy: 1-5 tags drawn from the existing taxonomy that best fit the content.`

with:

> `tags_from_taxonomy: 0-5 tags drawn from the existing taxonomy, ONLY when they genuinely fit. Do NOT force a taxonomy tag — return an empty list if nothing truly applies.`

**c. Loosen new-tag suggestions.** Replace:

> `new_tag_suggestions: 0-2 NEW tags, only when content truly warrants it (be conservative).`

with:

> `new_tag_suggestions: 0-5 NEW tags when the taxonomy doesn't adequately cover the content. Each must be a reusable category (not hyper-specific to one note). Use lowercase-hyphenated style (e.g., personal-essay).`
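This spec changes only the prompt text. If the 20B model sometimes ignores the rewritten rules, a small post-processing guard could apply the same contract in code. What follows is a hypothetical sketch, not part of this change: `clean_response` and `normalize_tag` are illustrative names, not existing functions in `tag-notes.py`. It assumes the parsed response is a dict keyed by the two field names used in the prompt, and that tags are ASCII.

```python
import re

MAX_TAXONOMY_TAGS = 5  # mirrors the "0-5" limit for tags_from_taxonomy
MAX_NEW_TAGS = 5       # mirrors the "0-5" limit for new_tag_suggestions


def normalize_tag(raw: str) -> str:
    """Coerce a suggestion such as 'Personal Essay' or 'life_stages' into lowercase-hyphenated form."""
    # Keep runs of ASCII letters/digits and join them with single hyphens.
    return "-".join(re.findall(r"[a-z0-9]+", raw.lower()))


def clean_response(response: dict, taxonomy: set[str]) -> tuple[list[str], list[str]]:
    """Hypothetical guard applying the rewritten rules to a parsed model response."""
    # An empty tags_from_taxonomy list is valid; drop anything not actually in the taxonomy.
    taxonomy_tags = [t for t in response.get("tags_from_taxonomy", []) if t in taxonomy]

    # Normalize new suggestions to the house style, dropping empties and repeats.
    new_tags: list[str] = []
    for raw in response.get("new_tag_suggestions", []):
        tag = normalize_tag(raw)
        if tag and tag not in new_tags:
            new_tags.append(tag)

    return taxonomy_tags[:MAX_TAXONOMY_TAGS], new_tags[:MAX_NEW_TAGS]


if __name__ == "__main__":
    taxonomy = {"productivity", "learning", "memoir", "family"}  # illustrative subset
    response = {
        "tags_from_taxonomy": ["memoir", "morning-person"],  # second entry is not in the taxonomy
        "new_tag_suggestions": ["Life Stages"],
    }
    print(clean_response(response, taxonomy))  # → (['memoir'], ['life-stages'])
```

Deduplication across the two lists is left to the existing combine step, which already removes repeats.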
### 3. Raise the combined tag cap (tag-notes.py:225)

Change `[:5]` to `[:8]`:

```python
# Merge taxonomy tags and new suggestions, dedupe while keeping first-seen order, then cap at 8.
combined = list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:8]
```

Memoir and reflection-style notes often legitimately touch 6-8 distinct themes; capping at 5 was causing otherwise-accurate tags to be dropped. Because the (at most 5) taxonomy tags come first in the merged list, any tags past the cap are trimmed from the end of the new suggestions.
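To make the cap's effect concrete, here is the same expression wrapped in a hypothetical `combine_tags` helper, with the cap as a parameter, and run on made-up model output for a memoir-style note. The input lists are illustrative, not real tagger output.

```python
def combine_tags(taxonomy_tags, new_suggestions, cap=8):
    """Same expression as the tag-notes.py:225 line, with the cap as a parameter.

    dict.fromkeys keeps the first occurrence of each tag in insertion order,
    so taxonomy tags come first and duplicates in new_suggestions are dropped.
    """
    return list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:cap]


# Illustrative (made-up) model output for a memoir-style note.
taxonomy_tags = ["memoir", "family", "recovery", "aging", "parenting"]
new_suggestions = ["life-stages", "family", "personal-growth", "life-transitions", "midlife"]

print(combine_tags(taxonomy_tags, new_suggestions, cap=5))
# → ['memoir', 'family', 'recovery', 'aging', 'parenting']  (every new suggestion lost)
print(combine_tags(taxonomy_tags, new_suggestions, cap=8))
# → ['memoir', 'family', 'recovery', 'aging', 'parenting', 'life-stages', 'personal-growth', 'life-transitions']
```

With the old cap, a note that fills all five taxonomy slots can never receive a new tag. With the new cap, three new suggestions survive and only the last one (`midlife`) is trimmed.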
## Verification

After implementation:

1. Open the "Becoming a Morning Person" note in `~/Documents/ejl-zk/40 Public/41 Notes/` and clear its `tags:` frontmatter field (set it to an empty list).
2. Run `./tag-notes.py`.
3. Confirm the new `tags` value includes memoir/personal-essay-style tags (e.g., `memoir`, `personal-essay`, `family`, `recovery`, `aging`, `reflection`) and does NOT include `productivity` or `learning`.
4. Spot-check 1-2 other notes that already have reasonable tags and confirm the run left them unchanged. (LLM-backed fields are only touched when empty, so notes with existing tags are not reprocessed at all.)

If the tags still look off, the follow-up is to add a single few-shot example to the system prompt showing a memoir case with zero taxonomy tags and only new suggestions. That follow-up is not included in this change.
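Step 3's pass/fail condition can be written down as a small predicate, which is handy if the check is rerun after later prompt tweaks. This is a hypothetical helper: `passes_step3` is an illustrative name, and reading the `tags` list out of the note's frontmatter is left out. Treating "includes memoir-style tags" as "at least one of the listed examples" is an assumption; tighten it if several are expected.

```python
# Example tags named in verification step 3; requiring at least one of them is an assumption.
EXPECTED_ANY = {"memoir", "personal-essay", "family", "recovery", "aging", "reflection"}
# Tags that must not appear on the memoir note after the change.
FORBIDDEN = {"productivity", "learning"}


def passes_step3(tags: list[str]) -> bool:
    """True if at least one memoir-style tag is present and no forbidden tag is."""
    tag_set = set(tags)
    return bool(tag_set & EXPECTED_ANY) and not (tag_set & FORBIDDEN)


if __name__ == "__main__":
    print(passes_step3(["memoir", "family", "aging", "life-stages"]))  # → True
    print(passes_step3(["productivity", "learning"]))                 # → False
    print(passes_step3(["memoir", "learning"]))                       # → False
```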
## Files Touched

- `tag-taxonomy.yaml`: append the new cluster
- `tag-notes.py`: `request_metadata` system prompt (~20 lines) and the `[:5]` → `[:8]` cap