Compare commits


7 Commits

Author SHA1 Message Date
ejlewis e792732a13 switch to using Anthropic API 2026-04-25 18:47:29 -05:00
ejlewis fd1407e06f Raise combined tag cap from 5 to 8 2026-04-19 22:27:40 -05:00
ejlewis 68d78fe6bd Allow zero-taxonomy-tag output and more new tag suggestions 2026-04-19 22:24:53 -05:00
ejlewis b4fb1283b9 Add personal-narrative cluster to tag taxonomy 2026-04-19 22:22:27 -05:00
ejlewis ef41b6b30a enhancements 2026-04-19 22:18:10 -05:00
ejlewis 3eff77aa1a Add implementation plan for tag enhancement
Four tasks: taxonomy append, prompt rewrite, tag cap bump, and
end-to-end verification against the Morning Person note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 22:15:34 -05:00
ejlewis 43d708c834 Add design spec for tag enhancement
Covers taxonomy seeding with a personal-narrative cluster, prompt
rewrite to stop force-fitting taxonomy tags, and raised combined
tag cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 22:13:07 -05:00
5 changed files with 499 additions and 145 deletions
BIN
Binary file not shown.
@@ -0,0 +1,230 @@
# Tag Enhancement Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Stop the tagger from force-fitting taxonomy tags onto notes that don't match (e.g., tagging a memoir as "productivity"), by seeding a personal-narrative cluster into the taxonomy and rewriting the prompt to permit zero-taxonomy-tag output when nothing fits.
**Architecture:** Single-script tool; all changes land in `tag-notes.py` (system prompt + tag cap) and `tag-taxonomy.yaml` (new cluster). No new files, no new dependencies, no test harness added. Verification is a manual re-run against a known problem note.
**Tech Stack:** Python, PyYAML, ruamel.yaml, local LM Studio (OpenAI-compatible) endpoint.
**Note on testing:** This project has no test infrastructure (per `CLAUDE.md`: "There are no tests, linter, or build step"). The tagger's correctness is judged by LLM output quality against real notes, not unit tests. Each task below ends in a manual spot-check where meaningful; end-to-end verification lives in Task 4.
**Spec:** `docs/superpowers/specs/2026-04-19-tag-enhancement-design.md`
---
## Task 1: Add Personal Narrative cluster to taxonomy
**Files:**
- Modify: `tag-taxonomy.yaml` (append at end of file)
- [ ] **Step 1: Append the new cluster**
Append these lines to the end of `tag-taxonomy.yaml`. The file currently ends with the `# Personal Interests` cluster, whose last entry is `- gardening`. Add a blank line after `gardening`, then this block, matching the indentation of the existing list items:
```yaml
# Personal Narrative & Life
- memoir
- personal-essay
- reflection
- family
- parenting
- recovery
- mental-health
- aging
- relationships
- childhood
- identity
```
- [ ] **Step 2: Verify the YAML still parses**
Run: `python3 -c "import yaml; print(len(yaml.safe_load(open('tag-taxonomy.yaml'))['tags']))"`
Expected: prints an integer equal to the old count + 11 (the old file had 31 tags, so expect `42`). If the command raises a YAML error or prints any other number, the appended block is malformed (most often mis-indented). Fix it before moving on.
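A slightly stricter version of this check can be sketched in Python. It assumes PyYAML is installed (the one-liner above already relies on it); `load_tags` and `check_taxonomy` are hypothetical helpers, and `EXPECTED_NEW` simply copies the block from Step 1. Besides the count, it confirms every new tag is present and that nothing was duplicated:

```python
from collections import Counter

# The cluster appended in Step 1, copied from the block above.
EXPECTED_NEW = [
    "memoir", "personal-essay", "reflection", "family", "parenting",
    "recovery", "mental-health", "aging", "relationships", "childhood", "identity",
]


def load_tags(path: str) -> list[str]:
    """Parse the taxonomy file and return its `tags` list (requires PyYAML)."""
    import yaml  # imported here so check_taxonomy() can be used without PyYAML

    with open(path) as f:
        data = yaml.safe_load(f) or {}
    return data.get("tags", []) or []


def check_taxonomy(tags: list[str], old_count: int = 31) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    expected_count = old_count + len(EXPECTED_NEW)
    if len(tags) != expected_count:
        problems.append(f"expected {expected_count} tags, found {len(tags)}")
    missing = [t for t in EXPECTED_NEW if t not in tags]
    if missing:
        problems.append(f"missing new tags: {', '.join(missing)}")
    dupes = sorted(t for t, n in Counter(tags).items() if n > 1)
    if dupes:
        problems.append(f"duplicate tags: {', '.join(dupes)}")
    return problems
```

Usage: `print(check_taxonomy(load_tags("tag-taxonomy.yaml")) or "ok")`.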
- [ ] **Step 3: Commit**
```bash
git add tag-taxonomy.yaml
git commit -m "Add personal-narrative cluster to tag taxonomy"
```
---
## Task 2: Rewrite the system prompt in `request_metadata`
**Files:**
- Modify: `tag-notes.py:127-145` (the `system_prompt` f-string inside `request_metadata`)
**Context:** Current prompt forces 1-5 taxonomy tags and tells the LLM to be "conservative" about new suggestions. We're flipping both: allow 0 taxonomy tags when nothing fits, and let new suggestions go up to 5.
- [ ] **Step 1: Replace the system_prompt f-string**
In `tag-notes.py`, find the current `system_prompt` assignment inside `request_metadata` (starts at line 127):
```python
system_prompt = f"""You analyze markdown notes and return structured metadata.
Return ONLY valid JSON in this exact shape:
{{
"tags_from_taxonomy": ["tag1", "tag2"],
"new_tag_suggestions": ["newtag1"],
"seo_title_suffix": "Short descriptor that will follow the note title",
"seo_description": "Factual summary between {SEO_DESC_MIN} and {SEO_DESC_MAX} characters.",
"seo_keywords": ["keyword1", "keyword2"]
}}
Rules:
- tags_from_taxonomy: 1-5 tags drawn from the existing taxonomy that best fit the content.
- new_tag_suggestions: 0-2 NEW tags, only when content truly warrants it (be conservative).
- seo_title_suffix: a short, clean, non-clickbaity descriptor of the note. Do NOT include the note title or a leading colon — only the text that would follow "<title>: ". Aim for 4-10 words.
- seo_description: a clean factual summary, STRICTLY between {SEO_DESC_MIN} and {SEO_DESC_MAX} characters inclusive. Count characters carefully before responding.
- seo_keywords: 10-15 relevant keywords, no duplicates.
Existing tag taxonomy: {taxonomy_str}"""
```
Replace it with:
```python
system_prompt = f"""You analyze markdown notes and return structured metadata.
Return ONLY valid JSON in this exact shape:
{{
"tags_from_taxonomy": ["tag1", "tag2"],
"new_tag_suggestions": ["newtag1"],
"seo_title_suffix": "Short descriptor that will follow the note title",
"seo_description": "Factual summary between {SEO_DESC_MIN} and {SEO_DESC_MAX} characters.",
"seo_keywords": ["keyword1", "keyword2"]
}}
Rules:
- Tags should describe what the note is substantively about, not topics it merely mentions in passing.
- tags_from_taxonomy: 0-5 tags drawn from the existing taxonomy, ONLY when they genuinely fit. Do NOT force a taxonomy tag — return an empty list if nothing truly applies.
- new_tag_suggestions: 0-5 NEW tags when the taxonomy doesn't adequately cover the content. Each must be a reusable category (not hyper-specific to one note). Use lowercase-hyphenated style (e.g., personal-essay).
- seo_title_suffix: a short, clean, non-clickbaity descriptor of the note. Do NOT include the note title or a leading colon — only the text that would follow "<title>: ". Aim for 4-10 words.
- seo_description: a clean factual summary, STRICTLY between {SEO_DESC_MIN} and {SEO_DESC_MAX} characters inclusive. Count characters carefully before responding.
- seo_keywords: 10-15 relevant keywords, no duplicates.
Existing tag taxonomy: {taxonomy_str}"""
```
The three substantive changes:
1. Added philosophy line: `- Tags should describe what the note is substantively about, not topics it merely mentions in passing.`
2. `tags_from_taxonomy: 1-5 ... best fit the content.` → `0-5 ... ONLY when they genuinely fit. Do NOT force a taxonomy tag — return an empty list if nothing truly applies.`
3. `new_tag_suggestions: 0-2 ... be conservative).` → `0-5 NEW tags when the taxonomy doesn't adequately cover the content. Each must be a reusable category (not hyper-specific to one note). Use lowercase-hyphenated style (e.g., personal-essay).`
- [ ] **Step 2: Syntax check the module**
Run: `python3 -c "import ast; ast.parse(open('tag-notes.py').read()); print('ok')"`
Expected: prints `ok`. If it prints a SyntaxError, the f-string braces or quotes are wrong — fix before moving on.
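`ast.parse` only proves the file parses. It will not catch a misspelled placeholder such as `{SEO_DESC_MAXX}`, which fails with a `NameError` only when the f-string is evaluated, and it cannot show what the rendered prompt looks like. The sketch below renders a trimmed, hypothetical stand-in for the prompt (`render_prompt` is illustrative, not part of `tag-notes.py`) to confirm that the doubled braces come out as literal JSON braces and the bounds are interpolated:

```python
SEO_DESC_MIN = 150
SEO_DESC_MAX = 160


def render_prompt(taxonomy: list[str]) -> str:
    """Render a trimmed copy of the rewritten prompt (not the full text)."""
    taxonomy_str = ", ".join(taxonomy)
    # In an f-string, {{ and }} are literal braces; {NAME} is interpolated.
    return f"""Return ONLY valid JSON in this exact shape:
{{
"tags_from_taxonomy": ["tag1", "tag2"],
"seo_description": "Factual summary between {SEO_DESC_MIN} and {SEO_DESC_MAX} characters."
}}
Rules:
- tags_from_taxonomy: 0-5 tags drawn from the existing taxonomy, ONLY when they genuinely fit.
Existing tag taxonomy: {taxonomy_str}"""


prompt = render_prompt(["memoir", "family"])
print(prompt)
```

An unescaped single brace in the real prompt would be treated as the start of a replacement field, which usually surfaces as a SyntaxError in the Step 2 check.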
- [ ] **Step 3: Commit**
```bash
git add tag-notes.py
git commit -m "Allow zero-taxonomy-tag output and more new tag suggestions"
```
---
## Task 3: Raise the combined tag cap from 5 to 8
**Files:**
- Modify: `tag-notes.py:225` (inside `process_note`, in the `if needs_tags:` block)
- [ ] **Step 1: Change the slice**
Find this line in `tag-notes.py` (inside `process_note`, ~line 225):
```python
combined = list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:5]
```
Change to:
```python
combined = list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:8]
```
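To see what the slice change does, here is the combining expression on its own, wrapped in a hypothetical `combine_tags` helper and fed made-up tag lists. `dict.fromkeys` drops duplicates while keeping first-seen order (dicts preserve insertion order since Python 3.7), so taxonomy tags always precede new suggestions and the cap cuts from the end, which means new suggestions are dropped before taxonomy tags:

```python
def combine_tags(taxonomy_tags, new_suggestions, cap=8):
    """Merge, de-duplicate (first occurrence wins), then truncate to `cap`."""
    return list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:cap]


# Illustrative LLM output; "family" is suggested twice.
taxonomy_tags = ["memoir", "family", "aging"]
new_suggestions = ["family", "morning-routine", "fatherhood", "grief", "sleep-habits", "midlife"]

print(combine_tags(taxonomy_tags, new_suggestions, cap=5))
# ['memoir', 'family', 'aging', 'morning-routine', 'fatherhood']
print(combine_tags(taxonomy_tags, new_suggestions))
# ['memoir', 'family', 'aging', 'morning-routine', 'fatherhood', 'grief', 'sleep-habits', 'midlife']
```

With the old cap of 5, `grief`, `sleep-habits`, and `midlife` would have been discarded.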
- [ ] **Step 2: Syntax check**
Run: `python3 -c "import ast; ast.parse(open('tag-notes.py').read()); print('ok')"`
Expected: prints `ok`.
- [ ] **Step 3: Commit**
```bash
git add tag-notes.py
git commit -m "Raise combined tag cap from 5 to 8"
```
---
## Task 4: End-to-end verification against the problem note
**Files:**
- Modify (temporarily): the "Becoming a Morning Person" note in `~/Documents/ejl-zk/40 Public/41 Notes/`
**Context:** The tagger only touches fields that are currently empty. To force reprocessing we clear the `tags:` field on the problem note, run the script, and inspect the result.
- [ ] **Step 1: Locate the note**
Run: `find ~/Documents/ejl-zk/40\ Public/41\ Notes/ -iname "*morning*person*.md"`
Expected: prints one file path. Note it for the following steps (referred to below as `$NOTE`).
- [ ] **Step 2: Confirm LM Studio is up**
Run: `curl -s http://localhost:1234/v1/models | head -c 200`
Expected: JSON response listing at least one model, including `openai/gpt-oss-20b` (the value of `MODEL_NAME`). If the request fails, start LM Studio and load the model before continuing.
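If the truncated `curl` output is hard to read, the same check can be scripted with the standard library. This sketch assumes the endpoint returns the OpenAI-style list shape `{"data": [{"id": ...}, ...]}`; `model_ids` and `model_is_loaded` are hypothetical helpers, and the URL and model ID are the values used in this step:

```python
import json
import urllib.request

# Values from this step: the models endpoint and the MODEL_NAME it mentions.
MODELS_URL = "http://localhost:1234/v1/models"
MODEL_NAME = "openai/gpt-oss-20b"


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [m.get("id", "") for m in payload.get("data", [])]


def model_is_loaded(url: str = MODELS_URL, model: str = MODEL_NAME) -> bool:
    """Return True only if the server responds and lists `model`."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are OSError subclasses; bad JSON raises ValueError.
        return False
    return model in model_ids(payload)


if __name__ == "__main__":
    print("ok" if model_is_loaded() else "LM Studio unreachable or model not listed")
```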
- [ ] **Step 3: Clear the existing tags field**
Open `$NOTE` in an editor and set the `tags:` frontmatter value to an empty list (`tags: []`) or delete the line entirely. Save. Do NOT clear other fields — the script won't touch already-populated ones, which is the desired isolation.
- [ ] **Step 4: Run the tagger**
Run: `cd ~/bin/note-tagger && ./tag-notes.py`
Expected: the script processes every note; for the Morning Person note it prints a line like ` + Added tags: memoir, personal-essay, family, recovery, aging, ...` and does NOT print `productivity` or `learning` in that line.
- [ ] **Step 5: Inspect the frontmatter**
Open `$NOTE` and inspect the `tags:` block.
Pass criteria:
- Contains at least 3 of: `memoir`, `personal-essay`, `reflection`, `family`, `parenting`, `recovery`, `aging`, `relationships`, `childhood`, `identity`.
- Does NOT contain `productivity` or `learning`.
- Has ≤ 8 tags total (confirms the Task 3 cap).
Fail criteria (any triggers a rethink — do NOT patch around it):
- Still includes `productivity` or `learning`.
- Zero tags written.
- More than 8 tags.
If it fails: the follow-up noted in the spec is to add a single few-shot example to the prompt. Stop and report the failing output; don't silently escalate to that change.
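The pass/fail criteria above are mechanical enough to express as a function. This sketch assumes the note's `tags:` list has already been read into Python (for example via the script's own `extract_frontmatter`); `evaluate_tags` is a hypothetical helper, and its tag sets copy the criteria exactly as listed:

```python
# Copied from the Step 5 criteria.
NARRATIVE_TAGS = {
    "memoir", "personal-essay", "reflection", "family", "parenting",
    "recovery", "aging", "relationships", "childhood", "identity",
}
FORBIDDEN_TAGS = {"productivity", "learning"}
TAG_CAP = 8


def evaluate_tags(tags: list[str]) -> tuple[bool, list[str]]:
    """Apply the Step 5 criteria; returns (passed, reasons_for_failure)."""
    reasons = []
    if not tags:
        reasons.append("zero tags written")
    bad = sorted(FORBIDDEN_TAGS.intersection(tags))
    if bad:
        reasons.append(f"still includes {', '.join(bad)}")
    if len(tags) > TAG_CAP:
        reasons.append(f"{len(tags)} tags exceeds the cap of {TAG_CAP}")
    hits = NARRATIVE_TAGS.intersection(tags)
    if tags and len(hits) < 3:
        reasons.append(f"only {len(hits)} personal-narrative tag(s); need at least 3")
    return (not reasons, reasons)
```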
- [ ] **Step 6: Spot-check one other already-tagged note**
Run: `git status` in the notes vault (if it's a git repo) OR simply open one other note that already had tags before this run. Confirm its `tags:` field was NOT modified (the script's "only touch empty fields" invariant must still hold).
Expected: no change to that note's tags. If there IS a change, the empty-check logic regressed — stop and investigate.
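The invariant this step protects comes from how `process_note` decides what to fill. A minimal sketch of that decision, mirroring the `needs_*` checks in `tag-notes.py` (the name `fields_to_fill` is made up for illustration). Note that a note with existing tags can still be sent to the LLM when an SEO field is empty; only its tags are left alone:

```python
LLM_FIELDS = ("seo-title", "seo-description", "seo-keywords")


def fields_to_fill(frontmatter: dict) -> list[str]:
    """Return the frontmatter fields the tagger would populate; populated fields never appear."""
    tags = frontmatter.get("tags", []) or []
    if tags == [None]:  # a tags list holding a single empty "-" item parses as [None]
        tags = []
    needed = []
    if not tags:
        needed.append("tags")
    if not frontmatter.get("slug"):
        needed.append("slug")
    needed += [f for f in LLM_FIELDS if not frontmatter.get(f)]
    return needed
```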
- [ ] **Step 7: No commit for this task**
Task 4 is verification only. Any prompt-tuning follow-up is out of scope for this plan.
---
## Out of scope (from spec)
- Two-pass LLM classification.
- Few-shot examples in the prompt (follow-up candidate only if Task 4 fails).
- Changes to `seo_*` fields, `CONTENT_CHAR_LIMIT`, retry flow, slug derivation, or YAML round-tripping.
@@ -0,0 +1,97 @@
# Tag Enhancement Design
Date: 2026-04-19
## Problem
The tagger is producing wrong tags for personal-narrative content. Concrete example: the essay "Becoming a Morning Person" — a memoir about life stages, family, recovery, aging, and parenting — was tagged `productivity` and `learning`.
Root cause is twofold:
1. **Taxonomy gap.** `tag-taxonomy.yaml` has no categories that cover personal narrative, memoir, family, or reflection. The closest fits the LLM can find are productivity-adjacent tags from the "Knowledge & Learning" cluster.
2. **Prompt bias.** The system prompt in `request_metadata` (tag-notes.py:127) requires 1-5 taxonomy tags (no zero option) and tells the LLM to be "conservative" about new tag suggestions (0-2 max). Together these force the model to pick taxonomy tags even when none genuinely apply, and discourage it from proposing the new categories that would better describe the content.
## Goals
- The "Becoming a Morning Person" essay should be tagged with memoir/personal-narrative concepts, not productivity/learning.
- Future notes that fall outside the current taxonomy should surface new tag suggestions rather than force-fit existing ones.
- Taxonomy can grow over time via the existing `new_tag_accumulator` → end-of-run prompt flow — no change to that mechanism.
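For reference, the accumulator flow can be summarized in a sketch. `collect_fresh_tags` is a hypothetical condensation of the relevant lines in `process_note` and `main` (the `genuinely_new` filter and the final `fresh` list); the interactive `[y/N]` prompt and the write to `tag-taxonomy.yaml` are left out:

```python
def collect_fresh_tags(taxonomy, per_note_results, cap=8):
    """Condensed model of the new_tag_accumulator flow.

    per_note_results: iterable of (tags_from_taxonomy, new_tag_suggestions)
    pairs, one per note, as the LLM returned them.
    """
    accumulator = set()
    for taxonomy_tags, new_suggestions in per_note_results:
        combined = list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:cap]
        # Only suggestions that survived the cap and are not already in the taxonomy count.
        accumulator.update(t for t in combined if t not in taxonomy and t in new_suggestions)
    # main() offers this sorted list for appending to the taxonomy at the end of the run.
    return sorted(t for t in accumulator if t not in taxonomy)
```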
## Non-Goals
- No two-pass LLM classification. Single call per note stays.
- No few-shot examples in the prompt for this iteration. May be added as a follow-up if the 20B local model underperforms on the rewritten prompt.
- No change to `seo_title_suffix`, `seo_description`, `seo_keywords`, `CONTENT_CHAR_LIMIT`, the SEO-description retry flow, YAML round-tripping, or the slug derivation.
## Changes
### 1. Taxonomy additions (`tag-taxonomy.yaml`)
Add a new cluster at the end of the file:
```yaml
# Personal Narrative & Life
- memoir
- personal-essay
- reflection
- family
- parenting
- recovery
- mental-health
- aging
- relationships
- childhood
- identity
```
Rationale: these are deliberately broad and reusable. `sobriety` was considered and rejected — not expected to be a recurring theme. `recovery` is retained as a broader concept (recovery from any kind of setback, not only substance-related).
### 2. Prompt rewrite in `request_metadata` (tag-notes.py:127)
Three changes to the system prompt:
**a. Add a tagging philosophy sentence** at the top of the `Rules:` section:
> Tags should describe what the note is substantively about, not topics it merely mentions in passing.
**b. Allow zero taxonomy tags.** Replace:
> `tags_from_taxonomy: 1-5 tags drawn from the existing taxonomy that best fit the content.`
with:
> `tags_from_taxonomy: 0-5 tags drawn from the existing taxonomy, ONLY when they genuinely fit. Do NOT force a taxonomy tag — return an empty list if nothing truly applies.`
**c. Loosen new-tag suggestions.** Replace:
> `new_tag_suggestions: 0-2 NEW tags, only when content truly warrants it (be conservative).`
with:
> `new_tag_suggestions: 0-5 NEW tags when the taxonomy doesn't adequately cover the content. Each must be a reusable category (not hyper-specific to one note). Use lowercase-hyphenated style (e.g., personal-essay).`
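The rewritten rule asks the model for lowercase-hyphenated tags and at most five suggestions, but nothing in `process_note` normalizes what comes back. If the model drifts, a normalizer along these lines could run on `new_tag_suggestions` before the tags are combined. This is a hypothetical sketch outside the scope of this change; it assumes ASCII tag names, since any other character becomes a hyphen separator:

```python
import re


def normalize_tag(tag: str) -> str:
    """Lowercase and collapse runs of non-alphanumeric characters into single hyphens."""
    tag = tag.strip().lower()
    tag = re.sub(r"[^a-z0-9]+", "-", tag)
    return tag.strip("-")


def clean_suggestions(suggestions, limit: int = 5) -> list[str]:
    """Normalize, drop empties and duplicates, and enforce the prompt's 0-5 bound."""
    out = []
    for raw in suggestions:
        tag = normalize_tag(str(raw))
        if tag and tag not in out:
            out.append(tag)
    return out[:limit]
```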
### 3. Raise the combined tag cap (tag-notes.py:225)
Change `[:5]` to `[:8]`:
```python
combined = list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:8]
```
Memoir and reflection-style notes often legitimately touch 6-8 distinct themes; capping at 5 was causing otherwise-accurate tags to be dropped.
## Verification
After implementation:
1. Open the "Becoming a Morning Person" note in `~/Documents/ejl-zk/40 Public/41 Notes/` and clear its `tags:` frontmatter field (set to empty list).
2. Run `./tag-notes.py`.
3. Confirm the new `tags` value includes memoir/personal-essay-style tags (e.g., `memoir`, `personal-essay`, `family`, `recovery`, `aging`, `reflection`) and does NOT include `productivity` or `learning`.
4. Spot-check 1-2 other notes that already have reasonable tags and confirm the rewrite didn't regress them. (All LLM-backed fields are only touched when empty, so notes with existing tags will not have their tags regenerated, even if the script still calls the LLM to fill an empty SEO field.)
If the tags still look off, the follow-up is to add a single few-shot example to the system prompt showing a memoir case with zero taxonomy tags and all new suggestions — not included in this change.
## Files Touched
- `tag-taxonomy.yaml`: append new cluster
- `tag-notes.py`: `request_metadata` system prompt (~20 lines) and the `[:5]` → `[:8]` cap
+122 -100
@@ -1,53 +1,65 @@
#!/usr/bin/env python3
"""
Note Tagging and SEO Metadata Script
Processes markdown notes using a local LLM to add tags, slugs, and SEO metadata
Processes markdown notes using the Anthropic API to add tags, slugs, and SEO metadata.
"""
import os
import sys
import io
import json
import os
import re
import sys
from pathlib import Path
import requests
import anthropic
import yaml
from ruamel.yaml import YAML
# ---------------------------------------------------------------------------
# Configuration
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"
MODEL_NAME = "openai/gpt-oss-20b"
# ---------------------------------------------------------------------------
TAXONOMY_FILE = "tag-taxonomy.yaml"
NOTES_FOLDER = os.path.expanduser("~/Documents/ejl-zk/40 Public/41 Notes/")
CONTENT_CHAR_LIMIT = 20000
SEO_DESC_MIN = 150
SEO_DESC_MAX = 160
MODEL = "claude-sonnet-4-6"
# Round-trip YAML preserves existing frontmatter formatting
yaml_rt = YAML()
yaml_rt.preserve_quotes = True
yaml_rt.width = 4096
# Anthropic client — reads ANTHROPIC_API_KEY from env automatically
client = anthropic.Anthropic()
def load_taxonomy(taxonomy_path):
with open(taxonomy_path, 'r') as f:
# ---------------------------------------------------------------------------
# Taxonomy helpers
# ---------------------------------------------------------------------------
def load_taxonomy(taxonomy_path: Path) -> list[str]:
with open(taxonomy_path) as f:
data = yaml.safe_load(f) or {}
return data.get('tags', []) or []
return data.get("tags", []) or []
def append_tags_to_taxonomy(taxonomy_path, new_tags):
with open(taxonomy_path, 'r') as f:
def append_tags_to_taxonomy(taxonomy_path: Path, new_tags: set[str]) -> None:
with open(taxonomy_path) as f:
data = yaml.safe_load(f) or {}
existing = data.get('tags', []) or []
existing = data.get("tags", []) or []
combined = list(dict.fromkeys(existing + list(new_tags)))
data['tags'] = combined
with open(taxonomy_path, 'w') as f:
data["tags"] = combined
with open(taxonomy_path, "w") as f:
yaml.dump(data, f, default_flow_style=False, sort_keys=False, allow_unicode=True)
def extract_frontmatter(content):
pattern = r'^---\s*\n(.*?)\n---\s*\n(.*)$'
# ---------------------------------------------------------------------------
# Frontmatter helpers
# ---------------------------------------------------------------------------
def extract_frontmatter(content: str):
pattern = r"^---\s*\n(.*?)\n---\s*\n(.*)$"
match = re.match(pattern, content, re.DOTALL)
if not match:
return None, content
@@ -55,66 +67,65 @@ def extract_frontmatter(content):
return frontmatter, match.group(2)
def reconstruct_markdown(frontmatter, body):
def reconstruct_markdown(frontmatter, body: str) -> str:
stream = io.StringIO()
yaml_rt.dump(frontmatter, stream)
fm_str = stream.getvalue()
if not fm_str.endswith('\n'):
fm_str += '\n'
if not fm_str.endswith("\n"):
fm_str += "\n"
return f"---\n{fm_str}---\n{body}"
def slugify(text):
def slugify(text: str) -> str:
text = text.lower()
text = re.sub(r"['`]", '', text)
text = re.sub(r'[^\w\s-]', ' ', text)
text = re.sub(r'[-\s]+', '-', text).strip('-')
text = re.sub(r"[''`]", "", text)
text = re.sub(r"[^\w\s-]", " ", text)
text = re.sub(r"[-\s]+", "-", text).strip("-")
return text
def parse_json_response(content):
# ---------------------------------------------------------------------------
# LLM helpers
# ---------------------------------------------------------------------------
def parse_json_response(content: str | None) -> dict | None:
if content is None:
return None
try:
return json.loads(content)
except json.JSONDecodeError:
pass
start = content.find('{')
end = content.rfind('}')
start = content.find("{")
end = content.rfind("}")
if start != -1 and end > start:
try:
return json.loads(content[start:end + 1])
return json.loads(content[start : end + 1])
except json.JSONDecodeError:
pass
return None
def call_llm_json(system_prompt, user_prompt, max_tokens=900):
payload = {
"model": MODEL_NAME,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
"temperature": 0.2,
"max_tokens": max_tokens,
"response_format": {"type": "text"},
}
def call_llm_json(system_prompt: str, user_prompt: str, max_tokens: int = 1024) -> dict | None:
try:
response = requests.post(LM_STUDIO_URL, json=payload, timeout=120)
if not response.ok:
print(f" ! LLM error: {response.status_code} {response.reason}")
print(f" body: {response.text[:500]}")
return None
result = response.json()
content = result['choices'][0]['message']['content']
return parse_json_response(content)
except Exception as e:
print(f" ! LLM error: {e}")
message = client.messages.create(
model=MODEL,
max_tokens=max_tokens,
system=system_prompt,
messages=[{"role": "user", "content": user_prompt}],
temperature=0.2,
)
content = message.content[0].text if message.content else ""
parsed = parse_json_response(content)
if parsed is None:
print(f" ! LLM returned no parseable JSON (stop_reason={message.stop_reason})")
print(f" content: {content[:500]!r}")
return parsed
except anthropic.APIError as e:
print(f" ! Anthropic API error: {e}")
return None
def request_metadata(title, note_content, taxonomy):
def request_metadata(title: str, note_content: str, taxonomy: list[str]) -> dict | None:
taxonomy_str = ", ".join(taxonomy)
system_prompt = f"""You analyze markdown notes and return structured metadata.
@@ -145,13 +156,14 @@ Produce the JSON described in the system prompt."""
return call_llm_json(system_prompt, user_prompt)
def request_description_retry(title, note_content, previous_desc):
def request_description_retry(title: str, note_content: str, previous_desc: str) -> str:
system_prompt = f"""You rewrite SEO descriptions to a strict length.
Return ONLY valid JSON of the form:
{{"seo_description": "..."}}
The description must be a clean, factual summary of the note, STRICTLY between {SEO_DESC_MIN} and {SEO_DESC_MAX} characters inclusive. Count characters carefully before responding."""
user_prompt = f"""Note title: {title}
Note content:
@@ -161,48 +173,51 @@ Your previous description was {len(previous_desc)} characters, outside the allow
"{previous_desc}"
Rewrite it to fit strictly within {SEO_DESC_MIN}-{SEO_DESC_MAX} characters."""
result = call_llm_json(system_prompt, user_prompt, max_tokens=400)
result = call_llm_json(system_prompt, user_prompt)
if result:
return (result.get('seo_description') or '').strip()
return ''
return (result.get("seo_description") or "").strip()
return ""
def process_note(file_path, taxonomy, new_tag_accumulator):
# ---------------------------------------------------------------------------
# Note processing
# ---------------------------------------------------------------------------
def process_note(file_path: Path, taxonomy: list[str], new_tag_accumulator: set) -> None:
print(f"Processing: {file_path}")
with open(file_path, 'r', encoding='utf-8') as f:
content = f.read()
content = file_path.read_text(encoding="utf-8")
frontmatter, body = extract_frontmatter(content)
if frontmatter is None:
print(" ⚠️ No frontmatter found, skipping")
return
existing_tags = frontmatter.get('tags', []) or []
existing_tags = frontmatter.get("tags", []) or []
if existing_tags == [None]:
existing_tags = []
needs_tags = not existing_tags
needs_slug = not frontmatter.get('slug')
needs_seo_title = not frontmatter.get('seo-title')
needs_seo_desc = not frontmatter.get('seo-description')
needs_seo_keywords = not frontmatter.get('seo-keywords')
needs_slug = not frontmatter.get("slug")
needs_seo_title = not frontmatter.get("seo-title")
needs_seo_desc = not frontmatter.get("seo-description")
needs_seo_keywords = not frontmatter.get("seo-keywords")
if not (needs_tags or needs_slug or needs_seo_title or needs_seo_desc or needs_seo_keywords):
if not any([needs_tags, needs_slug, needs_seo_title, needs_seo_desc, needs_seo_keywords]):
print(" ✓ All fields already populated, skipping")
return
title = frontmatter.get('title') or Path(file_path).stem
title = frontmatter.get("title") or file_path.stem
updated = False
if needs_slug:
slug = slugify(Path(file_path).stem)
frontmatter['slug'] = slug
slug = slugify(file_path.stem)
frontmatter["slug"] = slug
print(f" + Added slug: {slug}")
updated = True
if not (needs_tags or needs_seo_title or needs_seo_desc or needs_seo_keywords):
with open(file_path, 'w', encoding='utf-8') as f:
f.write(reconstruct_markdown(frontmatter, body))
# If only slug was needed, skip the LLM call
if not any([needs_tags, needs_seo_title, needs_seo_desc, needs_seo_keywords]):
file_path.write_text(reconstruct_markdown(frontmatter, body), encoding="utf-8")
print(" ✓ Updated successfully")
return
@@ -212,11 +227,11 @@ def process_note(file_path, taxonomy, new_tag_accumulator):
return
if needs_tags:
taxonomy_tags = llm_response.get('tags_from_taxonomy') or []
new_suggestions = llm_response.get('new_tag_suggestions') or []
taxonomy_tags = llm_response.get("tags_from_taxonomy") or []
new_suggestions = llm_response.get("new_tag_suggestions") or []
combined = list(dict.fromkeys(list(taxonomy_tags) + list(new_suggestions)))[:5]
if combined:
frontmatter['tags'] = combined
frontmatter["tags"] = combined
updated = True
print(f" + Added tags: {', '.join(combined)}")
genuinely_new = [t for t in combined if t not in taxonomy and t in new_suggestions]
@@ -225,19 +240,17 @@ def process_note(file_path, taxonomy, new_tag_accumulator):
new_tag_accumulator.update(genuinely_new)
if needs_seo_title:
suffix = (llm_response.get('seo_title_suffix') or '').strip()
suffix = suffix.lstrip(':').strip()
# Strip a leading repeat of the title if the LLM included it anyway
suffix = (llm_response.get("seo_title_suffix") or "").strip().lstrip(":").strip()
if suffix.lower().startswith(title.lower()):
suffix = suffix[len(title):].lstrip(':').strip()
suffix = suffix[len(title):].lstrip(":").strip()
if suffix:
seo_title = f"{title}: {suffix}"
frontmatter['seo-title'] = seo_title
frontmatter["seo-title"] = seo_title
updated = True
print(f" + Added SEO title: {seo_title}")
if needs_seo_desc:
seo_desc = (llm_response.get('seo_description') or '').strip()
seo_desc = (llm_response.get("seo_description") or "").strip()
if seo_desc and not (SEO_DESC_MIN <= len(seo_desc) <= SEO_DESC_MAX):
print(f" ~ SEO description length {len(seo_desc)} outside {SEO_DESC_MIN}-{SEO_DESC_MAX}, re-asking")
retry = request_description_retry(title, body[:CONTENT_CHAR_LIMIT], seo_desc)
@@ -248,30 +261,32 @@ def process_note(file_path, taxonomy, new_tag_accumulator):
else:
print(" ! Retry failed; using original")
if seo_desc:
frontmatter['seo-description'] = seo_desc
frontmatter["seo-description"] = seo_desc
updated = True
print(f" + Added SEO description ({len(seo_desc)} chars)")
if needs_seo_keywords:
seo_keywords = list(dict.fromkeys(llm_response.get('seo_keywords') or []))
seo_keywords = list(dict.fromkeys(llm_response.get("seo_keywords") or []))
if seo_keywords:
frontmatter['seo-keywords'] = seo_keywords
frontmatter["seo-keywords"] = seo_keywords
updated = True
print(f" + Added {len(seo_keywords)} SEO keywords")
if updated:
with open(file_path, 'w', encoding='utf-8') as f:
f.write(reconstruct_markdown(frontmatter, body))
file_path.write_text(reconstruct_markdown(frontmatter, body), encoding="utf-8")
print(" ✓ Updated successfully")
else:
print(" - No updates needed")
def main():
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
def main() -> None:
taxonomy_path = Path(__file__).parent / TAXONOMY_FILE
if not taxonomy_path.exists():
print(f"Error: Taxonomy file not found at {taxonomy_path}")
print(f"Please create {TAXONOMY_FILE} in the same directory as this script")
sys.exit(1)
taxonomy = load_taxonomy(taxonomy_path)
@@ -281,11 +296,8 @@ def main():
if not target_path.exists():
print(f"Error: Notes folder not found: {target_path}")
sys.exit(1)
if not target_path.is_dir():
print(f"Error: {target_path} is not a directory")
sys.exit(1)
md_files = sorted(target_path.rglob('*.md'))
md_files = sorted(target_path.rglob("*.md"))
if not md_files:
print(f"No markdown files found in {target_path}")
sys.exit(0)
@@ -293,7 +305,7 @@ def main():
print(f"Processing all markdown files under: {target_path}")
print(f"Found {len(md_files)} markdown files\n")
new_tag_accumulator = set()
new_tag_accumulator: set[str] = set()
for md_file in md_files:
try:
process_note(md_file, taxonomy, new_tag_accumulator)
@@ -304,17 +316,27 @@ def main():
print("\n✓ Processing complete!")
fresh = sorted(t for t in new_tag_accumulator if t not in taxonomy)
if fresh:
print(f"\nNew tags suggested during this run: {', '.join(fresh)}")
try:
answer = input("Add these to the taxonomy? [y/N]: ").strip().lower()
except EOFError:
answer = ''
if answer == 'y':
append_tags_to_taxonomy(taxonomy_path, fresh)
print(f"✓ Added {len(fresh)} tag(s) to {taxonomy_path.name}")
else:
print("Skipped taxonomy update.")
if not fresh:
return
print(f"\nNew tags suggested during this run: {', '.join(fresh)}")
# Non-interactive (CI): log and skip
if not sys.stdin.isatty():
print("Non-interactive environment detected — skipping taxonomy update.")
print(f"To add these manually, run the script locally and answer 'y' when prompted.")
return
try:
answer = input("Add these to the taxonomy? [y/N]: ").strip().lower()
except EOFError:
answer = ""
if answer == "y":
append_tags_to_taxonomy(taxonomy_path, fresh)
print(f"✓ Added {len(fresh)} tag(s) to {taxonomy_path.name}")
else:
print("Skipped taxonomy update.")
if __name__ == "__main__":
+50 -45
@@ -1,46 +1,51 @@
# Tag Taxonomy for Note Tagging
# Add new tags here as the LLM suggests good ones
tags:
# Technology & Development
- self-hosting
- linux
- automation
- ai-tools
- web-development
- infrastructure
- docker
- security
- privacy
# Work & Management
- project-management
- business-analysis
- leadership
- agile
- team-dynamics
- process-improvement
- governance
# Knowledge & Learning
- knowledge-management
- zettelkasten
- note-taking
- learning
- productivity
# Philosophy & Spirituality
- buddhism
- eastern-philosophy
- meditation
- mindfulness
# Literature & Writing
- literature
- postmodernism
- writing
# Personal Interests
- plants
- aroids
- gardening
- self-hosting
- linux
- automation
- ai-tools
- web-development
- infrastructure
- docker
- security
- privacy
- project-management
- business-analysis
- leadership
- agile
- team-dynamics
- process-improvement
- governance
- knowledge-management
- zettelkasten
- note-taking
- learning
- productivity
- buddhism
- eastern-philosophy
- meditation
- mindfulness
- literature
- postmodernism
- writing
- plants
- aroids
- gardening
- memoir
- personal-essay
- reflection
- family
- parenting
- recovery
- mental-health
- aging
- relationships
- childhood
- identity
- morning-routine
- sleep-habits
- baking
- cooking-techniques
- fermentation
- food-baking
- kitchen-hacks
- sourdough
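The cluster comments (`# Technology & Development`, `# Personal Interests`, and so on) are absent from the new version of the taxonomy. Whatever removed them here, `append_tags_to_taxonomy` would have the same effect: it reloads the file with `yaml.safe_load` and rewrites it with `yaml.dump`, and PyYAML does not preserve comments, so cluster headings are lost the first time tags are accepted at the `[y/N]` prompt. One way to keep them is to append raw lines instead of re-dumping. A minimal stdlib-only sketch, assuming the `tags:` list is the last block in the file and each item is a plain `- name` on its own line; `append_tags_preserving_comments` is a hypothetical replacement, not the script's current function:

```python
import re
from pathlib import Path

# Matches a plain list item such as "  - memoir"; comments and keys do not match.
ITEM_RE = re.compile(r"^(\s*)-\s+(\S+)\s*$")


def append_tags_preserving_comments(path: Path, new_tags) -> list[str]:
    """Append tags as raw lines so existing comments and layout survive.

    Returns the tags actually appended (tags already present are skipped).
    """
    text = path.read_text(encoding="utf-8")
    existing, indent = set(), ""
    for line in text.splitlines():
        m = ITEM_RE.match(line)
        if m:
            indent = m.group(1)  # reuse the indentation of the last list item
            existing.add(m.group(2))
    added = [t for t in dict.fromkeys(new_tags) if t not in existing]
    if added:
        if not text.endswith("\n"):
            text += "\n"
        text += "".join(f"{indent}- {t}\n" for t in added)
        path.write_text(text, encoding="utf-8")
    return added
```

Because the file is never re-serialized, anything PyYAML would drop (comments, blank lines between clusters) stays as written.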