Analysis of comments on Karpathy's LLM Wiki

Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Analysis date: 2026-05-04 15:45 UTC
Scope: 727 GitHub Gist comments retrieved through the GitHub comments API, plus the original llm-wiki.md.

Short conclusion

The comments show that the LLM Wiki idea immediately became a product category: people are building CLIs, Obsidian plugins, MCP servers, graph/RAG hybrids, daemon memory for Claude/Codex/Cursor, and team wikis. But nearly all of the strong comments converge not on “yet another markdown generator” but on the next layer:

provenance / citations / claim-level receipts;
stale/contradiction detection;
hybrid search: wiki + BM25/FTS/vector/graph;
typed entities and relationships;
human review / approval queue;
multi-agent shared memory;
git/audit log / rollback / reverse ingestion;
multimodal ingest;
benchmarks/evals.

The best angle for Trip2G: Trip2G as the operational/network layer for LLM Wiki, a hosted Markdown/MCP base that agents can read, update, cite, sync, federate, and verify. Not a “RAG killer”, but a compiled knowledge artifact + MCP interface + provenance/audit/sync.

What the gist itself says

Karpathy describes a pattern:

instead of running every query through RAG over raw documents;
the LLM gradually builds and maintains a persistent interlinked markdown wiki;
the wiki becomes a compounding artifact;
raw sources remain the primary layer;
Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase;
the wiki applies to personal memory, research, books, business/team, competitive analysis, due diligence, trip planning, course notes.

The key phrase the comments react to: the wiki is a persistent, compounding artifact.

Main themes in the comments

1. Enthusiasm and mass immediate adoption

Many comments are “great idea”, “I built one”, “will try this”. This signals that the pain is clear: ordinary RAG rediscovers knowledge on every query, while people want a structure that accumulates.

Representative comments:

#33 @MagicUncleDave: “core Zeitgeist right now”;
#72 @SeeknnDestroy: agents and humans should both keep memory up to date;
#180 @justlovemaki: RAG as “rediscover everything every time” feels wasteful;
#211 @pssah4: the wiki as a “pre-compiled intermediate layer”.

2. An explosion of implementations and self-promo

The comments quickly turned into a catalog of projects: CLIs, Obsidian vaults, MCP servers, desktop apps, web UIs, graph memory, research wikis, code wikis, Confluence-to-MCP.

Examples:

#44 @VihariKanukollu — browzy.ai: ingest, compile, query, lint;
#92 @xoai — sage-wiki: single Go binary;
#177 @originlabs-app — agent-wiki: slash commands, pure markdown;
#187 @ethanj — llm-wiki-compiler: incremental rebuilds;
#228 @iamsashank09 — MCP server for LLM Wiki;
#523 @qhuang20 — Claude Code plugin with SessionStart hook;
#613 @kogarashi86 — Confluence tree into Markdown wiki, search index, MCP layer;
#680 @7xuanlu — desktop app + background daemon.

Signal for Trip2G: the market already understands “LLM Wiki”. Differentiation has to come not from having markdown but from operational features: sync, MCP, network/federation, provenance, team workflow.

3. Obsidian / Markdown as the human-readable layer

Many people use Obsidian, but often say that Obsidian is only a viewer/IDE on top of the files.

Representative comments:

#20 @logancautrell — Zed + Obsidian workflow;
#22 @ppeirce — Dataview/Bases;
#182 @originlabs-app — “Obsidian is just a viewer”;
#249 @jurajskuska — “OBSIDIAN COULD BE A STANDARD”.

Trip2G angle: “your files stay Markdown”, but with a hosted/MCP/network layer added for agents.

4. Index.md does not scale

A very frequent theme: a single index.md works for small bases but breaks down at hundreds or thousands of pages.

Representative comments:

#60 @druce: chunking the wiki, LanceDB;
#91 @bluewater8008: classify before extract, token budget for index;
#98 @mpazik: index helps early but does not scale;
#125 @tashisleepy: wiki enough only up to 500+ docs;
#232/#233 @singularityjason/@omega-memory: index.md overflows context window;
#269 @bitsofchris: overfetch, deduplicate, rerank for diversity;
#304 @shibing624: TreeSearch / structure-aware retrieval;
#331 @marktran0710: BM25 + vector + RRF + LLM evaluator;
#620 @gulliveruk: retrieval itself becomes a reasoning task;
#725 @superimpactful: missing routing tables and non-hierarchical taxonomies.

Takeaway: Trip2G should say not “we have index.md” but “we have an agent navigation contract”: indexes, search, focused reads, MCP methods, routing, backlinks, graph-like navigation.
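
One of the hybrid approaches mentioned in the comments (#331) combines BM25 and vector results with RRF. A minimal sketch of reciprocal rank fusion follows, assuming each retriever returns an ordered list of page names. The function name, the page names, and k=60 (a commonly used default) are illustrative assumptions, not taken from any commenter's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of page ids into a single ranking.

    Each page scores 1 / (k + rank) per list it appears in (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            scores[page_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by page id for determinism.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


# Hypothetical result lists from a keyword index and a vector index.
bm25_hits = ["rag.md", "index.md", "provenance.md"]
vector_hits = ["provenance.md", "rag.md", "ontology.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Pages that rank well in both lists rise to the top, which is why this kind of fusion is a common first step before any reranking or LLM evaluation.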

5. Provenance, citations, truth maintenance

The strongest unsolved pain: if an LLM wrote a fact into the wiki, how do you prove its source? How do you tell that the fact has gone stale? How do you remove a bad source?

Representative comments:

#77 @laphilosophia — hardest part is truth maintenance;
#131 @Paul-Kyle — “git blame on every fact”;
#132 @Jwcjwc12 — every proposition records source files and content hashes;
#176 @tomjwxf — synthesis without citations is core unsolved problem;
#234 @ap0phasi — data provenance nightmare;
#250 @Marekai — page-level citations and no hallucinated citations;
#287 @asong56 — hallucinations can become permanently embedded;
#374 @Shagun0402 — trading ephemeral hallucinations for persistent errors;
#615 @vincent-pli — how to remove/reverse an ingestion;
#714 @kytmanov — every claim links back to source pages with content hashes.

Functional extensions:

claims.jsonl or frontmatter for claims;
source id, source hash, timestamp, paragraph/page span;
confidence/status: verified, disputed, stale, human-approved;
reverse ingestion: remove a source and rebuild the dependent pages;
stale detection by source hash;
citation accuracy benchmark.
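
The claim record and hash-based stale check described above could look roughly like this. It is a sketch under stated assumptions: the field names, the `Claim` class, and the helper functions are hypothetical and do not represent an existing Trip2G or gist schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


def content_hash(text: str) -> str:
    """SHA-256 of the source text, stored next to each claim."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Claim:  # hypothetical schema for one line of claims.jsonl
    claim_id: str
    text: str
    source_id: str
    source_hash: str
    span: str
    timestamp: str
    status: str = "unverified"


def mark_stale(claims, current_sources):
    """Flag claims whose source was removed or no longer matches its hash."""
    checked = []
    for claim in claims:
        source_text = current_sources.get(claim.source_id)
        if source_text is None or content_hash(source_text) != claim.source_hash:
            claim = replace(claim, status="stale")
        checked.append(claim)
    return checked


def to_jsonl(claims):
    """Serialize claims as one JSON object per line."""
    return "\n".join(json.dumps(asdict(c), sort_keys=True) for c in claims)


original = "RAG rediscovers knowledge on every query."
claims = [
    Claim("c1", "RAG re-derives knowledge per query.", "src-001",
          content_hash(original), "p.1", "2026-05-04T15:45:00Z", "verified")
]

unchanged = mark_stale(claims, {"src-001": original})
edited = mark_stale(claims, {"src-001": original + " (edited)"})
print(unchanged[0].status, edited[0].status)
print(to_jsonl(edited))
```

Treating a removed source the same way as a changed one gives a simple starting point for reverse ingestion: every claim that comes back as stale is a candidate for rebuild or deletion.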

6. Lint, drift detection, contradiction checks

People want not just “wiki generated” but “wiki can be audited”.

Representative comments:

#42 @skpalan — periodically audit its own wiki;
#99 @localwolfpackai — Counter-Arguments & Data Gaps;
#138 @trox — drift detection plugin;
#213 @frosk1 — error accumulation & drift;
#218 @sovahc — cache poisoning risk;
#381 — multi-category wiki lint;
#680 — daemon with dedup and contradiction checks;
#720 — doctor: graph integrity, source hygiene, secret-looking files.

Trip2G feature idea: trip2g doctor / MCP method for:

broken wikilinks;
stale pages;
unverified claims;
contradictory claims;
orphan pages;
missing frontmatter;
secret-looking files;
pages without source citations.
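
A few of these checks are easy to sketch. The example below covers broken wikilinks, orphan pages, and missing frontmatter over an in-memory vault. The `doctor` function, the report keys, and the sample pages are hypothetical; this is not an existing trip2g command or its output format.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def doctor(pages, roots=("_index",)):
    """Audit a vault given as {note name: markdown text}."""
    report = {"broken_links": [], "orphans": [], "missing_frontmatter": []}
    linked = set()
    for name, text in sorted(pages.items()):
        # Frontmatter is assumed to be a YAML block opened by '---' on line 1.
        if not text.startswith("---\n"):
            report["missing_frontmatter"].append(name)
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target not in pages:
                report["broken_links"].append((name, target))
            elif target != name:
                linked.add(target)
    # Orphans: pages nobody links to, except declared entry points.
    report["orphans"] = sorted(
        name for name in pages if name not in linked and name not in roots
    )
    return report


vault = {
    "_index": "---\ntitle: Index\n---\n[[rag]] and [[provenance|Provenance]]",
    "rag": "---\ntitle: RAG\n---\nSee [[provenance#hashes]] and [[missing-page]].",
    "provenance": "No frontmatter here.",
    "scratch": "---\ntitle: Scratch\n---\nNothing links here.",
}

report = doctor(vault)
print(report)
```

The remaining checks (stale pages, unverified or contradictory claims, secret-looking files) need claim metadata or content scanning and are left out of this sketch.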

7. Typed entities, ontology, graph layer

Many commenters argue that markdown links quickly turn into an implicit database, so a schema is needed.

Representative comments:

#57 @buremba — entity types, strict schema, event log, Postgres;
#98 @mpazik — wiki grows a schema on its own;
#134 @blex2011 — graph database built on ontology;
#146 @xoai — ontology is the hardest part;
#261 @anzal1 — confidence-scored claims, temporal tracking, contradiction detection;
#399 @abbacusgroup — OWL-RL ontology + SPARQL + SQLite FTS5;
#462 @payneio — wiki is just a view for humans, graphdb behind it;
#498 @nigelglenday — flat markdown backlinks not enough, typed entities/relationships;
#540 @skyllwt — typed entities and typed edges;
#590 @sly-codechum — entity linking, conflict handling, supersession, temporal validity.

Trip2G angle: Markdown as the portable artifact, with an optional schema layer on top:

type: concept/person/org/source/task/claim/decision;
relations: supports, contradicts, supersedes, depends_on, derived_from;
MCP methods to search by type/relation;
graph view / routing table as generated notes.
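
To make the schema layer concrete, here is a small sketch of typed entities and typed edges with a query by relation or entity type. The entity types and relation names come from the list above; the `TypedGraph` class, its methods, and the sample entities are hypothetical and not part of any existing MCP method set.

```python
from dataclasses import dataclass

ENTITY_TYPES = {"concept", "person", "org", "source", "task", "claim", "decision"}
RELATIONS = {"supports", "contradicts", "supersedes", "depends_on", "derived_from"}


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str


class TypedGraph:
    def __init__(self):
        self.types = {}  # entity name -> entity type
        self.edges = []  # list of Edge, in insertion order

    def add_entity(self, name, entity_type):
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.types[name] = entity_type

    def relate(self, source, relation, target):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        for name in (source, target):
            if name not in self.types:
                raise KeyError(f"unknown entity: {name}")
        self.edges.append(Edge(source, relation, target))

    def find(self, relation=None, source_type=None):
        """Return edges filtered by relation and by the source entity's type."""
        return [
            edge for edge in self.edges
            if (relation is None or edge.relation == relation)
            and (source_type is None or self.types[edge.source] == source_type)
        ]


# Made-up entities, used only to exercise the schema.
graph = TypedGraph()
graph.add_entity("gist-llm-wiki", "source")
graph.add_entity("claim-index-scales", "claim")
graph.add_entity("claim-index-overflows", "claim")
graph.relate("claim-index-scales", "derived_from", "gist-llm-wiki")
graph.relate("claim-index-overflows", "contradicts", "claim-index-scales")
graph.relate("claim-index-overflows", "supersedes", "claim-index-scales")

contradictions = graph.find(relation="contradicts")
print(contradictions)
```

Rejecting unknown types and relations at write time is the point of the design: it keeps agents from inventing new edge labels that later queries would silently miss.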

8. Human-in-the-loop and anti-slop

Strong criticism: if the LLM writes the wiki itself without oversight, the result is a “slop layer”; an automatic summary is not the same as thinking.

Representative comments:

#217 @robertandrews — generation effect: active processing matters;
#219 @Runecreed — slop machine;
#235 @Yuncun — AI slop layer to maintain;
#410 @asakin — first N days human reviews writes;
#412 @jurajskuska — humans are the answer;
#414 @IlyaGorsky — nothing writes without human confirmation;
#475/#480 @gnusupport — LLM should refresh, not curate;
#657 @earaizapowerera — LLM can propose, human authorizes;
#703 @paulshomo — genuine wiki needs splitting, merging, diffing, versioning.

Trip2G positioning: “anti-slop wiki” — agents propose changes, but durable artifacts have logs, citations, review, diffs, rollback.
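
The propose-then-approve flow can be sketched in a few lines, assuming pages are plain strings held in memory. The `ReviewQueue` class and its method names are hypothetical; a real implementation would more likely sit on top of git branches or pull requests.

```python
import difflib


class ReviewQueue:
    """Agents propose page edits; only a human decision changes canonical text."""

    def __init__(self, canonical):
        self.canonical = dict(canonical)  # page name -> approved text
        self.pending = {}                 # proposal id -> proposal dict
        self.log = []                     # audit trail of decisions
        self._next_id = 1

    def propose(self, page, new_text, agent):
        old_text = self.canonical.get(page, "")
        diff = "\n".join(difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(),
            fromfile=f"{page} (canonical)", tofile=f"{page} (proposed)",
            lineterm="",
        ))
        proposal_id = self._next_id
        self._next_id += 1
        self.pending[proposal_id] = {
            "page": page, "text": new_text, "agent": agent, "diff": diff,
        }
        return proposal_id

    def decide(self, proposal_id, reviewer, approve):
        proposal = self.pending.pop(proposal_id)
        if approve:
            self.canonical[proposal["page"]] = proposal["text"]
        verdict = "approved" if approve else "rejected"
        self.log.append(
            (proposal_id, proposal["page"], proposal["agent"], reviewer, verdict)
        )
        return verdict


queue = ReviewQueue({"rag.md": "RAG re-reads raw docs."})
first = queue.propose(
    "rag.md", "RAG re-reads raw docs.\nA wiki compiles knowledge once.", "agent-a"
)
second = queue.propose("rag.md", "Unsupported rewrite.", "agent-b")
print(queue.pending[first]["diff"])

queue.decide(first, "human-reviewer", approve=True)
queue.decide(second, "human-reviewer", approve=False)
```

The diff is computed at proposal time, so the reviewer sees exactly what would change, and the log keeps both approvals and rejections for later audit.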

9. Multi-agent / team memory

Commenters see LLM Wiki as shared memory for agents and teams.

Representative comments:

#13 @geetansharora — how to share knowledge base with team;
#46 @Arrmlet — multiple LLM agents maintaining wiki in parallel;
#174 @GeminiLight — multi-agent is the real unlock;
#202 @jurajskuska — specialized agents, sandboxing;
#215 @pnakamura — 7 specialized agents not getting smarter without shared memory;
#253 @junbjnnn — repo wiki for PRDs, meeting notes, ADRs, runbooks;
#364 @vitalii-ivanov-rakuten — team wiki as git submodule, Claude PRs upstream;
#426 @redmizt — identity, access control, contamination prevention, concurrency;
#494 @devsarangi2 — same corpus has different value for PM vs engineer;
#679 @gowtham0992 — works with Kiro, Claude Code, Cursor, Codex;
#694 @AEVYRA — multi-agent drift across Claude/Codex/Gemini;
#727 @zhurudong — same file works across Claude Code / Codex / OpenCode.

Trip2G angle: shared MCP-accessible knowledge base with roles, access, logs, agent-neutral contract, review queue.

10. Multimodal ingest and event-driven capture

People want to ingest not only markdown/PDF but everything that becomes knowledge.

Representative comments:

#102 @uggrock — PDFs, emails, screenshots, voice memo transcripts;
#105 @Okohedeki — TikTok, tweets, YouTube;
#181 @lucasastorian — PDFs, PowerPoints, Word, Excel;
#195 @liamsysmind — text, files, audio recordings, local voice memo transcription;
#224 @xoai — PDFs, Word, spreadsheets, PowerPoints, EPUBs, emails, images;
#393 — 50+ formats;
#724 @VeniVeci — VLM Wiki for photos/videos.

Trip2G opportunity: adapters/importers as landing pages:

Obsidian-to-MCP;
Notion/Drive-to-MCP;
Slack/Discord archives;
Telegram channels;
PDFs and docs;
YouTube/RSS/sitemap;
screenshots/voice memos.

Proposed extensions to LLM Wiki functionality

Core functional extensions

Claim-level provenance
- Each claim knows source, source hash, quote/page span, timestamp, model run, confidence.
Citation-aware PDF/document ingest
- Page-level citations, paragraph IDs, bounding boxes, original quote snippets.
Staleness and reverse ingestion
- If source changes or is removed, mark derived notes/claims as stale and selectively rebuild.
Contradiction and counter-argument layer
- Automatic sections: disputed claims, counter-arguments, data gaps, unverified claims.
Multi-model verification
- Verifier, devil’s advocate, synthesizer; consensus/voting for important facts.
Hybrid retrieval over wiki
- Markdown as artifact, FTS/BM25/vector/graph as search layer.
Progressive index / routing tables
- Global index → domain index → page TLDR → full page; token-budgeted indexes.
Typed ontology and entity resolution
- Entity pages for people/orgs/papers/concepts/tasks/decisions/sources; typed relations.
Human review queue
- Draft updates, staged candidate buffer, approval before canonical write, rejection feedback.
Git-native audit and rollback
- Every change is commit-like: reason, affected claims, source hashes, diffs.
Multi-agent write coordination
- Task claiming, locks, roles, permissions, serial write daemon, branch/PR workflows.
Multimodal/event-driven ingest
- Watch folders, mobile capture, share sheets, voice memo ingestion, screenshots, browser history.
Benchmarks/evals
- Fixed corpora, recall@k, citation accuracy, contradiction detection, freshness, token cost.
Security scanner
- Detect secrets/PII/private docs before publishing or agent exposure.
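
For the staleness and reverse ingestion item, the core step is finding everything that depends on a removed source, directly or through other pages. A minimal sketch follows, assuming a `derived_from` mapping is already recorded for each artifact. The mapping shape, the function name, and the file names are hypothetical.

```python
from collections import defaultdict, deque


def pages_to_rebuild(derived_from, removed_source):
    """Return every artifact that depends on removed_source, directly or not.

    derived_from maps an artifact to the list of inputs it was built from.
    """
    # Invert the mapping: input -> artifacts built from it.
    dependents = defaultdict(set)
    for artifact, inputs in derived_from.items():
        for item in inputs:
            dependents[item].add(artifact)

    # Breadth-first walk over the inverted graph.
    affected = set()
    queue = deque([removed_source])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return sorted(affected)


derived_from = {
    "wiki/rag.md": ["raw/paper-a.pdf", "raw/blog-b.md"],
    "wiki/retrieval.md": ["raw/blog-b.md"],
    "wiki/overview.md": ["wiki/rag.md"],
    "wiki/ontology.md": ["raw/paper-c.pdf"],
}

stale = pages_to_rebuild(derived_from, "raw/paper-a.pdf")
print(stale)
```

Only the affected pages are returned, so a rebuild can stay selective instead of recompiling the whole wiki.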

What this means for Trip2G

Trip2G already maps well onto several of the strong pains from the comments:

LLM Wiki must be accessible to agents → Trip2G can provide MCP over a Markdown vault.
The wiki must be an artifact, not a chat → Trip2G publishes/syncs the markdown base as a durable artifact.
Team/multi-agent memory is needed → Trip2G can be a shared hub rather than one agent's local folder.
Navigation beyond index.md is needed → Trip2G can develop MCP search/focused reads/routing.
Provenance/audit is needed → Trip2G can add claim/source/log patterns on top of markdown.
Federation/network is needed → Trip2G can link several bases into a knowledge mesh.

What to add to the roadmap / landing page

Now:

Docs/landing page: “Run Karpathy-style LLM Wiki as an MCP-accessible hosted Markdown vault”.
Example vault: raw sources + wiki pages + _index.md + AGENTS.md + 99 - Журнал действий.md.
MCP quickstart: search → note_html → update via sync/API.
Navigation docs: _header.md, _footer.md, wikilinks, index-first traversal.

SOURCES.md / sources/ convention.
claims.jsonl or frontmatter schema for claim provenance.
trip2g doctor: broken links, stale pages, missing citations, secrets scanner.
Focused reads and match IDs in MCP docs.
Review queue pattern: draft note → human-approved canonical note.

Later:

typed entity graph over markdown;
reverse ingestion;
source-hash stale rebuild;
role-aware compiled views;
multi-agent locks/permissions;
federation across Trip2G bases.

Comments that could be posted about Trip2G

Short comment

This pattern strongly matches what we are building with Trip2G: a hosted, MCP-accessible layer for Karpathy-style LLM Wikis over plain Markdown/Obsidian vaults.

The wiki stays the durable artifact, but agents can search it, read focused pages, update it through sync/API, keep logs/provenance beside the content, and later federate multiple wikis into a knowledge mesh.

I think the next hard parts after “LLM writes markdown” are exactly what many comments here mention: provenance for claims, stale-source detection, reviewable updates, multi-agent coordination, and navigation beyond one index.md. We are exploring those as conventions like AGENTS.md, SOURCES.md, logs, wikilinks, MCP search/read methods, and sync workflows.

If anyone is building LLM Wiki tooling and wants an MCP/hosted Markdown layer to interoperate with, I’d love to compare notes: https://trip2g.com/

A more direct comment with positioning

Karpathy’s framing “the wiki is the artifact, not the chat” is exactly the direction I think agent memory is moving.

One thing we are experimenting with in Trip2G is making that artifact networked and agent-readable: plain Markdown/Obsidian vaults published as a site + exposed through MCP, so Claude/Codex/Cursor-style agents can search, read, cite, and update the wiki instead of keeping knowledge trapped inside a chat/session.

A lot of comments here point to the same next layer: provenance, stale facts, review queues, multi-agent writes, and better navigation than a single index.md. My current hypothesis is that LLM Wiki needs an operational layer: sync, logs, source conventions, MCP tools, and eventually federation between knowledge bases.

Would be happy to connect with people building in this direction: https://trip2g.com/

The shortest version, so it does not read as an ad

One extension I’d love to see standardized: an MCP contract for LLM Wikis.

If the wiki is the durable artifact, agents need a common way to search it, read focused pages, inspect sources/provenance, and propose updates safely. We’re exploring this in Trip2G with plain Markdown/Obsidian vaults exposed as hosted MCP-readable knowledge bases: https://trip2g.com/

Recommendation: use the shortest version. The thread already has a lot of self-promo, so it is better to lead with the idea of a standard MCP contract and offer Trip2G as one example.

Acceptance criteria for further Trip2G follow-up

Create a separate landing/article page: “Karpathy LLM Wiki → MCP knowledge base”.
Provide a minimal example vault with AGENTS.md, _index.md, _header.md, _footer.md, SOURCES.md, 99 - Журнал действий.md.
Document the agent workflow: search, read focused page, write note, update index/log, sync.
Add a public example of MCP/tools output without secrets.
Add a roadmap section: provenance, stale detection, review queue, federation.

Sources / raw evidence

Gist: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Raw gist markdown: llm-wiki.md
Comments API: https://api.github.com/gists/442a6bf555914893e9891c11519de94f/comments?per_page=100&page=N
Total comments: 727.
