Chunking#
Prepares text for semantic search & RAG.
Inputs#
name,summary,description- README/excerpts referenced by manifest or repository
- Examples / usage snippets (if provided)
Strategy#
- Hierarchical splitting:
- Headings (
#,##) → paragraphs → sentences - Target chunk size ~ 300–600 tokens (configurable)
- Metadata per chunk:
entity_uid,section,position,weight,source_uri,checksum
Weights#
- Title/name: higher prior (e.g., 1.3×)
- Summary: 1.2×
- README/body: 1.0×
- Examples/code: 1.1×
Output#
- A list of
(chunk_id, entity_uid, text, weight, meta)ready for embedding.