tenets.core.distiller
Package
Distiller module - Extract and aggregate relevant context from codebases.
The distiller is responsible for the main 'distill' command functionality:
1. Understanding what the user wants (prompt parsing)
2. Finding relevant files (discovery)
3. Ranking by importance (intelligence)
4. Packing within token limits (optimization)
5. Formatting for output (presentation)
Classes
ContextAggregator
Aggregates files intelligently within token constraints.
Initialize the aggregator.
PARAMETER | DESCRIPTION |
---|---|
config | Tenets configuration |
Attributes
config (instance attribute)
logger (instance attribute)
strategies (instance attribute)
strategies = {'greedy': AggregationStrategy(name='greedy', max_full_files=20, summarize_threshold=0.6, min_relevance=0.05), 'balanced': AggregationStrategy(name='balanced', max_full_files=10, summarize_threshold=0.7, min_relevance=0.08), 'conservative': AggregationStrategy(name='conservative', max_full_files=5, summarize_threshold=0.8, min_relevance=0.15)}
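The three presets differ only in their numeric limits. The sketch below shows one way such limits could drive a per-file decision. It is an illustration under stated assumptions, not the aggregator's actual logic: the `AggregationStrategy` stand-in copies only the four fields listed above, and `plan_file` with its skip/full/summary rule is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationStrategy:
    """Stand-in that mirrors the four fields shown in the listing above."""
    name: str
    max_full_files: int
    summarize_threshold: float
    min_relevance: float


STRATEGIES = {
    "greedy": AggregationStrategy("greedy", 20, 0.6, 0.05),
    "balanced": AggregationStrategy("balanced", 10, 0.7, 0.08),
    "conservative": AggregationStrategy("conservative", 5, 0.8, 0.15),
}


def plan_file(relevance: float, full_so_far: int, strategy: AggregationStrategy) -> str:
    """Hypothetical rule: how one file might be treated under a strategy.

    Assumed reading of the fields (not confirmed by the source):
    - below min_relevance the file is dropped,
    - at or above summarize_threshold it is included in full while
      fewer than max_full_files files have been included in full,
    - otherwise it is summarized.
    """
    if relevance < strategy.min_relevance:
        return "skip"
    if relevance >= strategy.summarize_threshold and full_so_far < strategy.max_full_files:
        return "full"
    return "summary"
```

Under this reading, a file with relevance 0.75 would be included in full by 'balanced' but only summarized by 'conservative'.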
Functions
aggregate
aggregate(files: List[FileAnalysis], prompt_context: PromptContext, max_tokens: int, model: Optional[str] = None, git_context: Optional[Dict[str, Any]] = None, strategy: str = 'balanced', full: bool = False, condense: bool = False, remove_comments: bool = False, docstring_weight: Optional[float] = None, summarize_imports: bool = True) -> Dict[str, Any]
Aggregate files within token budget.
PARAMETER | DESCRIPTION |
---|---|
files | Ranked files to aggregate TYPE: List[FileAnalysis] |
prompt_context | Context about the prompt TYPE: PromptContext |
max_tokens | Maximum token budget TYPE: int |
model | Target model for token counting |
git_context | Optional git context to include |
strategy | Aggregation strategy to use TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | Dictionary with aggregated content and metadata |
optimize_packing
optimize_packing(files: List[FileAnalysis], max_tokens: int, model: Optional[str] = None) -> List[Tuple[FileAnalysis, bool]]
Optimize file packing using dynamic programming.
This is a more sophisticated packing algorithm that tries to maximize total relevance score within token constraints.
PARAMETER | DESCRIPTION |
---|---|
files | Files to pack TYPE: List[FileAnalysis] |
max_tokens | Token budget TYPE: int |
model | Model for token counting |
RETURNS | DESCRIPTION |
---|---|
List[Tuple[FileAnalysis, bool]] | List of (file, should_summarize) tuples |
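The description names dynamic programming but not the recurrence. One plausible formulation is a multiple-choice knapsack in which each file is skipped, summarized, or included in full. The sketch below assumes that formulation. `FileStub`, its token fields, and the `summary_value` discount are hypothetical and are not part of the documented API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileStub:
    """Hypothetical stand-in for FileAnalysis with only the fields the sketch needs."""
    path: str
    relevance: float
    full_tokens: int
    summary_tokens: int


def pack(files: List[FileStub], max_tokens: int,
         summary_value: float = 0.5) -> List[Tuple[FileStub, bool]]:
    """Return (file, should_summarize) pairs that maximize total relevance.

    A summary is assumed to be worth summary_value * relevance.
    """
    n = len(files)
    # best[i][t]: highest total value using the first i files within t tokens.
    best = [[0.0] * (max_tokens + 1) for _ in range(n + 1)]
    # choice[i][t]: 0 = skip file i, 1 = summarize it, 2 = include it in full.
    choice = [[0] * (max_tokens + 1) for _ in range(n + 1)]

    for i, f in enumerate(files, start=1):
        for t in range(max_tokens + 1):
            best[i][t] = best[i - 1][t]
            if f.summary_tokens <= t:
                value = best[i - 1][t - f.summary_tokens] + f.relevance * summary_value
                if value > best[i][t]:
                    best[i][t], choice[i][t] = value, 1
            if f.full_tokens <= t:
                value = best[i - 1][t - f.full_tokens] + f.relevance
                if value > best[i][t]:
                    best[i][t], choice[i][t] = value, 2

    # Walk back through the table to recover the option chosen for each file.
    result: List[Tuple[FileStub, bool]] = []
    t = max_tokens
    for i in range(n, 0, -1):
        f = files[i - 1]
        if choice[i][t] == 1:
            result.append((f, True))
            t -= f.summary_tokens
        elif choice[i][t] == 2:
            result.append((f, False))
            t -= f.full_tokens
    result.reverse()
    return result
```

The table has `len(files) * (max_tokens + 1)` cells, so a practical version would likely bucket token counts (for example into units of 100 tokens) before running the recurrence.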
Distiller
Orchestrates context extraction from codebases.
The Distiller is the main engine that powers the 'distill' command. It coordinates all the components to extract the most relevant context based on a user's prompt.
Initialize the distiller with configuration.
PARAMETER | DESCRIPTION |
---|---|
config | Tenets configuration |
Attributes
config (instance attribute)
logger (instance attribute)
scanner (instance attribute)
analyzer (instance attribute)
ranker (instance attribute)
parser (instance attribute)
git (instance attribute)
aggregator (instance attribute)
optimizer (instance attribute)
formatter (instance attribute)
Functions
distill
distill(prompt: str, paths: Optional[Union[str, Path, List[Path]]] = None, *, format: str = 'markdown', model: Optional[str] = None, max_tokens: Optional[int] = None, mode: str = 'balanced', include_git: bool = True, session_name: Optional[str] = None, include_patterns: Optional[List[str]] = None, exclude_patterns: Optional[List[str]] = None, full: bool = False, condense: bool = False, remove_comments: bool = False, pinned_files: Optional[List[Path]] = None, include_tests: Optional[bool] = None, docstring_weight: Optional[float] = None, summarize_imports: bool = True) -> ContextResult
Distill relevant context from codebase based on prompt.
This is the main method that extracts, ranks, and aggregates the most relevant files and information for a given prompt.
PARAMETER | DESCRIPTION |
---|---|
prompt | The user's query or task description TYPE: str |
paths | Paths to analyze (default: current directory) |
format | Output format (markdown, xml, json) TYPE: str |
model | Target LLM model for token counting |
max_tokens | Maximum tokens for context |
mode | Analysis mode (fast, balanced, thorough) TYPE: str |
include_git | Whether to include git context TYPE: bool |
session_name | Session name for stateful context |
include_patterns | File patterns to include |
exclude_patterns | File patterns to exclude |
RETURNS | DESCRIPTION |
---|---|
ContextResult | ContextResult with the distilled context |
Example
>>> distiller = Distiller(config)
>>> result = distiller.distill(
...     "implement OAuth2 authentication",
...     paths="./src",
...     mode="thorough",
...     max_tokens=50000
... )
>>> print(result.context)
ContextFormatter
Formats aggregated context for output.
Initialize the formatter.
PARAMETER | DESCRIPTION |
---|---|
config | Tenets configuration |
Attributes
config (instance attribute)
logger (instance attribute)
Functions
format
format(aggregated: Dict[str, Any], format: str, prompt_context: PromptContext, session_name: Optional[str] = None) -> str
Format aggregated context for output.
PARAMETER | DESCRIPTION |
---|---|
aggregated | Aggregated context data containing files and statistics. |
format | Output format (markdown, xml, json, html). TYPE: str |
prompt_context | Original prompt context with task analysis. TYPE: PromptContext |
session_name | Optional session name for context tracking. |
RETURNS | DESCRIPTION |
---|---|
str | Formatted context string in the requested format. |
RAISES | DESCRIPTION |
---|---|
ValueError | If format is not supported. |
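The contract above (a format name goes in, a string comes out, and unknown names raise ValueError) can be illustrated with a small dispatch table. This is a sketch only: the shape of the `aggregated` dictionary (a "files" list of path/content entries), the helper names, and the two renderers are assumptions, and the parameter is called `fmt` here to avoid shadowing the built-in `format`.

```python
import json
from typing import Any, Callable, Dict


def _to_markdown(aggregated: Dict[str, Any]) -> str:
    """Render each file as a second-level heading followed by its content."""
    lines = ["# Context"]
    for entry in aggregated.get("files", []):
        lines.append(f"## {entry['path']}")
        lines.append(entry["content"])
    return "\n".join(lines)


def _to_json(aggregated: Dict[str, Any]) -> str:
    """Serialize the aggregated data unchanged."""
    return json.dumps(aggregated, indent=2)


# Hypothetical registry; only two of the documented formats are sketched.
FORMATTERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "markdown": _to_markdown,
    "json": _to_json,
}


def format_context(aggregated: Dict[str, Any], fmt: str) -> str:
    """Look up the renderer for fmt and raise ValueError for unknown names."""
    try:
        renderer = FORMATTERS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt!r}") from None
    return renderer(aggregated)
```

Keeping the renderers in a dictionary means a new format can be supported by adding one entry, and the error path stays in one place.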
TokenOptimizer
Optimizes token usage for maximum context value.
Initialize the optimizer.
PARAMETER | DESCRIPTION |
---|---|
config | Tenets configuration |
Attributes
config (instance attribute)
logger (instance attribute)
Functions
create_budget
create_budget(model: Optional[str], max_tokens: Optional[int], prompt_tokens: int, has_git_context: bool = False, has_tenets: bool = False) -> TokenBudget
Create a token budget for context generation.
PARAMETER | DESCRIPTION |
---|---|
model | Target model name. |
max_tokens | Optional hard cap on total tokens; overrides model default. |
prompt_tokens | Tokens used by the prompt/instructions. TYPE: int |
has_git_context | Whether git context will be included. TYPE: bool |
has_tenets | Whether tenets will be injected. TYPE: bool |
RETURNS | DESCRIPTION |
---|---|
TokenBudget | Configured budget with reserves. |
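A budget "with reserves" can be pictured as a total limit from which the prompt and any optional sections are subtracted before files are packed. The sketch below assumes that model. `BudgetSketch`, the reserve sizes, the model table, and the fallback limit are all placeholder values chosen for illustration, not the library's real numbers or its TokenBudget class.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder values, not taken from the library.
MODEL_DEFAULTS = {"example-model": 8_000}
FALLBACK_LIMIT = 4_000
GIT_RESERVE = 500
TENET_RESERVE = 300


@dataclass
class BudgetSketch:
    """Hypothetical budget: a total limit plus the amounts set aside."""
    total: int
    prompt_tokens: int
    git_reserve: int = 0
    tenet_reserve: int = 0

    @property
    def available_for_files(self) -> int:
        """Tokens left for file content after prompt and reserves."""
        used = self.prompt_tokens + self.git_reserve + self.tenet_reserve
        return max(0, self.total - used)


def create_budget_sketch(model: Optional[str], max_tokens: Optional[int],
                         prompt_tokens: int, has_git_context: bool = False,
                         has_tenets: bool = False) -> BudgetSketch:
    """An explicit max_tokens overrides the (assumed) per-model default."""
    if max_tokens is not None:
        total = max_tokens
    else:
        total = MODEL_DEFAULTS.get(model, FALLBACK_LIMIT)
    return BudgetSketch(
        total=total,
        prompt_tokens=prompt_tokens,
        git_reserve=GIT_RESERVE if has_git_context else 0,
        tenet_reserve=TENET_RESERVE if has_tenets else 0,
    )
```

With these placeholder numbers, a 10,000-token cap, a 200-token prompt, and git context enabled would leave 9,300 tokens for files.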
optimize_file_selection
optimize_file_selection(files: List[FileAnalysis], budget: TokenBudget, strategy: str = 'balanced') -> List[Tuple[FileAnalysis, str]]
Optimize file selection within budget.
Uses different strategies to select which files to include and whether to summarize them.
PARAMETER | DESCRIPTION |
---|---|
files | Ranked files to consider TYPE: List[FileAnalysis] |
budget | Token budget to work within TYPE: TokenBudget |
strategy | Selection strategy (greedy, balanced, diverse) TYPE: str |
RETURNS | DESCRIPTION |
---|---|
List[Tuple[FileAnalysis, str]] | List of (file, action) tuples where action is 'full' or 'summary' |
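The return shape, a list of (file, action) pairs, is easiest to see with the simplest of the named strategies. The sketch below is one possible greedy pass over files that are already ranked: take each file in full if it fits, fall back to a summary if only that fits, and otherwise skip it. `RankedFile` and its token fields are hypothetical, and the real 'greedy' strategy may behave differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RankedFile:
    """Hypothetical stand-in for FileAnalysis with precomputed token counts."""
    path: str
    full_tokens: int
    summary_tokens: int


def select_greedy(files: List[RankedFile],
                  available_tokens: int) -> List[Tuple[RankedFile, str]]:
    """Walk the ranked list once, spending tokens on the best files first."""
    selected: List[Tuple[RankedFile, str]] = []
    remaining = available_tokens
    for f in files:  # assumed to be sorted, most relevant first
        if f.full_tokens <= remaining:
            selected.append((f, "full"))
            remaining -= f.full_tokens
        elif f.summary_tokens <= remaining:
            selected.append((f, "summary"))
            remaining -= f.summary_tokens
        # Files that fit in neither form are left out.
    return selected
```

A greedy pass is fast but can spend most of the budget on the first few files, which is presumably why the method also offers 'balanced' and 'diverse' alternatives.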
estimate_tokens_for_git
Estimate tokens needed for git context.
estimate_tokens_for_tenets
Estimate tokens needed for tenet injection.
Modules
- aggregator: Aggregator module
- distiller: Distiller module
- formatter: Formatter module
- optimizer: Optimizer module
- transform: Transform module