Configuration Guide¶
Comprehensive guide to configuring Tenets for optimal code context building.
Overview¶
Tenets uses a hierarchical configuration system with multiple override levels:
Precedence (lowest → highest):

1. Default configuration (built-in)
2. Project file (.tenets.yml at the repo root)
3. User file (~/.config/tenets/config.yml or ~/.tenets.yml)
4. Environment variables (TENETS_*)
5. CLI flags (--mode, --max-tokens, etc.)
6. Programmatic overrides (Tenets(config=...))
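Conceptually, the precedence chain behaves like a layered deep-merge: each higher layer overrides only the keys it sets, recursing into nested sections. The sketch below illustrates the idea only; the helper name and layer dicts are hypothetical, not Tenets' actual implementation.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override's keys win over base's,
    recursing into nested dicts instead of replacing them wholesale."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Layers listed lowest-precedence first, mirroring the chain above.
defaults = {"max_tokens": 100000, "ranking": {"algorithm": "balanced", "threshold": 0.10}}
project_file = {"ranking": {"threshold": 0.05}}    # .tenets.yml
env_vars = {"ranking": {"algorithm": "thorough"}}  # TENETS_RANKING_ALGORITHM
cli_flags = {"max_tokens": 50000}                  # --max-tokens 50000

effective = defaults
for layer in (project_file, env_vars, cli_flags):
    effective = deep_merge(effective, layer)

print(effective)
# {'max_tokens': 50000, 'ranking': {'algorithm': 'thorough', 'threshold': 0.05}}
```

Note that a later layer only touches the keys it actually defines: the CLI flag changes max_tokens without disturbing the ranking overrides from lower layers.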
Inspect configuration:
tenets config show # Full config
tenets config show --key ranking # Specific section
tenets config show --format json # JSON output
Files and locations¶
Tenets searches these locations in order and uses the first it finds:

- ./.tenets.yml
- ./.tenets.yaml
- ./tenets.yml
- ./.config/tenets.yml
- ~/.config/tenets/config.yml
- ~/.tenets.yml
Create a starter file:
tenets config init # writes .tenets.yml in the current directory
Complete Configuration Schema¶
All available configuration sections and their options:
# ============= Core Settings =============
max_tokens: 100000 # Maximum tokens for context (default: 100000)
debug: false # Enable debug logging
quiet: false # Suppress non-essential output
# ============= File Scanning =============
scanner:
respect_gitignore: true # Honor .gitignore patterns
follow_symlinks: false # Follow symbolic links
max_file_size: 5000000 # Max file size in bytes (5MB)
max_files: 10000 # Maximum files to scan
binary_check: true # Skip binary files
encoding: utf-8 # File encoding
workers: 4 # Parallel scanning workers
parallel_mode: auto # auto | thread | process
timeout: 5.0 # Timeout per file (seconds)
exclude_minified: true # Skip minified files
exclude_tests_by_default: true # Skip test files unless explicit
# Ignore patterns (in addition to .gitignore)
additional_ignore_patterns:
- '*.generated.*'
- vendor/
- node_modules/
- '*.egg-info/'
- __pycache__/
- .pytest_cache/
# Test file patterns
test_patterns:
- test_*.py
- '*_test.py'
- '*.test.js'
- '*.spec.ts'
# Test directories
test_directories:
- test
- tests
- __tests__
- spec
# ============= Ranking System =============
ranking:
algorithm: balanced # fast | balanced | thorough | ml | custom
threshold: 0.10 # 0.0-1.0 (lower includes more files)
text_similarity_algorithm: bm25 # bm25 (default) | tfidf
use_stopwords: false # Filter common tokens
use_embeddings: false # Semantic similarity (requires ML)
use_git: true # Include git signals
use_ml: false # Machine learning features
embedding_model: all-MiniLM-L6-v2 # Embedding model name
workers: 2 # Parallel ranking workers
parallel_mode: auto # thread | process | auto
batch_size: 100 # Files per batch
# Custom factor weights (0.0-1.0)
custom_weights:
keyword_match: 0.25
path_relevance: 0.20
import_graph: 0.20
git_activity: 0.15
file_type: 0.10
complexity: 0.10
# ============= Summarization =============
summarizer:
default_mode: auto # auto | extractive | abstractive
target_ratio: 0.3 # Target compression ratio
enable_cache: true # Cache summaries
preserve_code_structure: true # Keep code structure intact
summarize_imports: true # Condense import statements
import_summary_threshold: 5 # Min imports to trigger summary
max_cache_size: 100 # Max cached summaries
quality_threshold: medium # low | medium | high
batch_size: 10 # Files per batch
docstring_weight: 0.5 # Weight for docstrings
include_all_signatures: true # Include all function signatures
# LLM settings (optional)
llm_provider: null # openai | anthropic | null
llm_model: null # Model name
llm_temperature: 0.3 # Creativity (0.0-1.0)
llm_max_tokens: 500 # Max tokens per summary
enable_ml_strategies: false # Use ML summarization
# ============= Tenet System =============
tenet:
auto_instill: true # Auto-apply tenets
max_per_context: 5 # Max tenets per context
reinforcement: true # Reinforce important tenets
injection_strategy: strategic # strategic | sequential | random
min_distance_between: 1000 # Min chars between injections
prefer_natural_breaks: true # Insert at natural boundaries
storage_path: ~/.tenets/tenets # Tenet storage location
collections_enabled: true # Enable tenet collections
# Injection frequency
injection_frequency: adaptive # always | periodic | adaptive | manual
injection_interval: 3 # For periodic mode
session_complexity_threshold: 0.7 # Triggers adaptive injection
min_session_length: 5 # Min prompts before injection
# Advanced settings
adaptive_injection: true # Smart injection timing
track_injection_history: true # Track what was injected
decay_rate: 0.1 # How fast tenets decay
reinforcement_interval: 10 # Reinforce every N prompts
session_aware: true # Use session context
session_memory_limit: 100 # Max session history
persist_session_history: true # Save session data
# Priority settings
priority_boost_critical: 2.0 # Boost for critical tenets
priority_boost_high: 1.5 # Boost for high priority
skip_low_priority_on_complex: true # Skip low priority when complex
# System instruction
system_instruction: null # Global system instruction
system_instruction_enabled: false # Enable system instruction
system_instruction_position: top # top | bottom
system_instruction_format: markdown # markdown | plain
system_instruction_once_per_session: true # Inject once per session
# ============= Caching =============
cache:
enabled: true # Enable caching
directory: ~/.tenets/cache # Cache directory
ttl_days: 7 # Time to live (days)
max_size_mb: 500 # Max cache size (MB)
compression: false # Compress cache data
memory_cache_size: 1000 # In-memory cache entries
max_age_hours: 24 # Max cache age (hours)
# SQLite settings
sqlite_pragmas:
journal_mode: WAL
synchronous: NORMAL
cache_size: '-64000'
temp_store: MEMORY
# LLM cache
llm_cache_enabled: true # Cache LLM responses
llm_cache_ttl_hours: 24 # LLM cache TTL
# ============= Output Formatting =============
output:
default_format: markdown # markdown | xml | json | html
syntax_highlighting: true # Enable syntax highlighting
line_numbers: false # Show line numbers
max_line_length: 120 # Max line length
include_metadata: true # Include metadata
compression_threshold: 10000 # Compress if larger (chars)
summary_ratio: 0.25 # Summary compression ratio
copy_on_distill: false # Auto-copy to clipboard
show_token_usage: true # Show token counts
show_cost_estimate: true # Show LLM cost estimates
# ============= Git Integration =============
git:
enabled: true # Use git information
include_history: true # Include commit history
history_limit: 100 # Max commits to analyze
include_blame: false # Include git blame
include_stats: true # Include statistics
# Ignore these authors
ignore_authors:
- dependabot[bot]
- github-actions[bot]
- renovate[bot]
# Main branch names
main_branches:
- main
- master
- develop
- trunk
# ============= NLP Settings =============
nlp:
enabled: true # Enable NLP features
stopwords_enabled: true # Use stopwords
code_stopword_set: minimal # minimal | standard | aggressive
prompt_stopword_set: aggressive # minimal | standard | aggressive
custom_stopword_files: [] # Custom stopword files
# Tokenization
tokenization_mode: auto # auto | simple | advanced
preserve_original_tokens: true # Keep original tokens
split_camelcase: true # Split CamelCase
split_snakecase: true # Split snake_case
min_token_length: 2 # Min token length
# Keyword extraction
keyword_extraction_method: auto # auto | rake | yake | bm25 | tfidf
max_keywords: 30 # Max keywords to extract
ngram_size: 3 # N-gram size
yake_dedup_threshold: 0.7 # YAKE deduplication
# BM25 settings
bm25_k1: 1.2 # Term frequency saturation parameter
bm25_b: 0.75 # Length normalization parameter
# TF-IDF settings (when explicitly configured as alternative to BM25)
tfidf_use_sublinear: true # Sublinear TF scaling (only when TF-IDF is used)
tfidf_use_idf: true # Use IDF
tfidf_norm: l2 # Normalization
# Embeddings
embeddings_enabled: false # Enable embeddings
embeddings_model: all-MiniLM-L6-v2 # Model name
embeddings_device: auto # cpu | cuda | auto
embeddings_cache: true # Cache embeddings
embeddings_batch_size: 32 # Batch size
similarity_metric: cosine # cosine | euclidean | manhattan
similarity_threshold: 0.7 # Similarity threshold
# Cache settings
cache_embeddings_ttl_days: 30 # Embeddings cache TTL
cache_tfidf_ttl_days: 7 # BM25/TF-IDF cache TTL
cache_keywords_ttl_days: 7 # Keywords cache TTL
# Performance
multiprocessing_enabled: true # Use multiprocessing
multiprocessing_workers: null # null = auto-detect
multiprocessing_chunk_size: 100 # Chunk size
# ============= LLM Settings (Optional) =============
llm:
enabled: false # Enable LLM features
provider: openai # openai | anthropic | ollama
fallback_providers: # Fallback providers
- anthropic
- openrouter
# API keys (use environment variables)
api_keys:
openai: ${OPENAI_API_KEY}
anthropic: ${ANTHROPIC_API_KEY}
openrouter: ${OPENROUTER_API_KEY}
# API endpoints
api_base_urls:
openai: https://api.openai.com/v1
anthropic: https://api.anthropic.com/v1
openrouter: https://openrouter.ai/api/v1
ollama: http://localhost:11434
# Model selection
models:
default: gpt-4o-mini
summarization: gpt-3.5-turbo
analysis: gpt-4o
embeddings: text-embedding-3-small
code_generation: gpt-4o
# Rate limits and costs
max_cost_per_run: 0.1 # Max $ per run
max_cost_per_day: 10.0 # Max $ per day
max_tokens_per_request: 4000 # Max tokens per request
max_context_length: 100000 # Max context length
# Generation settings
temperature: 0.3 # Creativity (0.0-1.0)
top_p: 0.95 # Nucleus sampling
frequency_penalty: 0.0 # Frequency penalty
presence_penalty: 0.0 # Presence penalty
# Network settings
requests_per_minute: 60 # Rate limit
retry_on_error: true # Retry failed requests
max_retries: 3 # Max retry attempts
retry_delay: 1.0 # Initial retry delay
retry_backoff: 2.0 # Backoff multiplier
timeout: 30 # Request timeout (seconds)
stream: false # Stream responses
# Logging and caching
cache_responses: true # Cache LLM responses
cache_ttl_hours: 24 # Cache TTL (hours)
log_requests: false # Log requests
log_responses: false # Log responses
# ============= Custom Settings =============
custom: {} # User-defined custom settings
Key Configuration Notes¶
Ranking:

- threshold: Lower values (0.05-0.10) include more files; higher values (0.20-0.30) give stricter matching
- algorithm:
  - fast: Quick keyword matching (~10ms/file)
  - balanced: Structural analysis + BM25 (default)
  - thorough: Full analysis with relationships
  - ml: Machine learning with embeddings (requires extras)
- custom_weights: Fine-tune ranking factors (values 0.0-1.0)
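The custom_weights factors are presumably combined into a single relevance score per file, e.g. as a weighted sum checked against threshold. The sketch below is illustrative only: the factor names come from the schema above, but the scoring function and the sample scores are invented for this example.

```python
def combined_score(factor_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-factor relevance scores (each 0.0-1.0),
    normalized by the total weight so the result stays in 0.0-1.0."""
    total_weight = sum(weights.values())
    raw = sum(weights[f] * factor_scores.get(f, 0.0) for f in weights)
    return raw / total_weight if total_weight else 0.0

# Default weights from the schema above.
weights = {
    "keyword_match": 0.25, "path_relevance": 0.20, "import_graph": 0.20,
    "git_activity": 0.15, "file_type": 0.10, "complexity": 0.10,
}
# Hypothetical per-factor scores for one file.
scores = {"keyword_match": 0.9, "path_relevance": 0.5, "import_graph": 0.2,
          "git_activity": 0.8, "file_type": 1.0, "complexity": 0.3}

score = combined_score(scores, weights)
# A file would be included when its score clears ranking.threshold (0.10 by default).
print(round(score, 3), score >= 0.10)
```

Raising a factor's weight pulls files that score well on that factor above the threshold sooner, which is why the recipes below shift weight toward git_activity for bug hunts and import_graph for feature work.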
Scanner:

- respect_gitignore: Always honors .gitignore patterns
- exclude_tests_by_default: Tests are excluded unless --include-tests is used
- additional_ignore_patterns: Added to the built-in patterns
Tenet System:

- auto_instill: Automatically applies relevant tenets to context
- injection_frequency:
  - always: Every distill
  - periodic: Every N distills
  - adaptive: Based on complexity
  - manual: Only when explicitly called
- system_instruction: Global instruction added to all contexts
Output:

- copy_on_distill: Auto-copy the result to the clipboard
- default_format: Default output format (markdown recommended for LLMs)
Performance:

- workers: More workers = faster, but more CPU/memory
- cache.enabled: Significantly speeds up repeated operations
- ranking.batch_size: Larger batches = more memory but faster
Environment Variable Overrides¶
Any configuration option can be overridden via environment variables.
Format:

- Nested keys: TENETS_<SECTION>_<KEY>=value
- Top-level keys: TENETS_<KEY>=value
- Lists: Comma-separated values
- Booleans: true or false (case-insensitive)
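The TENETS_<SECTION>_<KEY> naming maps mechanically onto the nested YAML keys. The sketch below shows one plausible mapping; the real parser may differ (for instance in how it disambiguates multi-word key names), so treat the function and section list as assumptions.

```python
# Section names from the schema above; used to split TENETS_RANKING_THRESHOLD
# into section "ranking" and key "threshold".
SECTIONS = {"scanner", "ranking", "summarizer", "tenet", "cache",
            "output", "git", "nlp", "llm"}

def parse_env_override(name: str, value: str):
    """Turn TENETS_RANKING_THRESHOLD=0.05 into ("ranking.threshold", 0.05).
    Booleans and numbers are coerced; comma-separated values become lists."""
    parts = name.removeprefix("TENETS_").lower().split("_")
    if parts[0] in SECTIONS:
        key = parts[0] + "." + "_".join(parts[1:])
    else:
        key = "_".join(parts)  # top-level key like max_tokens
    low = value.lower()
    if low in ("true", "false"):
        return key, low == "true"
    try:
        return key, int(value)
    except ValueError:
        pass
    try:
        return key, float(value)
    except ValueError:
        pass
    if "," in value:
        return key, [v.strip() for v in value.split(",")]
    return key, value

print(parse_env_override("TENETS_RANKING_THRESHOLD", "0.05"))
# ('ranking.threshold', 0.05)
print(parse_env_override("TENETS_SCANNER_EXCLUDE_TESTS_BY_DEFAULT", "False"))
# ('scanner.exclude_tests_by_default', False)
```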
Common Examples:
# Core settings
export TENETS_MAX_TOKENS=150000
export TENETS_DEBUG=true
export TENETS_QUIET=false
# Ranking configuration
export TENETS_RANKING_ALGORITHM=thorough
export TENETS_RANKING_THRESHOLD=0.05
export TENETS_RANKING_TEXT_SIMILARITY_ALGORITHM=tfidf # Use TF-IDF instead of BM25
export TENETS_RANKING_USE_EMBEDDINGS=true
export TENETS_RANKING_WORKERS=4
# Scanner settings
export TENETS_SCANNER_MAX_FILE_SIZE=10000000
export TENETS_SCANNER_RESPECT_GITIGNORE=true
export TENETS_SCANNER_EXCLUDE_TESTS_BY_DEFAULT=false
# Output settings
export TENETS_OUTPUT_DEFAULT_FORMAT=xml
export TENETS_OUTPUT_COPY_ON_DISTILL=true
export TENETS_OUTPUT_SHOW_TOKEN_USAGE=false
# Cache settings
export TENETS_CACHE_ENABLED=false
export TENETS_CACHE_DIRECTORY=/tmp/tenets-cache
export TENETS_CACHE_TTL_DAYS=14
# Git settings
export TENETS_GIT_ENABLED=false
export TENETS_GIT_HISTORY_LIMIT=50
# Tenet system
export TENETS_TENET_AUTO_INSTILL=false
export TENETS_TENET_MAX_PER_CONTEXT=10
export TENETS_TENET_INJECTION_FREQUENCY=periodic
export TENETS_TENET_INJECTION_INTERVAL=5
# System instruction
export TENETS_TENET_SYSTEM_INSTRUCTION="You are a senior engineer. Focus on security and performance."
export TENETS_TENET_SYSTEM_INSTRUCTION_ENABLED=true
Usage Patterns:
# One-time override
TENETS_RANKING_ALGORITHM=fast tenets distill "fix bug"
# Session-wide settings
export TENETS_RANKING_THRESHOLD=0.05
export TENETS_OUTPUT_COPY_ON_DISTILL=true
tenets distill "implement feature" # Uses exported settings
# Verify configuration
tenets config show --key ranking
tenets config show --format json | jq '.ranking'
CLI Flags and Programmatic Control¶
CLI Flags¶
Command-line flags override configuration for that specific run:
# Core overrides
tenets distill "query" --max-tokens 50000
tenets distill "query" --format xml
tenets distill "query" --copy
# Ranking mode
tenets distill "query" --mode fast # Quick analysis
tenets distill "query" --mode thorough # Deep analysis
tenets distill "query" --mode ml # With embeddings
# File filtering
tenets distill "query" --include "*.py" --exclude "test_*.py"
tenets distill "query" --include-tests # Include test files
# Git control
tenets distill "query" --no-git # Disable git signals
# Session management
tenets distill "query" --session feature-x
# Content optimization
tenets distill "query" --condense # Aggressive compression
tenets distill "query" --remove-comments # Strip comments
tenets distill "query" --full # No summarization
Programmatic Configuration¶
Basic usage with custom config:
from tenets import Tenets
from tenets.config import TenetsConfig
# Create custom configuration
config = TenetsConfig(
max_tokens=150000,
ranking={
"algorithm": "thorough",
"threshold": 0.05,
"text_similarity_algorithm": "bm25", # or "tfidf" for TF-IDF
"use_embeddings": True,
"workers": 4,
"custom_weights": {
"keyword_match": 0.30,
"path_relevance": 0.25,
"git_activity": 0.20,
}
},
scanner={
"respect_gitignore": True,
"max_file_size": 10_000_000,
"exclude_tests_by_default": False,
},
output={
"default_format": "xml",
"copy_on_distill": True,
},
tenet={
"auto_instill": True,
"max_per_context": 10,
"system_instruction": "Focus on security and performance",
"system_instruction_enabled": True,
}
)
# Initialize with custom config
tenets = Tenets(config=config)
# Use it
result = tenets.distill(
"implement caching layer",
max_tokens=80000, # Override config for this call
mode="balanced", # Override algorithm
)
Load and modify existing config:
from tenets import Tenets
from tenets.config import TenetsConfig
# Load from file
config = TenetsConfig.from_file(".tenets.yml")
# Modify specific settings
config.ranking.algorithm = "fast"
config.ranking.threshold = 0.08
config.output.copy_on_distill = True
# Use modified config
tenets = Tenets(config=config)
Runtime overrides:
# Config precedence: method args > instance config > file config
result = tenets.distill(
prompt="add authentication",
mode="thorough", # Overrides config.ranking.algorithm
max_tokens=100000, # Overrides config.max_tokens
format="json", # Overrides config.output.default_format
session_name="auth", # Session-specific
include_patterns=["*.py", "*.js"],
exclude_patterns=["*.test.js"],
)
Configuration Recipes¶
For Different Use Cases¶
Large Monorepo (millions of files):
max_tokens: 150000
scanner:
max_files: 50000
workers: 8
parallel_mode: process
exclude_tests_by_default: true
ranking:
algorithm: fast
threshold: 0.15
workers: 4
batch_size: 500
cache:
enabled: true
memory_cache_size: 5000
Small Project (high precision):
max_tokens: 80000
ranking:
algorithm: thorough
threshold: 0.08
text_similarity_algorithm: bm25 # Default algorithm
use_embeddings: true
custom_weights:
keyword_match: 0.35
import_graph: 0.25
Documentation-Heavy Project:
summarizer:
docstring_weight: 0.8
include_all_signatures: true
preserve_code_structure: false
ranking:
custom_weights:
keyword_match: 0.20
path_relevance: 0.30 # Prioritize doc paths
Security-Focused Analysis:
tenet:
system_instruction: |
Focus on security implications.
Flag any potential vulnerabilities.
Suggest secure alternatives.
system_instruction_enabled: true
auto_instill: true
scanner:
additional_ignore_patterns: [] # Don't skip anything
exclude_tests_by_default: false
Performance Tuning¶
Maximum Speed (sacrifices precision):
ranking:
algorithm: fast
threshold: 0.05
text_similarity_algorithm: bm25 # Using BM25 (default)
use_embeddings: false
workers: 8
scanner:
workers: 8
timeout: 2.0
cache:
enabled: true
compression: false
Maximum Precision (slower):
ranking:
algorithm: thorough
threshold: 0.20
text_similarity_algorithm: bm25 # Default algorithm
use_embeddings: true
use_git: true
workers: 2
summarizer:
quality_threshold: high
enable_ml_strategies: true
Memory-Constrained Environment:
scanner:
max_files: 1000
workers: 1
ranking:
workers: 1
batch_size: 50
cache:
memory_cache_size: 100
max_size_mb: 100
nlp:
embeddings_batch_size: 8
multiprocessing_enabled: false
Common Workflows¶
Bug Investigation:
ranking:
algorithm: balanced
threshold: 0.10
custom_weights:
git_activity: 0.30 # Recent changes matter
complexity: 0.20 # Complex code = more bugs
git:
include_history: true
history_limit: 200
include_blame: true
New Feature Development:
ranking:
algorithm: balanced
threshold: 0.08
custom_weights:
import_graph: 0.30 # Dependencies matter
path_relevance: 0.25 # Related modules
output:
copy_on_distill: true
show_token_usage: true
Code Review Preparation:
summarizer:
target_ratio: 0.5 # More detail
preserve_code_structure: true
include_all_signatures: true
output:
syntax_highlighting: true
line_numbers: true
include_metadata: true
Troubleshooting¶
Common Issues and Solutions¶
No files included in context:

- Lower ranking.threshold (try 0.05)
- Use --mode fast for broader inclusion
- Increase the max_tokens limit
- Check whether files match your --include patterns
- Verify files aren't excluded by .gitignore
- Use --include-tests if analyzing test files
Configuration not taking effect:
# Check which config file is loaded
tenets config show | head -20
# Verify specific setting
tenets config show --key ranking.threshold
# Check config file location
ls -la .tenets.yml
# Test with explicit config
tenets --config ./my-config.yml distill "query"
Environment variables not working:
# Verify export (not just set)
export TENETS_RANKING_THRESHOLD=0.05 # Correct
TENETS_RANKING_THRESHOLD=0.05 # Wrong (not exported)
# Check if variable is set
echo $TENETS_RANKING_THRESHOLD
# Debug with explicit env
TENETS_DEBUG=true tenets config show
Performance issues:

- Reduce scanner.max_files and scanner.max_file_size
- Enable caching: cache.enabled: true
- Use ranking.algorithm: fast
- Reduce ranking.workers if CPU-constrained
- Exclude unnecessary paths with additional_ignore_patterns
Token limit exceeded:

- Increase max_tokens or use --max-tokens
- Enable the --condense flag
- Use --remove-comments
- Increase ranking.threshold for stricter filtering
- Exclude test files: scanner.exclude_tests_by_default: true
Cache issues:
# Clear cache
rm -rf ~/.tenets/cache
# Disable cache temporarily
TENETS_CACHE_ENABLED=false tenets distill "query"
# Use custom cache location
export TENETS_CACHE_DIRECTORY=/tmp/tenets-cache
Validation Commands¶
# Validate configuration syntax
tenets config validate
# Show effective configuration
tenets config show --format json | jq
# Test configuration with dry run
tenets distill "test query" --dry-run
# Check what files would be scanned
tenets examine . --dry-run
# Debug ranking process
TENETS_DEBUG=true tenets distill "query" 2>debug.log
Advanced Topics¶
Custom Ranking Strategies¶
Create a custom ranking strategy by combining weights:
ranking:
algorithm: custom
custom_weights:
keyword_match: 0.40 # Emphasize keyword relevance
path_relevance: 0.15 # De-emphasize path matching
import_graph: 0.15 # Moderate dependency weight
git_activity: 0.10 # Low git signal weight
file_type: 0.10 # File type matching
complexity: 0.10 # Code complexity
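The default weights and the recipe above both sum to 1.0; keeping that convention is an assumption on my part, but it keeps combined scores on the same scale as ranking.threshold when you tweak individual factors. A small normalizer sketch (hypothetical helper, not part of Tenets):

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Rescale weights so they sum to 1.0 (raises if none are positive)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}

# The custom strategy above already sums to 1.0:
custom = {"keyword_match": 0.40, "path_relevance": 0.15, "import_graph": 0.15,
          "git_activity": 0.10, "file_type": 0.10, "complexity": 0.10}
assert abs(sum(custom.values()) - 1.0) < 1e-9

# If you bump one factor, renormalize the rest rather than letting the
# total drift, so scores remain comparable to ranking.threshold:
custom["keyword_match"] = 0.60
custom = normalize_weights(custom)
print({k: round(v, 3) for k, v in custom.items()})
```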
Multi-Environment Setup¶
Create environment-specific configs:
# Development
cp .tenets.yml .tenets.dev.yml
# Edit for dev settings
# Production analysis
cp .tenets.yml .tenets.prod.yml
# Edit for production settings
# Use specific config
tenets --config .tenets.dev.yml distill "query"
Integration with CI/CD¶
# .tenets.ci.yml - Optimized for CI
max_tokens: 50000
quiet: true
scanner:
max_files: 5000
workers: 2
ranking:
algorithm: fast
threshold: 0.10
cache:
enabled: false # Fresh analysis each run
output:
default_format: json # Machine-readable
See Also¶
- CLI Reference - Complete command documentation
- API Reference - Python API documentation
- Architecture - System design details