Configuration Guide¶

Comprehensive guide to configuring Tenets for optimal code context building.

Overview¶

Tenets uses a hierarchical configuration system with multiple override levels:

Precedence (lowest → highest): 1. Default configuration (built-in) 2. Project file (.tenets.yml at repo root) 3. User file (~/.config/tenets/config.yml or ~/.tenets.yml) 4. Environment variables (TENETS_*) 5. CLI flags (--mode, --max-tokens, etc.) 6. Programmatic overrides (Tenets(config=...))

Inspect configuration:

Bash

tenets config show                # Full config
tenets config show --key ranking  # Specific section
tenets config show --format json  # JSON output

Files and locations¶

Tenets searches these locations in order and uses the first it finds: - ./.tenets.yml - ./.tenets.yaml - ./tenets.yml - ./.config/tenets.yml - ~/.config/tenets/config.yml - ~/.tenets.yml

Create a starter file:

tenets config init # writes .tenets.yml in the current directory

Complete Configuration Schema¶

All available configuration sections and their options:

YAML

# ============= Core Settings =============
max_tokens: 100000          # Maximum tokens for context (default: 100000)
debug: false                # Enable debug logging
quiet: false                # Suppress non-essential output

# ============= File Scanning =============
scanner:
  respect_gitignore: true          # Honor .gitignore patterns
  follow_symlinks: false           # Follow symbolic links
  max_file_size: 5000000          # Max file size in bytes (5MB)
  max_files: 10000                # Maximum files to scan
  binary_check: true              # Skip binary files
  encoding: utf-8                 # File encoding
  workers: 4                      # Parallel scanning workers
  parallel_mode: auto             # auto | thread | process
  timeout: 5.0                    # Timeout per file (seconds)
  exclude_minified: true          # Skip minified files
  exclude_tests_by_default: true  # Skip test files unless explicit

  # Ignore patterns (in addition to .gitignore)
  additional_ignore_patterns:
    - '*.generated.*'
    - vendor/
    - node_modules/
    - '*.egg-info/'
    - __pycache__/
    - .pytest_cache/

  # Test file patterns
  test_patterns:
    - test_*.py
    - '*_test.py'
    - '*.test.js'
    - '*.spec.ts'

  # Test directories
  test_directories:
    - test
    - tests
    - __tests__
    - spec

# ============= Ranking System =============
ranking:
  algorithm: balanced             # fast | balanced | thorough | ml | custom
  threshold: 0.10                 # 0.0-1.0 (lower includes more files)
  text_similarity_algorithm: bm25 # bm25 (default) | tfidf (optional)
  text_similarity_algorithm: bm25  # Using BM25 (default)               # Deprecated - use text_similarity_algorithm instead
  use_stopwords: false           # Filter common tokens
  use_embeddings: false          # Semantic similarity (requires ML)
  use_git: true                  # Include git signals
  use_ml: false                  # Machine learning features
  embedding_model: all-MiniLM-L6-v2  # Embedding model name
  workers: 2                     # Parallel ranking workers
  parallel_mode: auto            # thread | process | auto
  batch_size: 100               # Files per batch

  # Custom factor weights (0.0-1.0)
  custom_weights:
    keyword_match: 0.25
    path_relevance: 0.20
    import_graph: 0.20
    git_activity: 0.15
    file_type: 0.10
    complexity: 0.10

# ============= Summarization =============
summarizer:
  default_mode: auto             # auto | extractive | abstractive
  target_ratio: 0.3              # Target compression ratio
  enable_cache: true             # Cache summaries
  preserve_code_structure: true  # Keep code structure intact
  summarize_imports: true        # Condense import statements
  import_summary_threshold: 5    # Min imports to trigger summary
  max_cache_size: 100           # Max cached summaries
  quality_threshold: medium      # low | medium | high
  batch_size: 10                # Files per batch
  docstring_weight: 0.5         # Weight for docstrings
  include_all_signatures: true   # Include all function signatures

  # LLM settings (optional)
  llm_provider: null            # openai | anthropic | null
  llm_model: null               # Model name
  llm_temperature: 0.3          # Creativity (0.0-1.0)
  llm_max_tokens: 500           # Max tokens per summary
  enable_ml_strategies: false    # Use ML summarization

# ============= Tenet System =============
tenet:
  auto_instill: true              # Auto-apply tenets
  max_per_context: 5              # Max tenets per context
  reinforcement: true             # Reinforce important tenets
  injection_strategy: strategic   # strategic | sequential | random
  min_distance_between: 1000      # Min chars between injections
  prefer_natural_breaks: true     # Insert at natural boundaries
  storage_path: ~/.tenets/tenets  # Tenet storage location
  collections_enabled: true       # Enable tenet collections

  # Injection frequency
  injection_frequency: adaptive   # always | periodic | adaptive | manual
  injection_interval: 3           # For periodic mode
  session_complexity_threshold: 0.7  # Triggers adaptive injection
  min_session_length: 5           # Min prompts before injection

  # Advanced settings
  adaptive_injection: true        # Smart injection timing
  track_injection_history: true   # Track what was injected
  decay_rate: 0.1                # How fast tenets decay
  reinforcement_interval: 10      # Reinforce every N prompts
  session_aware: true            # Use session context
  session_memory_limit: 100      # Max session history
  persist_session_history: true   # Save session data

  # Priority settings
  priority_boost_critical: 2.0    # Boost for critical tenets
  priority_boost_high: 1.5       # Boost for high priority
  skip_low_priority_on_complex: true  # Skip low priority when complex

  # System instruction
  system_instruction: null        # Global system instruction
  system_instruction_enabled: false  # Enable system instruction
  system_instruction_position: top   # top | bottom
  system_instruction_format: markdown  # markdown | plain
  system_instruction_once_per_session: true  # Inject once per session

# ============= Caching =============
cache:
  enabled: true                  # Enable caching
  directory: ~/.tenets/cache     # Cache directory
  ttl_days: 7                   # Time to live (days)
  max_size_mb: 500              # Max cache size (MB)
  compression: false            # Compress cache data
  memory_cache_size: 1000       # In-memory cache entries
  max_age_hours: 24            # Max cache age (hours)

  # SQLite settings
  sqlite_pragmas:
    journal_mode: WAL
    synchronous: NORMAL
    cache_size: '-64000'
    temp_store: MEMORY

  # LLM cache
  llm_cache_enabled: true       # Cache LLM responses
  llm_cache_ttl_hours: 24      # LLM cache TTL

# ============= Output Formatting =============
output:
  default_format: markdown       # markdown | xml | json | html
  syntax_highlighting: true      # Enable syntax highlighting
  line_numbers: false           # Show line numbers
  max_line_length: 120          # Max line length
  include_metadata: true        # Include metadata
  compression_threshold: 10000  # Compress if larger (chars)
  summary_ratio: 0.25           # Summary compression ratio
  copy_on_distill: false        # Auto-copy to clipboard
  show_token_usage: true        # Show token counts
  show_cost_estimate: true      # Show LLM cost estimates

# ============= Git Integration =============
git:
  enabled: true                 # Use git information
  include_history: true         # Include commit history
  history_limit: 100           # Max commits to analyze
  include_blame: false         # Include git blame
  include_stats: true          # Include statistics

  # Ignore these authors
  ignore_authors:
    - dependabot[bot]
    - github-actions[bot]
    - renovate[bot]

  # Main branch names
  main_branches:
    - main
    - master
    - develop
    - trunk

# ============= NLP Settings =============
nlp:
  enabled: true                    # Enable NLP features
  stopwords_enabled: true          # Use stopwords
  code_stopword_set: minimal       # minimal | standard | aggressive
  prompt_stopword_set: aggressive  # minimal | standard | aggressive
  custom_stopword_files: []        # Custom stopword files

  # Tokenization
  tokenization_mode: auto          # auto | simple | advanced
  preserve_original_tokens: true   # Keep original tokens
  split_camelcase: true           # Split CamelCase
  split_snakecase: true           # Split snake_case
  min_token_length: 2             # Min token length

  # Keyword extraction
  keyword_extraction_method: auto  # auto | rake | yake | bm25 | tfidf
  max_keywords: 30                # Max keywords to extract
  ngram_size: 3                  # N-gram size
  yake_dedup_threshold: 0.7      # YAKE deduplication

  # BM25 settings
  bm25_k1: 1.2                   # Term frequency saturation parameter
  bm25_b: 0.75                   # Length normalization parameter

  # TF-IDF settings (when explicitly configured as alternative to BM25)
  tfidf_use_sublinear: true      # Sublinear TF scaling (only when TF-IDF is used)
  tfidf_use_idf: true           # Use IDF
  tfidf_norm: l2                # Normalization

  # Embeddings
  embeddings_enabled: false       # Enable embeddings
  embeddings_model: all-MiniLM-L6-v2  # Model name
  embeddings_device: auto        # cpu | cuda | auto
  embeddings_cache: true         # Cache embeddings
  embeddings_batch_size: 32      # Batch size
  similarity_metric: cosine      # cosine | euclidean | manhattan
  similarity_threshold: 0.7      # Similarity threshold

  # Cache settings
  cache_embeddings_ttl_days: 30  # Embeddings cache TTL
  cache_tfidf_ttl_days: 7       # BM25/TF-IDF cache TTL
  cache_keywords_ttl_days: 7     # Keywords cache TTL

  # Performance
  multiprocessing_enabled: true   # Use multiprocessing
  multiprocessing_workers: null   # null = auto-detect
  multiprocessing_chunk_size: 100 # Chunk size

# ============= LLM Settings (Optional) =============
llm:
  enabled: false                # Enable LLM features
  provider: openai              # openai | anthropic | ollama
  fallback_providers:           # Fallback providers
    - anthropic
    - openrouter

  # API keys (use environment variables)
  api_keys:
    openai: ${OPENAI_API_KEY}
    anthropic: ${ANTHROPIC_API_KEY}
    openrouter: ${OPENROUTER_API_KEY}

  # API endpoints
  api_base_urls:
    openai: https://api.openai.com/v1
    anthropic: https://api.anthropic.com/v1
    openrouter: https://openrouter.ai/api/v1
    ollama: http://localhost:11434

  # Model selection
  models:
    default: gpt-4o-mini
    summarization: gpt-3.5-turbo
    analysis: gpt-4o
    embeddings: text-embedding-3-small
    code_generation: gpt-4o

  # Rate limits and costs
  max_cost_per_run: 0.1         # Max $ per run
  max_cost_per_day: 10.0        # Max $ per day
  max_tokens_per_request: 4000   # Max tokens per request
  max_context_length: 100000     # Max context length

  # Generation settings
  temperature: 0.3              # Creativity (0.0-1.0)
  top_p: 0.95                  # Nucleus sampling
  frequency_penalty: 0.0        # Frequency penalty
  presence_penalty: 0.0         # Presence penalty

  # Network settings
  requests_per_minute: 60       # Rate limit
  retry_on_error: true         # Retry failed requests
  max_retries: 3              # Max retry attempts
  retry_delay: 1.0            # Initial retry delay
  retry_backoff: 2.0          # Backoff multiplier
  timeout: 30                 # Request timeout (seconds)
  stream: false               # Stream responses

  # Logging and caching
  cache_responses: true        # Cache LLM responses
  cache_ttl_hours: 24         # Cache TTL (hours)
  log_requests: false         # Log requests
  log_responses: false        # Log responses

# ============= Custom Settings =============
custom: {}  # User-defined custom settings

Key Configuration Notes¶

Ranking: - threshold: Lower values (0.05-0.10) include more files, higher (0.20-0.30) for stricter matching - algorithm: - fast: Quick keyword matching (~10ms/file) - balanced: Structural analysis + BM25 (default) - thorough: Full analysis with relationships - ml: Machine learning with embeddings (requires extras) - custom_weights: Fine-tune ranking factors (values 0.0-1.0)

Scanner: - respect_gitignore: Always honors .gitignore patterns - exclude_tests_by_default: Tests excluded unless --include-tests used - additional_ignore_patterns: Added to built-in patterns

Tenet System: - auto_instill: Automatically applies relevant tenets to context - injection_frequency: - always: Every distill - periodic: Every N distills - adaptive: Based on complexity - manual: Only when explicitly called - system_instruction: Global instruction added to all contexts

Output: - copy_on_distill: Auto-copy result to clipboard - default_format: Default output format (markdown recommended for LLMs)

Performance: - workers: More workers = faster but more CPU/memory - cache.enabled: Significantly speeds up repeated operations - ranking.batch_size: Larger batches = more memory but faster

Environment Variable Overrides¶

Any configuration option can be overridden via environment variables.

Format: - Nested keys: TENETS_<SECTION>_<KEY>=value - Top-level keys: TENETS_<KEY>=value - Lists: Comma-separated values - Booleans: true or false (case-insensitive)

Common Examples:

Bash

# Core settings
export TENETS_MAX_TOKENS=150000
export TENETS_DEBUG=true
export TENETS_QUIET=false

# Ranking configuration
export TENETS_RANKING_ALGORITHM=thorough
export TENETS_RANKING_THRESHOLD=0.05
export TENETS_RANKING_TEXT_SIMILARITY_ALGORITHM=tfidf  # Use TF-IDF instead of BM25
export TENETS_RANKING_USE_EMBEDDINGS=true
export TENETS_RANKING_WORKERS=4

# Scanner settings
export TENETS_SCANNER_MAX_FILE_SIZE=10000000
export TENETS_SCANNER_RESPECT_GITIGNORE=true
export TENETS_SCANNER_EXCLUDE_TESTS_BY_DEFAULT=false

# Output settings
export TENETS_OUTPUT_DEFAULT_FORMAT=xml
export TENETS_OUTPUT_COPY_ON_DISTILL=true
export TENETS_OUTPUT_SHOW_TOKEN_USAGE=false

# Cache settings
export TENETS_CACHE_ENABLED=false
export TENETS_CACHE_DIRECTORY=/tmp/tenets-cache
export TENETS_CACHE_TTL_DAYS=14

# Git settings
export TENETS_GIT_ENABLED=false
export TENETS_GIT_HISTORY_LIMIT=50

# Tenet system
export TENETS_TENET_AUTO_INSTILL=false
export TENETS_TENET_MAX_PER_CONTEXT=10
export TENETS_TENET_INJECTION_FREQUENCY=periodic
export TENETS_TENET_INJECTION_INTERVAL=5

# System instruction
export TENETS_TENET_SYSTEM_INSTRUCTION="You are a senior engineer. Focus on security and performance."
export TENETS_TENET_SYSTEM_INSTRUCTION_ENABLED=true

Usage Patterns:

Bash

# One-time override
TENETS_RANKING_ALGORITHM=fast tenets distill "fix bug"

# Session-wide settings
export TENETS_RANKING_THRESHOLD=0.05
export TENETS_OUTPUT_COPY_ON_DISTILL=true
tenets distill "implement feature"  # Uses exported settings

# Verify configuration
tenets config show --key ranking
tenets config show --format json | jq '.ranking'

CLI Flags and Programmatic Control¶

CLI Flags¶

Command-line flags override configuration for that specific run:

Bash

# Core overrides
tenets distill "query" --max-tokens 50000
tenets distill "query" --format xml
tenets distill "query" --copy

# Ranking mode
tenets distill "query" --mode fast      # Quick analysis
tenets distill "query" --mode thorough  # Deep analysis
tenets distill "query" --mode ml        # With embeddings

# File filtering
tenets distill "query" --include "*.py" --exclude "test_*.py"
tenets distill "query" --include-tests  # Include test files

# Git control
tenets distill "query" --no-git  # Disable git signals

# Session management
tenets distill "query" --session feature-x

# Content optimization
tenets distill "query" --condense        # Aggressive compression
tenets distill "query" --remove-comments # Strip comments
tenets distill "query" --full            # No summarization

Programmatic Configuration¶

Basic usage with custom config:

Python

id="__codelineno-5-1" name="__codelineno-5-1">from tenets import Tenets class="kn">from tenets.config import TenetsConfig class="c1"># Create custom configuration class="n">config = TenetsConfig( max_tokens=150000, ranking={ "algorithm": "thorough", "threshold": 0.05, "text_similarity_algorithm": "bm25", # or "tfidf" for TF-IDF "use_embeddings": True, "workers": 4, "custom_weights": { "keyword_match": 0.30, "path_relevance": 0.25, "git_activity": 0.20, } }, scanner={ "respect_gitignore": True, "max_file_size": 10_000_000, "exclude_tests_by_default": False, }, output={ "default_format": "xml", "copy_on_distill": True, }, tenet={ "auto_instill": True, "max_per_context": 10, "system_instruction": "Focus on security and performance", "system_instruction_enabled": True, } class="p">) class="c1"># Initialize with custom config class="n">tenets = Tenets(config=config) class="c1"># Use it class="n">result = tenets.distill( "implement caching layer", max_tokens=80000, # Override config for this call mode="balanced", # Override algorithm class="p">)

Load and modify existing config:

Python

from tenets import Tenets
from tenets.config import TenetsConfig

# Load from file
config = TenetsConfig.from_file(".tenets.yml")

# Modify specific settings
config.ranking.algorithm = "fast"
config.ranking.threshold = 0.08
config.output.copy_on_distill = True

# Use modified config
tenets = Tenets(config=config)

Runtime overrides:

Python

# Config precedence: method args > instance config > file config
result = tenets.distill(
    prompt="add authentication",
    mode="thorough",        # Overrides config.ranking.algorithm
    max_tokens=100000,      # Overrides config.max_tokens
    format="json",          # Overrides config.output.default_format
    session_name="auth",    # Session-specific
    include_patterns=["*.py", "*.js"],
    exclude_patterns=["*.test.js"],
)

Configuration Recipes¶

For Different Use Cases¶

Large Monorepo (millions of files):

YAML

max_tokens: 150000
scanner:
  max_files: 50000
  workers: 8
  parallel_mode: process
  exclude_tests_by_default: true
ranking:
  algorithm: fast
  threshold: 0.15
  workers: 4
  batch_size: 500
cache:
  enabled: true
  memory_cache_size: 5000

Small Project (high precision):

YAML

max_tokens: 80000
ranking:
  algorithm: thorough
  threshold: 0.08
  text_similarity_algorithm: bm25  # Default algorithm
  use_embeddings: true
  custom_weights:
    keyword_match: 0.35
    import_graph: 0.25

Documentation-Heavy Project:

YAML

summarizer:
  docstring_weight: 0.8
  include_all_signatures: true
  preserve_code_structure: false
ranking:
  custom_weights:
    keyword_match: 0.20
    path_relevance: 0.30  # Prioritize doc paths

Security-Focused Analysis:

YAML

tenet:
  system_instruction: |
    Focus on security implications.
    Flag any potential vulnerabilities.
    Suggest secure alternatives.
  system_instruction_enabled: true
  auto_instill: true
scanner:
  additional_ignore_patterns: []  # Don't skip anything
  exclude_tests_by_default: false

Performance Tuning¶

Maximum Speed (sacrifices precision):

YAML

ranking:
  algorithm: fast
  threshold: 0.05
  text_similarity_algorithm: bm25  # Using BM25 (default)
  use_embeddings: false
  workers: 8
scanner:
  workers: 8
  timeout: 2.0
cache:
  enabled: true
  compression: false

Maximum Precision (slower):

YAML

ranking:
  algorithm: thorough
  threshold: 0.20
  text_similarity_algorithm: bm25  # Default algorithm
  use_embeddings: true
  use_git: true
  workers: 2
summarizer:
  quality_threshold: high
  enable_ml_strategies: true

Memory-Constrained Environment:

YAML

scanner:
  max_files: 1000
  workers: 1
ranking:
  workers: 1
  batch_size: 50
cache:
  memory_cache_size: 100
  max_size_mb: 100
nlp:
  embeddings_batch_size: 8
  multiprocessing_enabled: false

Common Workflows¶

Bug Investigation:

YAML

ranking:
  algorithm: balanced
  threshold: 0.10
  custom_weights:
    git_activity: 0.30  # Recent changes matter
    complexity: 0.20    # Complex code = more bugs
git:
  include_history: true
  history_limit: 200
  include_blame: true

New Feature Development:

YAML

ranking:
  algorithm: balanced
  threshold: 0.08
  custom_weights:
    import_graph: 0.30  # Dependencies matter
    path_relevance: 0.25 # Related modules
output:
  copy_on_distill: true
  show_token_usage: true

Code Review Preparation:

YAML

summarizer:
  target_ratio: 0.5  # More detail
  preserve_code_structure: true
  include_all_signatures: true
output:
  syntax_highlighting: true
  line_numbers: true
  include_metadata: true

Troubleshooting¶

Common Issues and Solutions¶

No files included in context: - Lower ranking.threshold (try 0.05) - Use --mode fast for broader inclusion - Increase max_tokens limit - Check if files match --include patterns - Verify files aren't in .gitignore - Use --include-tests if analyzing test files

Configuration not taking effect:

Bash

# Check which config file is loaded
tenets config show | head -20

# Verify specific setting
tenets config show --key ranking.threshold

# Check config file location
ls -la .tenets.yml

# Test with explicit config
tenets --config ./my-config.yml distill "query"

Environment variables not working:

Bash

# Verify export (not just set)
export TENETS_RANKING_THRESHOLD=0.05  # Correct
TENETS_RANKING_THRESHOLD=0.05         # Wrong (not exported)

# Check if variable is set
echo $TENETS_RANKING_THRESHOLD

# Debug with explicit env
TENETS_DEBUG=true tenets config show

Performance issues: - Reduce scanner.max_files and scanner.max_file_size - Enable caching: cache.enabled: true - Use ranking.algorithm: fast - Reduce ranking.workers if CPU-constrained - Exclude unnecessary paths with additional_ignore_patterns

Token limit exceeded: - Increase max_tokens or use --max-tokens - Enable --condense flag - Use --remove-comments - Increase ranking.threshold for stricter filtering - Exclude test files: scanner.exclude_tests_by_default: true

Cache issues:

Bash

# Clear cache
rm -rf ~/.tenets/cache

# Disable cache temporarily
TENETS_CACHE_ENABLED=false tenets distill "query"

# Use custom cache location
export TENETS_CACHE_DIRECTORY=/tmp/tenets-cache

Validation Commands¶

Bash

# Validate configuration syntax
tenets config validate

# Show effective configuration
tenets config show --format json | jq

# Test configuration with dry run
tenets distill "test query" --dry-run

# Check what files would be scanned
tenets examine . --dry-run

# Debug ranking process
TENETS_DEBUG=true tenets distill "query" 2>debug.log

Advanced Topics¶

Custom Ranking Strategies¶

Create a custom ranking strategy by combining weights:

YAML

ranking:
  algorithm: custom
  custom_weights:
    keyword_match: 0.40    # Emphasize keyword relevance
    path_relevance: 0.15   # De-emphasize path matching
    import_graph: 0.15     # Moderate dependency weight
    git_activity: 0.10     # Low git signal weight
    file_type: 0.10        # File type matching
    complexity: 0.10       # Code complexity

Multi-Environment Setup¶

Create environment-specific configs:

Bash

# Development
cp .tenets.yml .tenets.dev.yml
# Edit for dev settings

# Production analysis
cp .tenets.yml .tenets.prod.yml
# Edit for production settings

# Use specific config
tenets --config .tenets.dev.yml distill "query"

Integration with CI/CD¶

YAML

# .tenets.ci.yml - Optimized for CI
max_tokens: 50000
quiet: true
scanner:
  max_files: 5000
  workers: 2
ranking:
  algorithm: fast
  threshold: 0.10
cache:
  enabled: false  # Fresh analysis each run
output:
  default_format: json  # Machine-readable

Configuration Guide¶

Overview¶

Files and locations¶

Complete Configuration Schema¶

Key Configuration Notes¶

Environment Variable Overrides¶

CLI Flags and Programmatic Control¶

CLI Flags¶

Programmatic Configuration¶

Configuration Recipes¶

For Different Use Cases¶

Performance Tuning¶

Common Workflows¶

Troubleshooting¶

Common Issues and Solutions¶

Validation Commands¶

Advanced Topics¶

Custom Ranking Strategies¶

Multi-Environment Setup¶

Integration with CI/CD¶

See Also¶