ranker¶
Full name: tenets.core.ranking.ranker
Main relevance ranking orchestrator.
This module provides the main RelevanceRanker class that coordinates different ranking strategies, manages corpus analysis, and produces ranked results. It supports multiple algorithms, parallel processing, and custom ranking extensions.
The ranker is designed to be efficient, scalable, and extensible while providing high-quality relevance scoring for code search and context generation.
Classes¶
RankingAlgorithm¶
Bases: Enum
Available ranking algorithms.
Each algorithm provides different trade-offs between speed and accuracy.
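As a quick usage sketch (sticking to the one member this page confirms, BALANCED, and its string value "balanced"):
from tenets.core.ranking.ranker import RankingAlgorithm

# "balanced" is the documented default value; unknown names raise ValueError,
# which RelevanceRanker catches and maps to RankingAlgorithm.BALANCED.
algo = RankingAlgorithm("balanced")
assert algo is RankingAlgorithm.BALANCED

try:
    RankingAlgorithm("definitely-not-an-algorithm")
except ValueError:
    print("unknown algorithm name -> fall back to BALANCED")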
RankingStats dataclass¶
RankingStats(total_files: int = 0, files_ranked: int = 0, files_failed: int = 0, time_elapsed: float = 0.0, algorithm_used: str = '', threshold_applied: float = 0.0, files_above_threshold: int = 0, average_score: float = 0.0, max_score: float = 0.0, min_score: float = 0.0, corpus_stats: Dict[str, Any] = None)
Statistics from ranking operation.
Tracks performance metrics and diagnostic information about the ranking process for monitoring and optimization.
ATTRIBUTE | DESCRIPTION |
---|---|
total_files | Total number of files processed. TYPE: int |
files_ranked | Number of files successfully ranked. TYPE: int |
files_failed | Number of files that failed ranking. TYPE: int |
time_elapsed | Total time in seconds. TYPE: float |
algorithm_used | Which algorithm was used. TYPE: str |
threshold_applied | Relevance threshold used. TYPE: float |
files_above_threshold | Number of files above threshold. TYPE: int |
average_score | Average relevance score. TYPE: float |
max_score | Maximum relevance score. TYPE: float |
min_score | Minimum relevance score. TYPE: float |
corpus_stats | Dictionary of corpus statistics. TYPE: Dict[str, Any] |
Functions¶
to_dict¶
Convert to dictionary representation.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | Dictionary with all statistics |
Source code in tenets/core/ranking/ranker.py
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary representation.
Returns:
Dictionary with all statistics
"""
return {
"total_files": self.total_files,
"files_ranked": self.files_ranked,
"files_failed": self.files_failed,
"time_elapsed": self.time_elapsed,
"algorithm_used": self.algorithm_used,
"threshold_applied": self.threshold_applied,
"files_above_threshold": self.files_above_threshold,
"average_score": self.average_score,
"max_score": self.max_score,
"min_score": self.min_score,
"corpus_stats": self.corpus_stats,
}
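A minimal sketch of exporting these statistics for logging; the numeric values below are illustrative only:
import json

from tenets.core.ranking.ranker import RankingStats

# Illustrative values only; in practice RelevanceRanker fills ranker.stats.
stats = RankingStats(
    total_files=120,
    files_ranked=118,
    files_failed=2,
    time_elapsed=1.42,
    algorithm_used="balanced",
    threshold_applied=0.10,
    files_above_threshold=37,
    average_score=0.31,
    max_score=0.92,
    min_score=0.02,
)

# to_dict() flattens everything for logging or JSON export.
print(json.dumps(stats.to_dict(), indent=2))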
RelevanceRanker¶
RelevanceRanker(config: TenetsConfig, algorithm: Optional[str] = None, use_stopwords: Optional[bool] = None)
Main relevance ranking system.
Orchestrates the ranking process by analyzing the corpus, selecting appropriate strategies, and producing ranked results. Supports multiple algorithms, parallel processing, and custom ranking extensions.
The ranker follows a multi-stage process:

1. Corpus analysis (TF-IDF, import graph, statistics)
2. Strategy selection based on algorithm
3. Parallel factor calculation
4. Score aggregation and weighting
5. Filtering and sorting
ATTRIBUTE | DESCRIPTION |
---|---|
config | TenetsConfig instance |
logger | Logger instance |
strategies | Available ranking strategies |
custom_rankers | Custom ranking functions |
executor | Thread pool for parallel processing |
stats | Latest ranking statistics |
cache | Internal cache for optimizations |
Initialize the relevance ranker.
PARAMETER | DESCRIPTION |
---|---|
config | Tenets configuration. TYPE: TenetsConfig |
algorithm | Override default algorithm. TYPE: Optional[str], DEFAULT: None |
use_stopwords | Override stopword filtering setting. TYPE: Optional[bool], DEFAULT: None |
Source code in tenets/core/ranking/ranker.py
def __init__(
self,
config: TenetsConfig,
algorithm: Optional[str] = None,
use_stopwords: Optional[bool] = None,
):
"""Initialize the relevance ranker.
Args:
config: Tenets configuration
algorithm: Override default algorithm
use_stopwords: Override stopword filtering setting
"""
self.config = config
self.logger = get_logger(__name__)
# Determine algorithm
algo_str = algorithm or config.ranking.algorithm
try:
self.algorithm = RankingAlgorithm(algo_str)
except ValueError:
self.logger.warning(f"Unknown algorithm '{algo_str}', using balanced")
self.algorithm = RankingAlgorithm.BALANCED
# Stopword configuration
self.use_stopwords = (
use_stopwords if use_stopwords is not None else config.ranking.use_stopwords
)
# ML configuration
self.use_ml = (
config.ranking.use_ml if config and hasattr(config.ranking, "use_ml") else False
)
# Initialize strategies lazily to avoid loading unnecessary models
self._strategies_cache: Dict[RankingAlgorithm, RankingStrategy] = {}
self.strategies = self._strategies_cache # Alias for compatibility
# Pre-populate core strategies for tests that expect them
# These are lightweight and don't load ML models until actually used
self._init_core_strategies()
# Custom rankers list (keep public and test-expected private alias)
self.custom_rankers: List[Callable] = []
self._custom_rankers: List[Callable] = self.custom_rankers
# Thread pool for parallel ranking (lazy initialization to avoid Windows issues)
from tenets.utils.multiprocessing import get_ranking_workers, log_worker_info
max_workers = get_ranking_workers(config)
self.max_workers = max_workers # Store for logging
self._executor_instance = None # Will be created lazily
# Backwards-compat alias expected by some tests
self._executor = None
# Statistics and cache
self.stats = RankingStats()
self.cache = {}
# ML model (loaded lazily)
self._ml_model = None
# Optional ML embedding model placeholder for tests that patch it
# Also expose module-level symbol on instance for convenience
self.SentenceTransformer = SentenceTransformer
# Log worker configuration
log_worker_info(self.logger, "RelevanceRanker", max_workers)
self.logger.info(
f"RelevanceRanker initialized: algorithm={self.algorithm.value}, "
f"use_stopwords={self.use_stopwords}, use_ml={self.use_ml}"
)
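A minimal construction sketch; the TenetsConfig import path is an assumption, since only this module's path appears on this page:
from tenets.config import TenetsConfig  # assumed location; not documented here
from tenets.core.ranking.ranker import RelevanceRanker

config = TenetsConfig()  # default configuration, as create_ranker() does below
ranker = RelevanceRanker(config, algorithm="balanced", use_stopwords=False)
print(ranker.algorithm.value)  # "balanced"
print(ranker.max_workers)      # worker count derived from the config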
Attributes¶
executor property¶
Lazy initialization of ThreadPoolExecutor to avoid Windows import issues.
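The general pattern this property describes looks roughly like the following hypothetical sketch (not the ranker's actual code):
from concurrent.futures import ThreadPoolExecutor

class LazyExecutorSketch:
    """Illustrative pattern only; not the ranker's actual implementation."""

    def __init__(self, max_workers: int):
        self.max_workers = max_workers
        self._executor_instance = None

    @property
    def executor(self) -> ThreadPoolExecutor:
        # Create the pool on first access so that importing the module never
        # spins up threads as a side effect (the Windows issue noted above).
        if self._executor_instance is None:
            self._executor_instance = ThreadPoolExecutor(max_workers=self.max_workers)
        return self._executor_instance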
Functions¶
rank_files¶
rank_files(files: List[FileAnalysis], prompt_context: PromptContext, algorithm: Optional[str] = None, parallel: bool = True, explain: bool = False) -> List[FileAnalysis]
Rank files by relevance to prompt.
This is the main entry point for ranking files. It analyzes the corpus, applies the selected ranking strategy, and returns files sorted by relevance above the configured threshold.
PARAMETER | DESCRIPTION |
---|---|
files | List of files to rank. TYPE: List[FileAnalysis] |
prompt_context | Parsed prompt information. TYPE: PromptContext |
algorithm | Override algorithm for this ranking. TYPE: Optional[str], DEFAULT: None |
parallel | Whether to rank files in parallel. TYPE: bool, DEFAULT: True |
explain | Whether to generate ranking explanations. TYPE: bool, DEFAULT: False |
RETURNS | DESCRIPTION |
---|---|
List[FileAnalysis] | List of FileAnalysis objects sorted by relevance (highest first) and filtered by threshold |
RAISES | DESCRIPTION |
---|---|
ValueError | If algorithm is invalid |
Source code in tenets/core/ranking/ranker.py
def rank_files(
self,
files: List[FileAnalysis],
prompt_context: PromptContext,
algorithm: Optional[str] = None,
parallel: bool = True,
explain: bool = False,
) -> List[FileAnalysis]:
"""Rank files by relevance to prompt.
This is the main entry point for ranking files. It analyzes the corpus,
applies the selected ranking strategy, and returns files sorted by
relevance above the configured threshold.
Args:
files: List of files to rank
prompt_context: Parsed prompt information
algorithm: Override algorithm for this ranking
parallel: Whether to rank files in parallel
explain: Whether to generate ranking explanations
Returns:
List of FileAnalysis objects sorted by relevance (highest first)
and filtered by threshold
Raises:
ValueError: If algorithm is invalid
"""
if not files:
return []
start_time = time.time()
# Reset statistics
self.stats = RankingStats(
total_files=len(files),
algorithm_used=algorithm or self.algorithm.value,
threshold_applied=self.config.ranking.threshold,
)
# Check if we need to disable parallel on Windows Python 3.13+
import sys
if sys.platform == "win32" and sys.version_info >= (3, 13) and parallel:
self.logger.warning(
"Disabling parallel ranking on Windows with Python 3.13+ due to compatibility issues"
)
parallel = False
self.logger.info(
f"Ranking {len(files)} files using {self.stats.algorithm_used} algorithm "
f"(parallel={parallel}, workers={self.max_workers if parallel else 1})"
)
# Select strategy
if algorithm:
try:
strategy = self._get_strategy(algorithm)
except ValueError:
raise ValueError(f"Unknown ranking algorithm: {algorithm}")
else:
strategy = self._get_strategy(self.algorithm.value)
if not strategy:
raise ValueError(f"No strategy for algorithm: {self.algorithm}")
# Analyze corpus
corpus_stats = self._analyze_corpus(files, prompt_context)
self.stats.corpus_stats = corpus_stats
# Rank files
ranked_files = self._rank_with_strategy(
files, prompt_context, corpus_stats, strategy, parallel
)
# Apply custom rankers
for custom_ranker in self.custom_rankers:
try:
ranked_files = custom_ranker(ranked_files, prompt_context)
except Exception as e:
self.logger.warning(f"Custom ranker failed: {e}")
# Sort by score
ranked_files.sort(reverse=True)
# Filter by threshold and update statistics
threshold = self.config.ranking.threshold
filtered_files = []
scores = []
for i, rf in enumerate(ranked_files):
scores.append(rf.score)
if rf.score >= threshold:
# Update FileAnalysis with ranking info
rf.analysis.relevance_score = rf.score
rf.analysis.relevance_rank = i + 1
# Generate explanation if requested
if explain:
rf.explanation = rf.generate_explanation(strategy.get_weights(), verbose=True)
filtered_files.append(rf.analysis)
# Update statistics
self.stats.files_ranked = len(ranked_files)
self.stats.files_above_threshold = len(filtered_files)
self.stats.time_elapsed = time.time() - start_time
if scores:
self.stats.average_score = sum(scores) / len(scores)
self.stats.max_score = max(scores)
self.stats.min_score = min(scores)
# If nothing passed threshold, fall back to returning top 1-3 files
if not filtered_files and ranked_files:
top_k = min(3, len(ranked_files))
fallback = [rf.analysis for rf in ranked_files[:top_k]]
for i, a in enumerate(fallback, 1):
a.relevance_score = ranked_files[i - 1].score
a.relevance_rank = i
filtered_files = fallback
self.logger.info(
f"Ranking complete: {len(filtered_files)}/{len(files)} files "
f"above threshold ({threshold:.2f}) in {self.stats.time_elapsed:.2f}s"
)
# Generate explanation report if requested
if explain and ranked_files:
explainer = RankingExplainer()
explanation = explainer.explain_ranking(ranked_files[:20], strategy.get_weights())
self.logger.info(f"Ranking Explanation:\n{explanation}")
return filtered_files
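A usage sketch against an already-constructed ranker; files, prompt_context, and the path attribute used for display are assumptions drawn from the surrounding pipeline:
# `files` and `prompt_context` are assumed to come from the analysis and
# prompt-parsing stages; the `path` attribute on FileAnalysis is assumed here
# purely for display.
ranked = ranker.rank_files(
    files=files,
    prompt_context=prompt_context,
    parallel=True,
    explain=False,
)

for fa in ranked[:5]:
    # Returned FileAnalysis objects carry relevance_score and relevance_rank.
    print(fa.relevance_rank, f"{fa.relevance_score:.3f}", fa.path)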
register_custom_ranker¶
register_custom_ranker(ranker_func: Callable[[List[RankedFile], PromptContext], List[RankedFile]])
Register a custom ranking function.
Custom rankers are applied after the main ranking strategy and can adjust scores based on project-specific logic.
PARAMETER | DESCRIPTION |
---|---|
ranker_func | Function that takes ranked files and returns modified list. TYPE: Callable[[List[RankedFile], PromptContext], List[RankedFile]] |
Example
>>> def boost_tests(ranked_files, prompt_context):
...     if 'test' in prompt_context.text:
...         for rf in ranked_files:
...             if 'test' in rf.path:
...                 rf.score *= 1.5
...     return ranked_files
>>> ranker.register_custom_ranker(boost_tests)
Source code in tenets/core/ranking/ranker.py
def register_custom_ranker(
self, ranker_func: Callable[[List[RankedFile], PromptContext], List[RankedFile]]
):
"""Register a custom ranking function.
Custom rankers are applied after the main ranking strategy and can
adjust scores based on project-specific logic.
Args:
ranker_func: Function that takes ranked files and returns modified list
Example:
>>> def boost_tests(ranked_files, prompt_context):
... if 'test' in prompt_context.text:
... for rf in ranked_files:
... if 'test' in rf.path:
... rf.score *= 1.5
... return ranked_files
>>> ranker.register_custom_ranker(boost_tests)
"""
self.custom_rankers.append(ranker_func)
# Keep alias updated
self._custom_rankers = self.custom_rankers
self.logger.info(f"Registered custom ranker: {ranker_func.__name__}")
get_ranking_explanation¶
Get detailed explanation of ranking results.
PARAMETER | DESCRIPTION |
---|---|
ranked_files | List of ranked files. TYPE: List[RankedFile] |
top_n | Number of top files to explain. TYPE: int, DEFAULT: 10 |
RETURNS | DESCRIPTION |
---|---|
str | Formatted explanation string |
Source code in tenets/core/ranking/ranker.py
def get_ranking_explanation(self, ranked_files: List[RankedFile], top_n: int = 10) -> str:
"""Get detailed explanation of ranking results.
Args:
ranked_files: List of ranked files
top_n: Number of top files to explain
Returns:
Formatted explanation string
"""
explainer = RankingExplainer()
strategy = self.strategies.get(self.algorithm)
weights = strategy.get_weights() if strategy else {}
return explainer.explain_ranking(ranked_files[:top_n], weights, top_n=top_n)
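A hedged call sketch; note that this method expects the internal RankedFile objects, not the filtered FileAnalysis list that rank_files returns, and internal_ranked below is hypothetical:
# Hypothetical: `internal_ranked` is a List[RankedFile] -- the ranker's
# internal scored objects, not the FileAnalysis list that rank_files() returns.
report = ranker.get_ranking_explanation(internal_ranked, top_n=10)
print(report)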
get_stats¶
Get latest ranking statistics.
RETURNS | DESCRIPTION |
---|---|
RankingStats | RankingStats object |
shutdown¶
Shutdown the ranker and clean up resources.
Functions¶
create_ranker¶
create_ranker(config: Optional[TenetsConfig] = None, algorithm: str = 'balanced', use_stopwords: bool = False) -> RelevanceRanker
Create a configured relevance ranker.
PARAMETER | DESCRIPTION |
---|---|
config | Configuration (uses default if None). TYPE: Optional[TenetsConfig], DEFAULT: None |
algorithm | Ranking algorithm to use. TYPE: str, DEFAULT: 'balanced' |
use_stopwords | Whether to filter stopwords. TYPE: bool, DEFAULT: False |
RETURNS | DESCRIPTION |
---|---|
RelevanceRanker | Configured RelevanceRanker instance |
Source code in tenets/core/ranking/ranker.py
def create_ranker(
config: Optional[TenetsConfig] = None, algorithm: str = "balanced", use_stopwords: bool = False
) -> RelevanceRanker:
"""Create a configured relevance ranker.
Args:
config: Configuration (uses default if None)
algorithm: Ranking algorithm to use
use_stopwords: Whether to filter stopwords
Returns:
Configured RelevanceRanker instance
"""
if config is None:
config = TenetsConfig()
return RelevanceRanker(config, algorithm=algorithm, use_stopwords=use_stopwords)
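A brief end-to-end sketch, assuming files and prompt_context already exist from earlier pipeline stages:
from tenets.core.ranking.ranker import create_ranker

# Uses a default TenetsConfig when none is supplied.
ranker = create_ranker(algorithm="balanced", use_stopwords=False)

results = ranker.rank_files(files, prompt_context)  # inputs assumed to exist
print(ranker.get_stats().to_dict())
ranker.shutdown()  # clean up resources when done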