ranker

Full name: tenets.core.ranking.ranker

Main relevance ranking orchestrator.

This module provides the main RelevanceRanker class that coordinates different ranking strategies, manages corpus analysis, and produces ranked results. It supports multiple algorithms, parallel processing, and custom ranking extensions.

The ranker is designed to be efficient, scalable, and extensible while providing high-quality relevance scoring for code search and context generation.

Classes

RankingAlgorithm

Bases: Enum

Available ranking algorithms.

Each algorithm provides different trade-offs between speed and accuracy.
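
The enum is resolved from its string value when the ranker reads configuration (see __init__ below). A tiny sketch; only the BALANCED member and its 'balanced' value are confirmed on this page, so other member names should not be assumed:

Python
# The ranker performs the same lookup internally: RankingAlgorithm(algo_str)
algo = RankingAlgorithm("balanced")
assert algo is RankingAlgorithm.BALANCED
assert algo.value == "balanced"  # the value round-trips to config strings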

RankingStats (dataclass)

Python
RankingStats(total_files: int = 0, files_ranked: int = 0, files_failed: int = 0, time_elapsed: float = 0.0, algorithm_used: str = '', threshold_applied: float = 0.0, files_above_threshold: int = 0, average_score: float = 0.0, max_score: float = 0.0, min_score: float = 0.0, corpus_stats: Dict[str, Any] = None)

Statistics from ranking operation.

Tracks performance metrics and diagnostic information about the ranking process for monitoring and optimization.

ATTRIBUTES

total_files (int): Total number of files processed.
files_ranked (int): Number of files successfully ranked.
files_failed (int): Number of files that failed ranking.
time_elapsed (float): Total time in seconds.
algorithm_used (str): Which algorithm was used.
threshold_applied (float): Relevance threshold used.
files_above_threshold (int): Number of files above the threshold.
average_score (float): Average relevance score.
max_score (float): Maximum relevance score.
min_score (float): Minimum relevance score.
corpus_stats (Dict[str, Any]): Dictionary of corpus statistics.

Functions
to_dict
Python
to_dict() -> Dict[str, Any]

Convert to dictionary representation.

RETURNS

Dict[str, Any]: Dictionary with all statistics.

Source code in tenets/core/ranking/ranker.py
Python
def to_dict(self) -> Dict[str, Any]:
    """Convert to dictionary representation.

    Returns:
        Dictionary with all statistics
    """
    return {
        "total_files": self.total_files,
        "files_ranked": self.files_ranked,
        "files_failed": self.files_failed,
        "time_elapsed": self.time_elapsed,
        "algorithm_used": self.algorithm_used,
        "threshold_applied": self.threshold_applied,
        "files_above_threshold": self.files_above_threshold,
        "average_score": self.average_score,
        "max_score": self.max_score,
        "min_score": self.min_score,
        "corpus_stats": self.corpus_stats,
    }
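
A quick usage sketch; the field values are illustrative:

Python
stats = RankingStats(
    total_files=100,
    files_ranked=95,
    files_failed=5,
    time_elapsed=1.42,
    algorithm_used="balanced",
)
payload = stats.to_dict()
print(payload["files_ranked"], payload["algorithm_used"])  # 95 balanced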

RelevanceRanker

Python
RelevanceRanker(config: TenetsConfig, algorithm: Optional[str] = None, use_stopwords: Optional[bool] = None)

Main relevance ranking system.

Orchestrates the ranking process by analyzing the corpus, selecting appropriate strategies, and producing ranked results. Supports multiple algorithms, parallel processing, and custom ranking extensions.

The ranker follows a multi-stage process:

1. Corpus analysis (TF-IDF, import graph, statistics)
2. Strategy selection based on algorithm
3. Parallel factor calculation
4. Score aggregation and weighting
5. Filtering and sorting

ATTRIBUTES

config: TenetsConfig instance.
logger: Logger instance.
strategies: Available ranking strategies.
custom_rankers (List[Callable]): Custom ranking functions.
executor: Thread pool for parallel processing.
stats: Latest ranking statistics.
cache: Internal cache for optimizations.
Initialize the relevance ranker.

PARAMETERS

config (TenetsConfig): Tenets configuration.
algorithm (Optional[str], default None): Override the default algorithm.
use_stopwords (Optional[bool], default None): Override the stopword filtering setting.

Source code in tenets/core/ranking/ranker.py
Python
def __init__(
    self,
    config: TenetsConfig,
    algorithm: Optional[str] = None,
    use_stopwords: Optional[bool] = None,
):
    """Initialize the relevance ranker.

    Args:
        config: Tenets configuration
        algorithm: Override default algorithm
        use_stopwords: Override stopword filtering setting
    """
    self.config = config
    self.logger = get_logger(__name__)

    # Determine algorithm
    algo_str = algorithm or config.ranking.algorithm
    try:
        self.algorithm = RankingAlgorithm(algo_str)
    except ValueError:
        self.logger.warning(f"Unknown algorithm '{algo_str}', using balanced")
        self.algorithm = RankingAlgorithm.BALANCED

    # Stopword configuration
    self.use_stopwords = (
        use_stopwords if use_stopwords is not None else config.ranking.use_stopwords
    )

    # ML configuration
    self.use_ml = (
        config.ranking.use_ml if config and hasattr(config.ranking, "use_ml") else False
    )

    # Initialize strategies lazily to avoid loading unnecessary models
    self._strategies_cache: Dict[RankingAlgorithm, RankingStrategy] = {}
    self.strategies = self._strategies_cache  # Alias for compatibility

    # Pre-populate core strategies for tests that expect them
    # These are lightweight and don't load ML models until actually used
    self._init_core_strategies()

    # Custom rankers list (keep public and test-expected private alias)
    self.custom_rankers: List[Callable] = []
    self._custom_rankers: List[Callable] = self.custom_rankers

    # Thread pool for parallel ranking (lazy initialization to avoid Windows issues)
    from tenets.utils.multiprocessing import get_ranking_workers, log_worker_info

    max_workers = get_ranking_workers(config)
    self.max_workers = max_workers  # Store for logging
    self._executor_instance = None  # Will be created lazily
    # Backwards-compat alias expected by some tests
    self._executor = None

    # Statistics and cache
    self.stats = RankingStats()
    self.cache = {}

    # ML model (loaded lazily)
    self._ml_model = None

    # Optional ML embedding model placeholder for tests that patch it
    # Also expose module-level symbol on instance for convenience
    self.SentenceTransformer = SentenceTransformer

    # Log worker configuration
    log_worker_info(self.logger, "RelevanceRanker", max_workers)
    self.logger.info(
        f"RelevanceRanker initialized: algorithm={self.algorithm.value}, "
        f"use_stopwords={self.use_stopwords}, use_ml={self.use_ml}"
    )
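
A minimal construction sketch. The module path for RelevanceRanker matches this page; the import path for TenetsConfig is assumed for illustration:

Python
from tenets.config import TenetsConfig  # import path assumed
from tenets.core.ranking.ranker import RelevanceRanker

config = TenetsConfig()
# Overrides the configured defaults for this instance only
ranker = RelevanceRanker(config, algorithm="balanced", use_stopwords=True)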
Attributes
executor (property)
Python
executor

Lazy initialization of ThreadPoolExecutor to avoid Windows import issues.
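
The property body is not reproduced on this page. A hypothetical sketch of the lazy pattern it describes, based on the _executor_instance field initialized in __init__:

Python
from concurrent.futures import ThreadPoolExecutor

@property
def executor(self):
    # Hypothetical: create the pool on first access so importing the
    # module never spins up threads (sidesteps Windows import issues).
    if self._executor_instance is None:
        self._executor_instance = ThreadPoolExecutor(max_workers=self.max_workers)
    return self._executor_instance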

Functions
rank_files
Python
rank_files(files: List[FileAnalysis], prompt_context: PromptContext, algorithm: Optional[str] = None, parallel: bool = True, explain: bool = False) -> List[FileAnalysis]

Rank files by relevance to prompt.

This is the main entry point for ranking files. It analyzes the corpus, applies the selected ranking strategy, and returns files sorted by relevance above the configured threshold.

PARAMETERS

files (List[FileAnalysis]): List of files to rank.
prompt_context (PromptContext): Parsed prompt information.
algorithm (Optional[str], default None): Override the algorithm for this ranking.
parallel (bool, default True): Whether to rank files in parallel.
explain (bool, default False): Whether to generate ranking explanations.

RETURNS

List[FileAnalysis]: FileAnalysis objects sorted by relevance (highest first) and filtered by threshold.

RAISES

ValueError: If the algorithm is invalid.

Source code in tenets/core/ranking/ranker.py
Python
def rank_files(
    self,
    files: List[FileAnalysis],
    prompt_context: PromptContext,
    algorithm: Optional[str] = None,
    parallel: bool = True,
    explain: bool = False,
) -> List[FileAnalysis]:
    """Rank files by relevance to prompt.

    This is the main entry point for ranking files. It analyzes the corpus,
    applies the selected ranking strategy, and returns files sorted by
    relevance above the configured threshold.

    Args:
        files: List of files to rank
        prompt_context: Parsed prompt information
        algorithm: Override algorithm for this ranking
        parallel: Whether to rank files in parallel
        explain: Whether to generate ranking explanations

    Returns:
        List of FileAnalysis objects sorted by relevance (highest first)
        and filtered by threshold

    Raises:
        ValueError: If algorithm is invalid
    """
    if not files:
        return []

    start_time = time.time()

    # Reset statistics
    self.stats = RankingStats(
        total_files=len(files),
        algorithm_used=algorithm or self.algorithm.value,
        threshold_applied=self.config.ranking.threshold,
    )

    # Check if we need to disable parallel on Windows Python 3.13+
    import sys

    if sys.platform == "win32" and sys.version_info >= (3, 13) and parallel:
        self.logger.warning(
            "Disabling parallel ranking on Windows with Python 3.13+ due to compatibility issues"
        )
        parallel = False

    self.logger.info(
        f"Ranking {len(files)} files using {self.stats.algorithm_used} algorithm "
        f"(parallel={parallel}, workers={self.max_workers if parallel else 1})"
    )

    # Select strategy
    if algorithm:
        try:
            strategy = self._get_strategy(algorithm)
        except ValueError:
            raise ValueError(f"Unknown ranking algorithm: {algorithm}")
    else:
        strategy = self._get_strategy(self.algorithm.value)

    if not strategy:
        raise ValueError(f"No strategy for algorithm: {self.algorithm}")

    # Analyze corpus
    corpus_stats = self._analyze_corpus(files, prompt_context)
    self.stats.corpus_stats = corpus_stats

    # Rank files
    ranked_files = self._rank_with_strategy(
        files, prompt_context, corpus_stats, strategy, parallel
    )

    # Apply custom rankers
    for custom_ranker in self.custom_rankers:
        try:
            ranked_files = custom_ranker(ranked_files, prompt_context)
        except Exception as e:
            self.logger.warning(f"Custom ranker failed: {e}")

    # Sort by score
    ranked_files.sort(reverse=True)

    # Filter by threshold and update statistics
    threshold = self.config.ranking.threshold
    filtered_files = []
    scores = []

    for i, rf in enumerate(ranked_files):
        scores.append(rf.score)

        if rf.score >= threshold:
            # Update FileAnalysis with ranking info
            rf.analysis.relevance_score = rf.score
            rf.analysis.relevance_rank = i + 1

            # Generate explanation if requested
            if explain:
                rf.explanation = rf.generate_explanation(strategy.get_weights(), verbose=True)

            filtered_files.append(rf.analysis)

    # Update statistics
    self.stats.files_ranked = len(ranked_files)
    self.stats.files_above_threshold = len(filtered_files)
    self.stats.time_elapsed = time.time() - start_time

    if scores:
        self.stats.average_score = sum(scores) / len(scores)
        self.stats.max_score = max(scores)
        self.stats.min_score = min(scores)

    # If nothing passed threshold, fall back to returning top 1-3 files
    if not filtered_files and ranked_files:
        top_k = min(3, len(ranked_files))
        fallback = [rf.analysis for rf in ranked_files[:top_k]]
        for i, a in enumerate(fallback, 1):
            a.relevance_score = ranked_files[i - 1].score
            a.relevance_rank = i
        filtered_files = fallback

    self.logger.info(
        f"Ranking complete: {len(filtered_files)}/{len(files)} files "
        f"above threshold ({threshold:.2f}) in {self.stats.time_elapsed:.2f}s"
    )

    # Generate explanation report if requested
    if explain and ranked_files:
        explainer = RankingExplainer()
        explanation = explainer.explain_ranking(ranked_files[:20], strategy.get_weights())
        self.logger.info(f"Ranking Explanation:\n{explanation}")

    return filtered_files
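
An end-to-end sketch; it assumes `files` came from the analysis pipeline and `prompt_context` from the prompt parser (both types are documented above):

Python
# files: List[FileAnalysis], prompt_context: PromptContext (built elsewhere)
results = ranker.rank_files(files, prompt_context, parallel=True)
for fa in results[:5]:
    # relevance_score and relevance_rank are set by rank_files
    print(fa.relevance_rank, fa.relevance_score)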
register_custom_ranker
Python
register_custom_ranker(ranker_func: Callable[[List[RankedFile], PromptContext], List[RankedFile]])

Register a custom ranking function.

Custom rankers are applied after the main ranking strategy and can adjust scores based on project-specific logic.

PARAMETERS

ranker_func (Callable[[List[RankedFile], PromptContext], List[RankedFile]]): Function that takes ranked files and returns a modified list.

Example

Python
def boost_tests(ranked_files, prompt_context):
    if 'test' in prompt_context.text:
        for rf in ranked_files:
            if 'test' in rf.path:
                rf.score *= 1.5
    return ranked_files

ranker.register_custom_ranker(boost_tests)

Source code in tenets/core/ranking/ranker.py
Python
def register_custom_ranker(
    self, ranker_func: Callable[[List[RankedFile], PromptContext], List[RankedFile]]
):
    """Register a custom ranking function.

    Custom rankers are applied after the main ranking strategy and can
    adjust scores based on project-specific logic.

    Args:
        ranker_func: Function that takes ranked files and returns modified list

    Example:
        >>> def boost_tests(ranked_files, prompt_context):
        ...     if 'test' in prompt_context.text:
        ...         for rf in ranked_files:
        ...             if 'test' in rf.path:
        ...                 rf.score *= 1.5
        ...     return ranked_files
        >>> ranker.register_custom_ranker(boost_tests)
    """
    self.custom_rankers.append(ranker_func)
    # Keep alias updated
    self._custom_rankers = self.custom_rankers
    self.logger.info(f"Registered custom ranker: {ranker_func.__name__}")
get_ranking_explanation
Python
get_ranking_explanation(ranked_files: List[RankedFile], top_n: int = 10) -> str

Get detailed explanation of ranking results.

PARAMETERS

ranked_files (List[RankedFile]): List of ranked files.
top_n (int, default 10): Number of top files to explain.

RETURNS

str: Formatted explanation string.

Source code in tenets/core/ranking/ranker.py
Python
def get_ranking_explanation(self, ranked_files: List[RankedFile], top_n: int = 10) -> str:
    """Get detailed explanation of ranking results.

    Args:
        ranked_files: List of ranked files
        top_n: Number of top files to explain

    Returns:
        Formatted explanation string
    """
    explainer = RankingExplainer()
    strategy = self.strategies.get(self.algorithm)
    weights = strategy.get_weights() if strategy else {}

    return explainer.explain_ranking(ranked_files[:top_n], weights, top_n=top_n)
get_stats
Python
get_stats() -> RankingStats

Get latest ranking statistics.

RETURNS

RankingStats: The RankingStats object from the latest ranking run.

Source code in tenets/core/ranking/ranker.py
Python
def get_stats(self) -> RankingStats:
    """Get latest ranking statistics.

    Returns:
        RankingStats object
    """
    return self.stats
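
Typical usage after a ranking pass:

Python
ranker.rank_files(files, prompt_context)
stats = ranker.get_stats()
print(
    f"Ranked {stats.files_ranked}/{stats.total_files} files "
    f"in {stats.time_elapsed:.2f}s (avg score {stats.average_score:.3f})"
)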
shutdown
Python
shutdown()

Shutdown the ranker and clean up resources.

Source code in tenets/core/ranking/ranker.py
Python
def shutdown(self):
    """Shutdown the ranker and clean up resources."""
    if self._executor_instance is not None:
        self._executor_instance.shutdown(wait=True)
    self.logger.info("RelevanceRanker shutdown complete")
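
Because the ranker may own a thread pool, pairing use with try/finally keeps cleanup deterministic; a small sketch:

Python
try:
    results = ranker.rank_files(files, prompt_context)
finally:
    ranker.shutdown()  # shutdown(wait=True) lets in-flight work finish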

Functions

create_ranker

Python
create_ranker(config: Optional[TenetsConfig] = None, algorithm: str = 'balanced', use_stopwords: bool = False) -> RelevanceRanker

Create a configured relevance ranker.

PARAMETERS

config (Optional[TenetsConfig], default None): Configuration (uses the default if None).
algorithm (str, default 'balanced'): Ranking algorithm to use.
use_stopwords (bool, default False): Whether to filter stopwords.

RETURNS

RelevanceRanker: Configured RelevanceRanker instance.

Source code in tenets/core/ranking/ranker.py
Python
def create_ranker(
    config: Optional[TenetsConfig] = None, algorithm: str = "balanced", use_stopwords: bool = False
) -> RelevanceRanker:
    """Create a configured relevance ranker.

    Args:
        config: Configuration (uses default if None)
        algorithm: Ranking algorithm to use
        use_stopwords: Whether to filter stopwords

    Returns:
        Configured RelevanceRanker instance
    """
    if config is None:
        config = TenetsConfig()

    return RelevanceRanker(config, algorithm=algorithm, use_stopwords=use_stopwords)
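
A minimal usage sketch of the factory; with no arguments it builds a default TenetsConfig and selects the 'balanced' algorithm:

Python
ranker = create_ranker()  # default config, 'balanced', no stopword filtering
results = ranker.rank_files(files, prompt_context)  # inputs built elsewhere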