tenets.core.ranking Package¶
Relevance ranking system for Tenets.
This package provides sophisticated file ranking capabilities using multiple strategies, from simple keyword matching to advanced ML-based semantic analysis. The ranking system is designed to efficiently identify the files most relevant to a given prompt or query.
Main components:

- RelevanceRanker: Main orchestrator for ranking operations
- RankingFactors: Comprehensive factors used for scoring
- RankedFile: File with ranking information
- Ranking strategies: Fast, Balanced, Thorough, ML
- TF-IDF and BM25 calculators for text similarity
Example usage:

```python
from tenets.core.ranking import RelevanceRanker, create_ranker
from tenets.models.context import PromptContext

# Create ranker with config
ranker = create_ranker(algorithm="balanced")

# Parse prompt
prompt_context = PromptContext(text="implement OAuth authentication")

# Rank files
ranked_files = ranker.rank_files(files, prompt_context)

# Get top relevant files
for file in ranked_files[:10]:
    print(f"{file.path}: {file.relevance_score:.3f}")
```
Attributes¶
ML_AVAILABLE module-attribute ¶
Classes¶
BM25Calculator¶
BM25Calculator(k1: float = 1.2, b: float = 0.75, epsilon: float = 0.25, use_stopwords: bool = False, stopword_set: str = 'code')
BM25 ranking algorithm with advanced features for code search.
This implementation provides:
- Configurable term saturation (k1) and length normalization (b)
- Efficient tokenization with optional stopword filtering
- IDF caching for performance
- Support for incremental corpus updates
- Query expansion capabilities
- Detailed scoring explanations for debugging
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `k1` | Controls term frequency saturation. Higher values mean less saturation (more weight to term frequency). Typical range: 0.5-2.0; default: 1.2. |
| `b` | Controls document length normalization: 0 = no normalization, 1 = full normalization. Typical range: 0.5-0.8; default: 0.75. |
| `epsilon` | Small constant to prevent division by zero. |
Initialize BM25 calculator with configurable parameters.
| PARAMETER | DESCRIPTION |
|---|---|
| `k1` | Term frequency saturation parameter. Lower values (0.5-1.0) work well for short queries, higher values (1.5-2.0) for longer queries. Default: 1.2 (a good general-purpose value). |
| `b` | Length normalization parameter. Set to 0 to disable length normalization, 1 for full normalization. Default: 0.75 (moderate normalization, good for mixed-length documents). |
| `epsilon` | Small constant for numerical stability. |
| `use_stopwords` | Whether to filter common words. |
| `stopword_set` | Which stopword set to use (`'code'` for programming, `'english'` for natural language). |
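To make the effect of `k1` concrete, the snippet below evaluates the BM25 term-frequency factor `tf * (k1 + 1) / (tf + k1)` with length normalization disabled. This is illustrative arithmetic only, not the library's code:

```python
# Illustrative sketch of BM25 term-frequency saturation, ignoring
# length normalization (equivalent to b = 0). Not the library's code.
def tf_component(tf: float, k1: float) -> float:
    """BM25 term-frequency factor: tf * (k1 + 1) / (tf + k1)."""
    return tf * (k1 + 1) / (tf + k1)

# Lower k1 saturates quickly; higher k1 keeps rewarding repeated terms.
for k1 in (0.5, 1.2, 2.0):
    print(k1, [round(tf_component(tf, k1), 2) for tf in (1, 2, 5, 10)])
```

The factor is bounded by `k1 + 1`, so with a low `k1` repeated occurrences of a term stop adding weight almost immediately, which is why lower values suit short queries.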
Attributes¶
logger instance-attribute ¶
k1 instance-attribute ¶
b instance-attribute ¶
epsilon instance-attribute ¶
use_stopwords instance-attribute ¶
stopword_set instance-attribute ¶
tokenizer instance-attribute ¶
document_count instance-attribute ¶
document_frequency instance-attribute ¶
document_lengths instance-attribute ¶
document_tokens instance-attribute ¶
average_doc_length instance-attribute ¶
vocabulary instance-attribute ¶
idf_cache instance-attribute ¶
stats instance-attribute ¶
Functions¶
tokenize¶
add_document¶
Add a document to the BM25 corpus.
Updates all corpus statistics including document frequency, average document length, and vocabulary.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc_id` | Unique identifier for the document. |
| `text` | Document content. |
Note
Adding documents invalidates the IDF and score caches. For bulk loading, use build_corpus() instead.
build_corpus¶
Build BM25 corpus from multiple documents efficiently.
More efficient than repeated add_document() calls as it calculates statistics once at the end.
PARAMETER | DESCRIPTION |
---|---|
documents | List of (doc_id, text) tuples |
Example:

```python
documents = [
    ("file1.py", "import os\nclass FileHandler"),
    ("file2.py", "from pathlib import Path"),
]
bm25.build_corpus(documents)
```
compute_idf¶
Compute IDF (Inverse Document Frequency) for a term.
Uses the standard BM25 IDF formula with smoothing to handle edge cases and prevent negative values.
Formula
IDF(term) = log[(N - df + 0.5) / (df + 0.5) + 1]
| PARAMETER | DESCRIPTION |
|---|---|
| `term` | Term to compute IDF for. |

| RETURNS | DESCRIPTION |
|---|---|
| `float` | IDF value (always positive due to the +1 in the formula). |
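The documented formula can be checked with a few lines of standalone arithmetic (a worked example, not the library's cached implementation):

```python
import math

# IDF(term) = log((N - df + 0.5) / (df + 0.5) + 1), as documented above.
def idf(n_docs: int, doc_freq: int) -> float:
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)

print(round(idf(1000, 1), 2))    # rare term: high IDF
print(round(idf(1000, 999), 4))  # near-ubiquitous term: small but positive
```

The smoothing terms (+0.5 in numerator and denominator, +1 inside the log) keep the value finite and positive even when a term appears in every document.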
score_document¶
Calculate BM25 score for a document given query tokens.
Implements the full BM25 scoring formula with term saturation and length normalization.
| PARAMETER | DESCRIPTION |
|---|---|
| `query_tokens` | Tokenized query terms. |
| `doc_id` | Document identifier to score. |
| `explain` | If True, return a detailed scoring breakdown. |

| RETURNS | DESCRIPTION |
|---|---|
| `float` | BM25 score (higher is more relevant). If `explain=True`, returns a tuple of `(score, explanation_dict)` instead. |
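The per-term computation can be sketched in a few self-contained lines. This sketch assumes naive whitespace tokenization and recomputes corpus statistics on the fly, unlike the cached implementation:

```python
import math
from collections import Counter

# Minimal BM25 sketch: sum over query terms of
# IDF(term) * saturated, length-normalized term frequency.
def bm25_score(query: str, doc: str, corpus: list, k1: float = 1.2, b: float = 0.75) -> float:
    n = len(corpus)
    avgdl = sum(len(d.split()) for d in corpus) / n
    tf = Counter(doc.split())
    dl = len(doc.split())
    score = 0.0
    for term in query.split():
        df = sum(1 for d in corpus if term in d.split())
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        denom = tf[term] + k1 * (1 - b + b * dl / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

corpus = ["auth login token", "parse config file", "auth session token"]
print(bm25_score("auth token", corpus[0], corpus))  # matching doc scores > 0
```

Documents containing none of the query terms score exactly zero, since each term's contribution is proportional to its term frequency.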
get_scores¶
get_top_k¶
compute_similarity¶
explain_score¶
TFIDFCalculator¶
TF-IDF calculator for ranking.
Simplified wrapper around NLP TFIDFCalculator to maintain existing ranking API while using centralized logic.
Initialize TF-IDF calculator.
| PARAMETER | DESCRIPTION |
|---|---|
| `use_stopwords` | Whether to filter stopwords (uses the `'code'` set). |
FactorWeight¶
Bases: Enum
Standard weight presets for ranking factors.
These presets provide balanced weights for different use cases; they can be overridden with custom weights in configuration.
Attributes¶
KEYWORD_MATCH class-attribute instance-attribute ¶
TFIDF_SIMILARITY class-attribute instance-attribute ¶
BM25_SCORE class-attribute instance-attribute ¶
PATH_RELEVANCE class-attribute instance-attribute ¶
IMPORT_CENTRALITY class-attribute instance-attribute ¶
GIT_RECENCY class-attribute instance-attribute ¶
GIT_FREQUENCY class-attribute instance-attribute ¶
COMPLEXITY_RELEVANCE class-attribute instance-attribute ¶
SEMANTIC_SIMILARITY class-attribute instance-attribute ¶
TYPE_RELEVANCE class-attribute instance-attribute ¶
CODE_PATTERNS class-attribute instance-attribute ¶
AST_RELEVANCE class-attribute instance-attribute ¶
DEPENDENCY_DEPTH class-attribute instance-attribute ¶
RankedFile dataclass ¶
RankedFile(analysis: FileAnalysis, score: float, factors: RankingFactors, explanation: str = '', confidence: float = 1.0, rank: Optional[int] = None, metadata: Dict[str, Any] = dict())
A file with its relevance ranking.
Combines a FileAnalysis with ranking scores and metadata. Provides utilities for comparison, explanation generation, and result formatting.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `analysis` | The FileAnalysis object. |
| `score` | Overall relevance score (0-1). |
| `factors` | Detailed ranking factors. |
| `explanation` | Human-readable ranking explanation. |
| `confidence` | Confidence in the ranking (0-1). |
| `rank` | Position in the ranked list (1-based). |
| `metadata` | Additional ranking metadata. |
RankingExplainer¶
Utility class for generating ranking explanations.
Provides detailed explanations of why files ranked the way they did, useful for debugging and understanding ranking behavior.
Initialize the explainer.
Attributes¶
loggerinstance-attribute
¶
Functions¶
explain_ranking¶
explain_ranking(ranked_files: List[RankedFile], weights: Dict[str, float], top_n: int = 10, include_factors: bool = True) -> str
Generate comprehensive ranking explanation.
| PARAMETER | DESCRIPTION |
|---|---|
| `ranked_files` | List of ranked files. |
| `weights` | Factor weights used. |
| `top_n` | Number of top files to explain. |
| `include_factors` | Include factor breakdown. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Formatted explanation string. |
compare_rankings¶
compare_rankings(rankings1: List[RankedFile], rankings2: List[RankedFile], labels: Tuple[str, str] = ('Ranking 1', 'Ranking 2')) -> str
Compare two different rankings.
Useful for understanding how different algorithms or weights affect ranking results.
| PARAMETER | DESCRIPTION |
|---|---|
| `rankings1` | First ranking. |
| `rankings2` | Second ranking. |
| `labels` | Labels for the two rankings. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Comparison report. |
RankingFactors dataclass ¶
RankingFactors(keyword_match: float = 0.0, tfidf_similarity: float = 0.0, bm25_score: float = 0.0, path_relevance: float = 0.0, import_centrality: float = 0.0, dependency_depth: float = 0.0, git_recency: float = 0.0, git_frequency: float = 0.0, git_author_relevance: float = 0.0, complexity_relevance: float = 0.0, maintainability_score: float = 0.0, semantic_similarity: float = 0.0, type_relevance: float = 0.0, code_patterns: float = 0.0, ast_relevance: float = 0.0, test_coverage: float = 0.0, documentation_score: float = 0.0, custom_scores: Dict[str, float] = dict(), metadata: Dict[str, Any] = dict())
Comprehensive ranking factors for a file.
Each factor represents a different dimension of relevance. The final relevance score is computed as a weighted sum of these factors.
Factors are grouped into categories:

- Text-based: keyword_match, tfidf_similarity, bm25_score
- Structure-based: path_relevance, import_centrality, dependency_depth
- Git-based: git_recency, git_frequency, git_author_relevance
- Complexity-based: complexity_relevance, maintainability_score
- Semantic: semantic_similarity (requires ML)
- Pattern-based: code_patterns, ast_relevance
- Custom: custom_scores for project-specific factors
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `keyword_match` | Direct keyword matching score (0-1). |
| `tfidf_similarity` | TF-IDF cosine similarity score (0-1). |
| `bm25_score` | BM25 relevance score (0-1). |
| `path_relevance` | File path relevance to the query (0-1). |
| `import_centrality` | How central the file is in the import graph (0-1). |
| `git_recency` | How recently the file was modified (0-1). |
| `git_frequency` | How frequently the file changes (0-1). |
| `git_author_relevance` | Relevance based on commit authors (0-1). |
| `complexity_relevance` | Relevance based on code complexity (0-1). |
| `maintainability_score` | Code maintainability score (0-1). |
| `semantic_similarity` | ML-based semantic similarity (0-1). |
| `type_relevance` | Relevance based on file type (0-1). |
| `code_patterns` | Pattern matching score (0-1). |
| `ast_relevance` | AST structure relevance (0-1). |
| `dependency_depth` | Dependency tree depth score (0-1). |
| `test_coverage` | Test coverage relevance (0-1). |
| `documentation_score` | Documentation quality score (0-1). |
| `custom_scores` | Dictionary of custom factor scores. |
| `metadata` | Additional metadata about factor calculation. |
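To make the weighted-sum idea concrete, here is a minimal sketch. The factor names come from the table above, but the weights shown are hypothetical and the real `get_weighted_score()` may normalize differently:

```python
# Hypothetical weighted sum over ranking factors (illustrative only).
def weighted_score(factors: dict, weights: dict) -> float:
    total_weight = sum(weights.values()) or 1.0
    raw = sum(factors.get(name, 0.0) * w for name, w in weights.items())
    return raw / total_weight  # keeps the result in the 0-1 range

factors = {"keyword_match": 0.9, "bm25_score": 0.7, "git_recency": 0.2}
weights = {"keyword_match": 0.3, "bm25_score": 0.5, "git_recency": 0.2}
print(weighted_score(factors, weights))
```

Dividing by the total weight means factors that are missing (or scored 0.0) simply pull the final score toward zero rather than breaking the 0-1 scale.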
Attributes¶
keyword_match class-attribute instance-attribute ¶
tfidf_similarity class-attribute instance-attribute ¶
bm25_score class-attribute instance-attribute ¶
path_relevance class-attribute instance-attribute ¶
import_centrality class-attribute instance-attribute ¶
dependency_depth class-attribute instance-attribute ¶
git_recency class-attribute instance-attribute ¶
git_frequency class-attribute instance-attribute ¶
git_author_relevance class-attribute instance-attribute ¶
complexity_relevance class-attribute instance-attribute ¶
maintainability_score class-attribute instance-attribute ¶
semantic_similarity class-attribute instance-attribute ¶
type_relevance class-attribute instance-attribute ¶
code_patterns class-attribute instance-attribute ¶
ast_relevance class-attribute instance-attribute ¶
documentation_score class-attribute instance-attribute ¶
custom_scores class-attribute instance-attribute ¶
metadata class-attribute instance-attribute ¶
Functions¶
get_weighted_score¶
get_top_factors¶
RankingAlgorithm¶
Bases: Enum
Available ranking algorithms.
Each algorithm provides different trade-offs between speed and accuracy.
RankingStats dataclass ¶
RankingStats(total_files: int = 0, files_ranked: int = 0, files_failed: int = 0, time_elapsed: float = 0.0, algorithm_used: str = '', threshold_applied: float = 0.0, files_above_threshold: int = 0, average_score: float = 0.0, max_score: float = 0.0, min_score: float = 0.0, corpus_stats: Dict[str, Any] = None)
Statistics from ranking operation.
Tracks performance metrics and diagnostic information about the ranking process for monitoring and optimization.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `total_files` | Total number of files processed. |
| `files_ranked` | Number of files successfully ranked. |
| `files_failed` | Number of files that failed ranking. |
| `time_elapsed` | Total time in seconds. |
| `algorithm_used` | Which algorithm was used. |
| `threshold_applied` | Relevance threshold used. |
| `files_above_threshold` | Number of files above the threshold. |
| `average_score` | Average relevance score. |
| `max_score` | Maximum relevance score. |
| `min_score` | Minimum relevance score. |
| `corpus_stats` | Dictionary of corpus statistics. |
Attributes¶
total_files class-attribute instance-attribute ¶
files_ranked class-attribute instance-attribute ¶
files_failed class-attribute instance-attribute ¶
time_elapsed class-attribute instance-attribute ¶
algorithm_used class-attribute instance-attribute ¶
threshold_applied class-attribute instance-attribute ¶
files_above_threshold class-attribute instance-attribute ¶
average_score class-attribute instance-attribute ¶
max_score class-attribute instance-attribute ¶
min_score class-attribute instance-attribute ¶
corpus_stats class-attribute instance-attribute ¶
Functions¶
RelevanceRanker¶
RelevanceRanker(config: TenetsConfig, algorithm: Optional[str] = None, use_stopwords: Optional[bool] = None)
Main relevance ranking system.
Orchestrates the ranking process by analyzing the corpus, selecting appropriate strategies, and producing ranked results. Supports multiple algorithms, parallel processing, and custom ranking extensions.
The ranker follows a multi-stage process:

1. Corpus analysis (TF-IDF, import graph, statistics)
2. Strategy selection based on algorithm
3. Parallel factor calculation
4. Score aggregation and weighting
5. Filtering and sorting
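The stages can be outlined end to end as a toy pipeline. Every helper below is a simplified stand-in for illustration, not the library's internals:

```python
# Toy end-to-end outline of the multi-stage ranking flow (illustrative).
def analyze_corpus(files):
    return {"n_files": len(files)}          # 1. corpus statistics (stub)

def keyword_factor(text, prompt):
    """Fraction of prompt words appearing in the file text."""
    words = set(prompt.lower().split())
    hits = sum(1 for w in words if w in text.lower())
    return hits / max(len(words), 1)

def rank(files, prompt, threshold=0.1):
    stats = analyze_corpus(files)           # 1. corpus analysis
    score_fn = keyword_factor               # 2. strategy selection
    scored = [(path, score_fn(text, prompt))  # 3-4. score and weight
              for path, text in files]
    kept = [p for p in scored if p[1] >= threshold]  # 5. filter
    return sorted(kept, key=lambda p: p[1], reverse=True)

files = [("auth.py", "def login(user): ..."), ("util.py", "def pad(s): ...")]
print(rank(files, "login auth"))
```

The real ranker parallelizes stage 3 across a thread pool and combines many factors in stage 4; this sketch collapses both into a single keyword factor.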
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `config` | TenetsConfig instance. |
| `logger` | Logger instance. |
| `strategies` | Available ranking strategies. |
| `custom_rankers` | Custom ranking functions. |
| `executor` | Thread pool for parallel processing. |
| `stats` | Latest ranking statistics. |
| `cache` | Internal cache for optimizations. |
Initialize the relevance ranker.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | Tenets configuration. |
| `algorithm` | Override the default algorithm. |
| `use_stopwords` | Override the stopword filtering setting. |
Attributes¶
config instance-attribute ¶
logger instance-attribute ¶
algorithm instance-attribute ¶
use_stopwords instance-attribute ¶
use_ml instance-attribute ¶
strategies instance-attribute ¶
custom_rankers instance-attribute ¶
max_workers instance-attribute ¶
stats instance-attribute ¶
cache instance-attribute ¶
SentenceTransformer instance-attribute ¶
executor property ¶
Lazy initialization of ThreadPoolExecutor to avoid Windows import issues.
Functions¶
rank_files¶
rank_files(files: List[FileAnalysis], prompt_context: PromptContext, algorithm: Optional[str] = None, parallel: bool = True, explain: bool = False) -> List[FileAnalysis]
Rank files by relevance to prompt.
This is the main entry point for ranking files. It analyzes the corpus, applies the selected ranking strategy, and returns files sorted by relevance above the configured threshold.
| PARAMETER | DESCRIPTION |
|---|---|
| `files` | List of files to rank. |
| `prompt_context` | Parsed prompt information. |
| `algorithm` | Override the algorithm for this ranking. |
| `parallel` | Whether to rank files in parallel. |
| `explain` | Whether to generate ranking explanations. |

| RETURNS | DESCRIPTION |
|---|---|
| `List[FileAnalysis]` | FileAnalysis objects sorted by relevance (highest first) and filtered by the threshold. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the algorithm is invalid. |
register_custom_ranker¶
register_custom_ranker(ranker_func: Callable[[List[RankedFile], PromptContext], List[RankedFile]])
Register a custom ranking function.
Custom rankers are applied after the main ranking strategy and can adjust scores based on project-specific logic.
| PARAMETER | DESCRIPTION |
|---|---|
| `ranker_func` | Function that takes ranked files and returns the modified list. |
Example:

```python
def boost_tests(ranked_files, prompt_context):
    if 'test' in prompt_context.text:
        for rf in ranked_files:
            if 'test' in rf.path:
                rf.score *= 1.5
    return ranked_files

ranker.register_custom_ranker(boost_tests)
```
get_ranking_explanation¶
Get detailed explanation of ranking results.
| PARAMETER | DESCRIPTION |
|---|---|
| `ranked_files` | List of ranked files. |
| `top_n` | Number of top files to explain. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Formatted explanation string. |
get_stats¶
Get latest ranking statistics.
| RETURNS | DESCRIPTION |
|---|---|
| `RankingStats` | The latest RankingStats object. |
BalancedRankingStrategy¶
Bases: RankingStrategy
Balanced multi-factor ranking strategy.
Initialize balanced ranking strategy.
FastRankingStrategy¶
Bases: RankingStrategy
Fast keyword-based ranking strategy.
Initialize fast ranking strategy.
MLRankingStrategy¶
Bases: RankingStrategy
Machine Learning-based ranking strategy.
Initialize ML ranking strategy.
Attributes¶
nameclass-attribute
instance-attribute
¶
descriptionclass-attribute
instance-attribute
¶
loggerinstance-attribute
¶
Functions¶
rank_file¶
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors
ML-based ranking with semantic similarity.
RankingStrategy¶
Bases: ABC
Abstract base class for ranking strategies.
ThoroughRankingStrategy¶
Bases: RankingStrategy
Thorough deep analysis ranking strategy using centralized NLP.
Initialize thorough ranking strategy with NLP components.
Attributes¶
nameclass-attribute
instance-attribute
¶
descriptionclass-attribute
instance-attribute
¶
loggerinstance-attribute
¶
programming_patternsinstance-attribute
¶
Functions¶
rank_file¶
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors
Thorough ranking with deep analysis using centralized NLP.
Functions¶
create_ranker¶
create_ranker(config: Optional[TenetsConfig] = None, algorithm: str = 'balanced', use_stopwords: bool = False) -> RelevanceRanker
Create a configured relevance ranker.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | Configuration (uses default if None). |
| `algorithm` | Ranking algorithm to use. |
| `use_stopwords` | Whether to filter stopwords. |

| RETURNS | DESCRIPTION |
|---|---|
| `RelevanceRanker` | Configured RelevanceRanker instance. |
get_default_ranker¶
Get a default configured ranker.
Convenience function to quickly get a working ranker with sensible defaults.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | Optional configuration override. |

| RETURNS | DESCRIPTION |
|---|---|
| `RelevanceRanker` | Configured RelevanceRanker instance. |
rank_files_simple¶
rank_files_simple(files: List, prompt: str, algorithm: str = 'balanced', threshold: float = 0.1) -> List
Simple interface for ranking files.
Provides a simplified API for quick ranking without needing to manage ranker instances or configurations.
| PARAMETER | DESCRIPTION |
|---|---|
| `files` | List of FileAnalysis objects. |
| `prompt` | Search prompt or query. |
| `algorithm` | Ranking algorithm to use. |
| `threshold` | Minimum relevance score. |

| RETURNS | DESCRIPTION |
|---|---|
| `List` | Files sorted by relevance, filtered by the threshold. |
Example:

```python
from tenets.core.ranking import rank_files_simple

relevant_files = rank_files_simple(
    files,
    "authentication logic",
    algorithm="thorough",
)
```
explain_ranking¶
explain_ranking(files: List, prompt: str, algorithm: str = 'balanced', top_n: int = 10) -> str
Get explanation of why files ranked the way they did.
Useful for debugging and understanding ranking behavior.
| PARAMETER | DESCRIPTION |
|---|---|
| `files` | List of FileAnalysis objects. |
| `prompt` | Search prompt. |
| `algorithm` | Algorithm used. |
| `top_n` | Number of top files to explain. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Formatted explanation string. |
Example:

```python
from tenets.core.ranking import explain_ranking

explanation = explain_ranking(files, "database models")
print(explanation)
```
get_default_tfidf¶
Get default TF-IDF calculator instance.
| PARAMETER | DESCRIPTION |
|---|---|
| `use_stopwords` | Whether to filter stopwords. |

| RETURNS | DESCRIPTION |
|---|---|
| `TFIDFCalculator` | TFIDFCalculator instance. |
get_default_bm25¶
Get default BM25 calculator instance.
| PARAMETER | DESCRIPTION |
|---|---|
| `use_stopwords` | Whether to filter stopwords. |

| RETURNS | DESCRIPTION |
|---|---|
| `BM25Calculator` | BM25Calculator instance. |
Modules¶
- factors: Factors module
- ranker: Ranker module
- strategies: Strategies module