tenets.core.ranking Package¶
Relevance ranking system for Tenets.
This package provides sophisticated file ranking capabilities using multiple strategies, from simple keyword matching to advanced ML-based semantic analysis. The ranking system is designed to efficiently identify the files most relevant to a given prompt or query.
Main components:

- RelevanceRanker: Main orchestrator for ranking operations
- RankingFactors: Comprehensive factors used for scoring
- RankedFile: File with ranking information
- Ranking strategies: Fast, Balanced, Thorough, ML
- TF-IDF and BM25 calculators for text similarity
Example usage:

```python
from tenets.core.ranking import RelevanceRanker, create_ranker
from tenets.models.context import PromptContext

# Create ranker with config
ranker = create_ranker(algorithm="balanced")

# Parse prompt
prompt_context = PromptContext(text="implement OAuth authentication")

# Rank files
ranked_files = ranker.rank_files(files, prompt_context)

# Get top relevant files
for file in ranked_files[:10]:
    print(f"{file.path}: {file.relevance_score:.3f}")
```
Attributes¶
ML_AVAILABLE module-attribute ¶
Classes¶
BM25Calculator¶
BM25Calculator(k1: float = 1.2, b: float = 0.75, epsilon: float = 0.25, use_stopwords: bool = False, stopword_set: str = 'code')
BM25 ranking algorithm with advanced features for code search.
This implementation provides:
- Configurable term saturation (k1) and length normalization (b)
- Efficient tokenization with optional stopword filtering
- IDF caching for performance
- Support for incremental corpus updates
- Query expansion capabilities
- Detailed scoring explanations for debugging
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `k1` | Controls term frequency saturation. Higher values mean less saturation (more weight to term frequency). Typical range: 0.5-2.0; default: 1.2. |
| `b` | Controls document length normalization: 0 = no normalization, 1 = full normalization. Typical range: 0.5-0.8; default: 0.75. |
| `epsilon` | Small constant to prevent division by zero. |
Initialize BM25 calculator with configurable parameters.
| PARAMETER | DESCRIPTION |
|---|---|
| `k1` | Term frequency saturation parameter. Lower values (0.5-1.0) work well for short queries, higher values (1.5-2.0) for longer queries. Default: 1.2 (a good general-purpose value). |
| `b` | Length normalization parameter. Set to 0 to disable length normalization, 1 for full normalization. Default: 0.75 (moderate normalization, good for mixed-length documents). |
| `epsilon` | Small constant for numerical stability. |
| `use_stopwords` | Whether to filter common words. |
| `stopword_set` | Which stopword set to use (`'code'` for programming, `'english'` for natural language). |
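To make the effect of `k1` concrete, the snippet below evaluates the BM25 term-frequency factor `tf * (k1 + 1) / (tf + k1)` with length normalization disabled. This is illustrative arithmetic only, not the library's code:

```python
# Illustrative sketch of BM25 term-frequency saturation, ignoring
# length normalization (equivalent to b = 0). Not the library's code.
def tf_component(tf: float, k1: float) -> float:
    """BM25 term-frequency factor: tf * (k1 + 1) / (tf + k1)."""
    return tf * (k1 + 1) / (tf + k1)

# Lower k1 saturates quickly; higher k1 keeps rewarding repeated terms.
for k1 in (0.5, 1.2, 2.0):
    print(k1, [round(tf_component(tf, k1), 2) for tf in (1, 2, 5, 10)])
```

The factor is bounded by `k1 + 1`, so with a low `k1` repeated occurrences of a term stop adding weight almost immediately, which is why lower values suit short queries.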
Attributes¶
logger instance-attribute ¶
k1 instance-attribute ¶
b instance-attribute ¶
epsilon instance-attribute ¶
use_stopwords instance-attribute ¶
stopword_set instance-attribute ¶
tokenizer instance-attribute ¶
document_count instance-attribute ¶
document_frequency instance-attribute ¶
document_lengths instance-attribute ¶
document_tokens instance-attribute ¶
average_doc_length instance-attribute ¶
vocabulary instance-attribute ¶
idf_cache instance-attribute ¶
stats instance-attribute ¶
Functions¶
tokenize¶
add_document¶
Add a document to the BM25 corpus.
Updates all corpus statistics including document frequency, average document length, and vocabulary.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc_id` | Unique identifier for the document. |
| `text` | Document content. |
Note
Adding documents invalidates the IDF and score caches. For bulk loading, use build_corpus() instead.
build_corpus¶
Build BM25 corpus from multiple documents efficiently.
More efficient than repeated add_document() calls as it calculates statistics once at the end.
PARAMETER | DESCRIPTION |
---|---|
documents | List of (doc_id, text) tuples |
Example:

```python
documents = [
    ("file1.py", "import os\nclass FileHandler"),
    ("file2.py", "from pathlib import Path"),
]
bm25.build_corpus(documents)
```
compute_idf¶
Compute IDF (Inverse Document Frequency) for a term.
Uses the standard BM25 IDF formula with smoothing to handle edge cases and prevent negative values.
Formula
IDF(term) = log[(N - df + 0.5) / (df + 0.5) + 1]
| PARAMETER | DESCRIPTION |
|---|---|
| `term` | Term to compute IDF for. |

| RETURNS | DESCRIPTION |
|---|---|
| `float` | IDF value (always positive due to the +1 in the formula). |
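The documented formula can be checked with a few lines of standalone arithmetic (a worked example, not the library's cached implementation):

```python
import math

# IDF(term) = log((N - df + 0.5) / (df + 0.5) + 1), as documented above.
def idf(n_docs: int, doc_freq: int) -> float:
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)

print(round(idf(1000, 1), 2))    # rare term: high IDF
print(round(idf(1000, 999), 4))  # near-ubiquitous term: small but positive
```

The smoothing terms (+0.5 in numerator and denominator, +1 inside the log) keep the value finite and positive even when a term appears in every document.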
score_document¶
Calculate BM25 score for a document given query tokens.
Implements the full BM25 scoring formula with term saturation and length normalization.
| PARAMETER | DESCRIPTION |
|---|---|
| `query_tokens` | Tokenized query terms. |
| `doc_id` | Document identifier to score. |
| `explain` | If True, return a detailed scoring breakdown. |

| RETURNS | DESCRIPTION |
|---|---|
| `float` | BM25 score (higher is more relevant). If `explain=True`, returns a tuple of `(score, explanation_dict)` instead. |
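The per-term computation can be sketched in a few self-contained lines. This sketch assumes naive whitespace tokenization and recomputes corpus statistics on the fly, unlike the cached implementation:

```python
import math
from collections import Counter

# Minimal BM25 sketch: sum over query terms of
# IDF(term) * saturated, length-normalized term frequency.
def bm25_score(query: str, doc: str, corpus: list, k1: float = 1.2, b: float = 0.75) -> float:
    n = len(corpus)
    avgdl = sum(len(d.split()) for d in corpus) / n
    tf = Counter(doc.split())
    dl = len(doc.split())
    score = 0.0
    for term in query.split():
        df = sum(1 for d in corpus if term in d.split())
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        denom = tf[term] + k1 * (1 - b + b * dl / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

corpus = ["auth login token", "parse config file", "auth session token"]
print(bm25_score("auth token", corpus[0], corpus))  # matching doc scores > 0
```

Documents containing none of the query terms score exactly zero, since each term's contribution is proportional to its term frequency.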
get_scores¶
get_top_k¶
compute_similarity¶
explain_score¶
TFIDFCalculator¶
TF-IDF calculator for ranking.
Simplified wrapper around NLP TFIDFCalculator to maintain existing ranking API while using centralized logic.
Initialize TF-IDF calculator.
| PARAMETER | DESCRIPTION |
|---|---|
| `use_stopwords` | Whether to filter stopwords (uses the `'code'` set). |
FactorWeight¶
Bases: Enum
Standard weight presets for ranking factors.
These presets provide balanced weights for different use cases; they can be overridden with custom weights in configuration.
Attributes¶
KEYWORD_MATCH class-attribute instance-attribute ¶
TFIDF_SIMILARITY class-attribute instance-attribute ¶
BM25_SCORE class-attribute instance-attribute ¶
PATH_RELEVANCE class-attribute instance-attribute ¶
IMPORT_CENTRALITY class-attribute instance-attribute ¶
GIT_RECENCY class-attribute instance-attribute ¶
GIT_FREQUENCY class-attribute instance-attribute ¶
COMPLEXITY_RELEVANCE class-attribute instance-attribute ¶
SEMANTIC_SIMILARITY class-attribute instance-attribute ¶
TYPE_RELEVANCE class-attribute instance-attribute ¶
CODE_PATTERNS class-attribute instance-attribute ¶
AST_RELEVANCE class-attribute instance-attribute ¶
DEPENDENCY_DEPTH class-attribute instance-attribute ¶
RankedFile dataclass ¶
RankedFile(analysis: FileAnalysis, score: float, factors: RankingFactors, explanation: str = '', confidence: float = 1.0, rank: Optional[int] = None, metadata: Dict[str, Any] = dict())
A file with its relevance ranking.
Combines a FileAnalysis with ranking scores and metadata. Provides utilities for comparison, explanation generation, and result formatting.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `analysis` | The FileAnalysis object. |
| `score` | Overall relevance score (0-1). |
| `factors` | Detailed ranking factors. |
| `explanation` | Human-readable ranking explanation. |
| `confidence` | Confidence in the ranking (0-1). |
| `rank` | Position in the ranked list (1-based). |
| `metadata` | Additional ranking metadata. |
RankingExplainer¶
Utility class for generating ranking explanations.
Provides detailed explanations of why files ranked the way they did, useful for debugging and understanding ranking behavior.
Initialize the explainer.
Attributes¶
loggerinstance-attribute
¶
Functions¶
explain_ranking¶
explain_ranking(ranked_files: List[RankedFile], weights: Dict[str, float], top_n: int = 10, include_factors: bool = True) -> str
Generate comprehensive ranking explanation.
| PARAMETER | DESCRIPTION |
|---|---|
| `ranked_files` | List of ranked files. |
| `weights` | Factor weights used. |
| `top_n` | Number of top files to explain. |
| `include_factors` | Include factor breakdown. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Formatted explanation string. |
compare_rankings¶
compare_rankings(rankings1: List[RankedFile], rankings2: List[RankedFile], labels: Tuple[str, str] = ('Ranking 1', 'Ranking 2')) -> str
Compare two different rankings.
Useful for understanding how different algorithms or weights affect ranking results.
| PARAMETER | DESCRIPTION |
|---|---|
| `rankings1` | First ranking. |
| `rankings2` | Second ranking. |
| `labels` | Labels for the two rankings. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Comparison report. |
RankingFactors dataclass ¶
RankingFactors(keyword_match: float = 0.0, tfidf_similarity: float = 0.0, bm25_score: float = 0.0, path_relevance: float = 0.0, import_centrality: float = 0.0, dependency_depth: float = 0.0, git_recency: float = 0.0, git_frequency: float = 0.0, git_author_relevance: float = 0.0, complexity_relevance: float = 0.0, maintainability_score: float = 0.0, semantic_similarity: float = 0.0, type_relevance: float = 0.0, code_patterns: float = 0.0, ast_relevance: float = 0.0, test_coverage: float = 0.0, documentation_score: float = 0.0, custom_scores: Dict[str, float] = dict(), metadata: Dict[str, Any] = dict())
Comprehensive ranking factors for a file.
Each factor represents a different dimension of relevance. The final relevance score is computed as a weighted sum of these factors.
Factors are grouped into categories:

- Text-based: keyword_match, tfidf_similarity, bm25_score
- Structure-based: path_relevance, import_centrality, dependency_depth
- Git-based: git_recency, git_frequency, git_author_relevance
- Complexity-based: complexity_relevance, maintainability_score
- Semantic: semantic_similarity (requires ML)
- Pattern-based: code_patterns, ast_relevance
- Custom: custom_scores for project-specific factors
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `keyword_match` | Direct keyword matching score (0-1). |
| `tfidf_similarity` | TF-IDF cosine similarity score (0-1). |
| `bm25_score` | BM25 relevance score (0-1). |
| `path_relevance` | File path relevance to the query (0-1). |
| `import_centrality` | How central the file is in the import graph (0-1). |
| `git_recency` | How recently the file was modified (0-1). |
| `git_frequency` | How frequently the file changes (0-1). |
| `git_author_relevance` | Relevance based on commit authors (0-1). |
| `complexity_relevance` | Relevance based on code complexity (0-1). |
| `maintainability_score` | Code maintainability score (0-1). |
| `semantic_similarity` | ML-based semantic similarity (0-1). |
| `type_relevance` | Relevance based on file type (0-1). |
| `code_patterns` | Pattern matching score (0-1). |
| `ast_relevance` | AST structure relevance (0-1). |
| `dependency_depth` | Dependency tree depth score (0-1). |
| `test_coverage` | Test coverage relevance (0-1). |
| `documentation_score` | Documentation quality score (0-1). |
| `custom_scores` | Dictionary of custom factor scores. |
| `metadata` | Additional metadata about factor calculation. |
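To make the weighted-sum idea concrete, here is a minimal sketch. The factor names come from the table above, but the weights shown are hypothetical and the real `get_weighted_score()` may normalize differently:

```python
# Hypothetical weighted sum over ranking factors (illustrative only).
def weighted_score(factors: dict, weights: dict) -> float:
    total_weight = sum(weights.values()) or 1.0
    raw = sum(factors.get(name, 0.0) * w for name, w in weights.items())
    return raw / total_weight  # keeps the result in the 0-1 range

factors = {"keyword_match": 0.9, "bm25_score": 0.7, "git_recency": 0.2}
weights = {"keyword_match": 0.3, "bm25_score": 0.5, "git_recency": 0.2}
print(weighted_score(factors, weights))
```

Dividing by the total weight means factors that are missing (or scored 0.0) simply pull the final score toward zero rather than breaking the 0-1 scale.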
Attributes¶
keyword_match class-attribute instance-attribute ¶
tfidf_similarity class-attribute instance-attribute ¶
bm25_score class-attribute instance-attribute ¶
path_relevance class-attribute instance-attribute ¶
import_centrality class-attribute instance-attribute ¶
dependency_depth class-attribute instance-attribute ¶
git_recency class-attribute instance-attribute ¶
git_frequency class-attribute instance-attribute ¶
git_author_relevance class-attribute instance-attribute ¶
complexity_relevance class-attribute instance-attribute ¶
maintainability_score class-attribute instance-attribute ¶
semantic_similarity class-attribute instance-attribute ¶
type_relevance class-attribute instance-attribute ¶
code_patterns class-attribute instance-attribute ¶
ast_relevance class-attribute instance-attribute ¶
documentation_score class-attribute instance-attribute ¶
custom_scores class-attribute instance-attribute ¶
metadata class-attribute instance-attribute ¶
Functions¶
get_weighted_score¶
get_top_factors¶
RankingAlgorithm¶
Bases: Enum
Available ranking algorithms.
Each algorithm provides different trade-offs between speed and accuracy.
RankingStats dataclass ¶
RankingStats(total_files: int = 0, files_ranked: int = 0, files_failed: int = 0, time_elapsed: float = 0.0, algorithm_used: str = '', threshold_applied: float = 0.0, files_above_threshold: int = 0, average_score: float = 0.0, max_score: float = 0.0, min_score: float = 0.0, corpus_stats: Dict[str, Any] = None)
Statistics from ranking operation.
Tracks performance metrics and diagnostic information about the ranking process for monitoring and optimization.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `total_files` | Total number of files processed. |
| `files_ranked` | Number of files successfully ranked. |
| `files_failed` | Number of files that failed ranking. |
| `time_elapsed` | Total time in seconds. |
| `algorithm_used` | Which algorithm was used. |
| `threshold_applied` | Relevance threshold used. |
| `files_above_threshold` | Number of files above the threshold. |
| `average_score` | Average relevance score. |
| `max_score` | Maximum relevance score. |
| `min_score` | Minimum relevance score. |
| `corpus_stats` | Dictionary of corpus statistics. |
Attributes¶
total_files class-attribute instance-attribute ¶
files_ranked class-attribute instance-attribute ¶
files_failed class-attribute instance-attribute ¶
time_elapsed class-attribute instance-attribute ¶
algorithm_used class-attribute instance-attribute ¶
threshold_applied class-attribute instance-attribute ¶
files_above_threshold class-attribute instance-attribute ¶
average_score class-attribute instance-attribute ¶
max_score class-attribute instance-attribute ¶
min_score class-attribute instance-attribute ¶
corpus_stats class-attribute instance-attribute ¶
Functions¶
RelevanceRanker¶
RelevanceRanker(config: TenetsConfig, algorithm: Optional[str] = None, use_stopwords: Optional[bool] = None)
Main relevance ranking system.
Orchestrates the ranking process by analyzing the corpus, selecting appropriate strategies, and producing ranked results. Supports multiple algorithms, parallel processing, and custom ranking extensions.
The ranker follows a multi-stage process:

1. Corpus analysis (TF-IDF, import graph, statistics)
2. Strategy selection based on algorithm
3. Parallel factor calculation
4. Score aggregation and weighting
5. Filtering and sorting
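The stages can be outlined end to end as a toy pipeline. Every helper below is a simplified stand-in for illustration, not the library's internals:

```python
# Toy end-to-end outline of the multi-stage ranking flow (illustrative).
def analyze_corpus(files):
    return {"n_files": len(files)}          # 1. corpus statistics (stub)

def keyword_factor(text, prompt):
    """Fraction of prompt words appearing in the file text."""
    words = set(prompt.lower().split())
    hits = sum(1 for w in words if w in text.lower())
    return hits / max(len(words), 1)

def rank(files, prompt, threshold=0.1):
    stats = analyze_corpus(files)           # 1. corpus analysis
    score_fn = keyword_factor               # 2. strategy selection
    scored = [(path, score_fn(text, prompt))  # 3-4. score and weight
              for path, text in files]
    kept = [p for p in scored if p[1] >= threshold]  # 5. filter
    return sorted(kept, key=lambda p: p[1], reverse=True)

files = [("auth.py", "def login(user): ..."), ("util.py", "def pad(s): ...")]
print(rank(files, "login auth"))
```

The real ranker parallelizes stage 3 across a thread pool and combines many factors in stage 4; this sketch collapses both into a single keyword factor.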
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `config` | TenetsConfig instance. |
| `logger` | Logger instance. |
| `strategies` | Available ranking strategies. |
| `custom_rankers` | Custom ranking functions. |
| `executor` | Thread pool for parallel processing. |
| `stats` | Latest ranking statistics. |
| `cache` | Internal cache for optimizations. |
Initialize the relevance ranker.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | Tenets configuration. |
| `algorithm` | Override the default algorithm. |
| `use_stopwords` | Override the stopword filtering setting. |
Attributes¶
config instance-attribute ¶
logger instance-attribute ¶
algorithm instance-attribute ¶
use_stopwords instance-attribute ¶
use_ml instance-attribute ¶
strategies instance-attribute ¶
custom_rankers instance-attribute ¶
max_workers instance-attribute ¶
stats instance-attribute ¶
cache instance-attribute ¶
SentenceTransformer instance-attribute ¶
executor property ¶
Lazy initialization of ThreadPoolExecutor to avoid Windows import issues.
Functions¶
rank_files¶
rank_files(files: List[FileAnalysis], prompt_context: PromptContext, algorithm: Optional[str] = None, parallel: bool = True, explain: bool = False) -> List[FileAnalysis]
Rank files by relevance to prompt.
This is the main entry point for ranking files. It analyzes the corpus, applies the selected ranking strategy, and returns files sorted by relevance above the configured threshold.
| PARAMETER | DESCRIPTION |
|---|---|
| `files` | List of files to rank. |
| `prompt_context` | Parsed prompt information. |
| `algorithm` | Override the algorithm for this ranking. |
| `parallel` | Whether to rank files in parallel. |
| `explain` | Whether to generate ranking explanations. |

| RETURNS | DESCRIPTION |
|---|---|
| `List[FileAnalysis]` | FileAnalysis objects sorted by relevance (highest first) and filtered by the threshold. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the algorithm is invalid. |
register_custom_ranker¶
register_custom_ranker(ranker_func: Callable[[List[RankedFile], PromptContext], List[RankedFile]])
Register a custom ranking function.
Custom rankers are applied after the main ranking strategy and can adjust scores based on project-specific logic.
| PARAMETER | DESCRIPTION |
|---|---|
| `ranker_func` | Function that takes ranked files and returns the modified list. |
Example:

```python
def boost_tests(ranked_files, prompt_context):
    if 'test' in prompt_context.text:
        for rf in ranked_files:
            if 'test' in rf.path:
                rf.score *= 1.5
    return ranked_files

ranker.register_custom_ranker(boost_tests)
```
get_ranking_explanation¶
Get detailed explanation of ranking results.
| PARAMETER | DESCRIPTION |
|---|---|
| `ranked_files` | List of ranked files. |
| `top_n` | Number of top files to explain. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Formatted explanation string. |
get_stats¶
Get latest ranking statistics.
| RETURNS | DESCRIPTION |
|---|---|
| `RankingStats` | The latest RankingStats object. |
BalancedRankingStrategy¶
Bases: RankingStrategy
Balanced multi-factor ranking strategy.
Initialize balanced ranking strategy.
FastRankingStrategy¶
Bases: RankingStrategy
Fast keyword-based ranking strategy.
Initialize fast ranking strategy.
MLRankingStrategy¶
Bases: RankingStrategy
Machine Learning-based ranking strategy.
Initialize ML ranking strategy.
Attributes¶
nameclass-attribute
instance-attribute
¶
descriptionclass-attribute
instance-attribute
¶
loggerinstance-attribute
¶
Functions¶
rank_file¶
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors
ML-based ranking with semantic similarity.
RankingStrategy¶
Bases: ABC
Abstract base class for ranking strategies.
ThoroughRankingStrategy¶
Bases: RankingStrategy
Thorough deep analysis ranking strategy using centralized NLP.
Initialize thorough ranking strategy with NLP components.
Attributes¶
nameclass-attribute
instance-attribute
¶
descriptionclass-attribute
instance-attribute
¶
loggerinstance-attribute
¶
programming_patternsinstance-attribute
¶
Functions¶
rank_file¶
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors
Thorough ranking with deep analysis using centralized NLP.
Functions¶
create_ranker¶
create_ranker(config: Optional[TenetsConfig] = None, algorithm: str = 'balanced', use_stopwords: bool = False) -> RelevanceRanker
Create a configured relevance ranker.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | Configuration (uses default if None). |
| `algorithm` | Ranking algorithm to use. |
| `use_stopwords` | Whether to filter stopwords. |

| RETURNS | DESCRIPTION |
|---|---|
| `RelevanceRanker` | Configured RelevanceRanker instance. |
get_default_ranker¶
Get a default configured ranker.
Convenience function to quickly get a working ranker with sensible defaults.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | Optional configuration override. |

| RETURNS | DESCRIPTION |
|---|---|
| `RelevanceRanker` | Configured RelevanceRanker instance. |
rank_files_simple¶
rank_files_simple(files: List, prompt: str, algorithm: str = 'balanced', threshold: float = 0.1) -> List
Simple interface for ranking files.
Provides a simplified API for quick ranking without needing to manage ranker instances or configurations.
| PARAMETER | DESCRIPTION |
|---|---|
| `files` | List of FileAnalysis objects. |
| `prompt` | Search prompt or query. |
| `algorithm` | Ranking algorithm to use. |
| `threshold` | Minimum relevance score. |

| RETURNS | DESCRIPTION |
|---|---|
| `List` | Files sorted by relevance, filtered by the threshold. |
Example:

```python
from tenets.core.ranking import rank_files_simple

relevant_files = rank_files_simple(
    files,
    "authentication logic",
    algorithm="thorough",
)
```
explain_ranking¶
explain_ranking(files: List, prompt: str, algorithm: str = 'balanced', top_n: int = 10) -> str
Get explanation of why files ranked the way they did.
Useful for debugging and understanding ranking behavior.
| PARAMETER | DESCRIPTION |
|---|---|
| `files` | List of FileAnalysis objects. |
| `prompt` | Search prompt. |
| `algorithm` | Algorithm used. |
| `top_n` | Number of top files to explain. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Formatted explanation string. |
Example:

```python
from tenets.core.ranking import explain_ranking

explanation = explain_ranking(files, "database models")
print(explanation)
```
get_default_tfidf¶
Get default TF-IDF calculator instance.
| PARAMETER | DESCRIPTION |
|---|---|
| `use_stopwords` | Whether to filter stopwords. |

| RETURNS | DESCRIPTION |
|---|---|
| `TFIDFCalculator` | TFIDFCalculator instance. |
get_default_bm25¶
Get default BM25 calculator instance.
| PARAMETER | DESCRIPTION |
|---|---|
| `use_stopwords` | Whether to filter stopwords. |

| RETURNS | DESCRIPTION |
|---|---|
| `BM25Calculator` | BM25Calculator instance. |
Modules¶
- factors: Factors module
- ranker: Ranker module
- strategies: Strategies module