
tenets.core.ranking Package

Relevance ranking system for Tenets.

This package provides sophisticated file ranking capabilities using multiple strategies from simple keyword matching to advanced ML-based semantic analysis. The ranking system is designed to efficiently identify the most relevant files for a given prompt or query.

Main components:
  • RelevanceRanker: Main orchestrator for ranking operations
  • RankingFactors: Comprehensive factors used for scoring
  • RankedFile: File with ranking information
  • Ranking strategies: Fast, Balanced, Thorough, ML
  • TF-IDF and BM25 calculators for text similarity

Example usage

Python
from tenets.core.ranking import RelevanceRanker, create_ranker
from tenets.models.context import PromptContext

# Create ranker with config
ranker = create_ranker(algorithm="balanced")

# Parse prompt
prompt_context = PromptContext(text="implement OAuth authentication")

# Rank files
ranked_files = ranker.rank_files(files, prompt_context)

# Get top relevant files
for file in ranked_files[:10]:
    print(f"{file.path}: {file.relevance_score:.3f}")

Attributes

ML_AVAILABLEmodule-attribute

Python
ML_AVAILABLE = True

Classes

BM25Calculator

Python
BM25Calculator(k1: float = 1.2, b: float = 0.75, epsilon: float = 0.25, use_stopwords: bool = False, stopword_set: str = 'code')

BM25 ranking algorithm with advanced features for code search.

This implementation provides:
  • Configurable term saturation (k1) and length normalization (b)
  • Efficient tokenization with optional stopword filtering
  • IDF caching for performance
  • Support for incremental corpus updates
  • Query expansion capabilities
  • Detailed scoring explanations for debugging
ATTRIBUTEDESCRIPTION
k1

Controls term frequency saturation. Higher values mean less saturation (more weight to term frequency). Typical range: 0.5-2.0, default: 1.2

TYPE:float

b

Controls document length normalization. 0 = no normalization, 1 = full normalization. Typical range: 0.5-0.8, default: 0.75

TYPE:float

epsilon

Small constant to prevent division by zero

TYPE:float

Initialize BM25 calculator with configurable parameters.

PARAMETERDESCRIPTION
k1

Term frequency saturation parameter. Lower values (0.5-1.0) work well for short queries, higher values (1.5-2.0) for longer queries. Default: 1.2 (good general purpose value)

TYPE:floatDEFAULT:1.2

b

Length normalization parameter. Set to 0 to disable length normalization, 1 for full normalization. Default: 0.75 (moderate normalization, good for mixed-length documents)

TYPE:floatDEFAULT:0.75

epsilon

Small constant for numerical stability

TYPE:floatDEFAULT:0.25

use_stopwords

Whether to filter common words

TYPE:boolDEFAULT:False

stopword_set

Which stopword set to use ('code' for programming, 'english' for natural language)

TYPE:strDEFAULT:'code'
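
A minimal usage sketch; the file paths and contents below are illustrative placeholders, not part of the library:

Python
bm25 = BM25Calculator(k1=1.2, b=0.75, use_stopwords=False, stopword_set='code')

# Index a tiny corpus of (doc_id, text) pairs; the snippets are made up.
bm25.build_corpus([
    ("auth/oauth.py", "class OAuthClient:\n    def refresh_token(self): ..."),
    ("utils/io.py", "def read_file(path):\n    return open(path).read()"),
])

# Rank the indexed documents against a query.
for doc_id, score in bm25.get_top_k("oauth token refresh", k=5):
    print(f"{doc_id}: {score:.3f}")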

Attributes

loggerinstance-attribute
Python
logger = get_logger(__name__)
k1instance-attribute
Python
k1 = k1
binstance-attribute
Python
b = b
epsiloninstance-attribute
Python
epsilon = epsilon
use_stopwordsinstance-attribute
Python
use_stopwords = use_stopwords
stopword_setinstance-attribute
Python
stopword_set = stopword_set
tokenizerinstance-attribute
Python
tokenizer = CodeTokenizer(use_stopwords=use_stopwords)
document_countinstance-attribute
Python
document_count = 0
document_frequencyinstance-attribute
Python
document_frequency: Dict[str, int] = defaultdict(int)
document_lengthsinstance-attribute
Python
document_lengths: Dict[str, int] = {}
document_tokensinstance-attribute
Python
document_tokens: Dict[str, List[str]] = {}
average_doc_lengthinstance-attribute
Python
average_doc_length = 0.0
vocabularyinstance-attribute
Python
vocabulary: Set[str] = set()
idf_cacheinstance-attribute
Python
idf_cache: Dict[str, float] = {}
statsinstance-attribute
Python
stats = {'queries_processed': 0, 'cache_hits': 0, 'cache_misses': 0, 'documents_added': 0}

Functions

tokenize
Python
tokenize(text: str) -> List[str]

Tokenize text using code-aware tokenizer.

Handles various code constructs:
  • CamelCase and snake_case splitting
  • Preservation of important symbols
  • Number and identifier extraction
PARAMETERDESCRIPTION
text

Input text to tokenize

TYPE:str

RETURNSDESCRIPTION
List[str]

List of tokens, lowercased and filtered

add_document
Python
add_document(doc_id: str, text: str) -> None

Add a document to the BM25 corpus.

Updates all corpus statistics including document frequency, average document length, and vocabulary.

PARAMETERDESCRIPTION
doc_id

Unique identifier for the document

TYPE:str

text

Document content

TYPE:str

Note

Adding documents invalidates the IDF and score caches. For bulk loading, use build_corpus() instead.

build_corpus
Python
build_corpus(documents: List[Tuple[str, str]]) -> None

Build BM25 corpus from multiple documents efficiently.

More efficient than repeated add_document() calls as it calculates statistics once at the end.

PARAMETERDESCRIPTION
documents

List of (doc_id, text) tuples

TYPE:List[Tuple[str, str]]

Example

Python
documents = [
    ("file1.py", "import os\nclass FileHandler"),
    ("file2.py", "from pathlib import Path"),
]
bm25.build_corpus(documents)

compute_idf
Python
compute_idf(term: str) -> float

Compute IDF (Inverse Document Frequency) for a term.

Uses the standard BM25 IDF formula with smoothing to handle edge cases and prevent negative values.

Formula

IDF(term) = log[(N - df + 0.5) / (df + 0.5) + 1]

PARAMETERDESCRIPTION
term

Term to compute IDF for

TYPE:str

RETURNSDESCRIPTION
float

IDF value (always positive due to +1 in formula)
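
As a worked example, a corpus of N = 100 documents in which the term appears in df = 10 of them gives an IDF of roughly 2.26 (the numbers are illustrative):

Python
import math

N, df = 100, 10  # illustrative corpus size and document frequency
idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # ≈ 2.26, always positive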

score_document
Python
score_document(query_tokens: List[str], doc_id: str, explain: bool = False) -> float

Calculate BM25 score for a document given query tokens.

Implements the full BM25 scoring formula with term saturation and length normalization.

PARAMETERDESCRIPTION
query_tokens

Tokenized query terms

TYPE:List[str]

doc_id

Document identifier to score

TYPE:str

explain

If True, return detailed scoring breakdown

TYPE:boolDEFAULT:False

RETURNSDESCRIPTION
float

BM25 score (higher is more relevant)

Tuple[float, Dict]

If explain=True, returns a tuple of (score, explanation_dict) instead
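
A brief sketch, assuming "auth/oauth.py" was previously added to the corpus (the query text is a placeholder):

Python
query_tokens = bm25.tokenize("OAuth token refresh")

# Plain score
score = bm25.score_document(query_tokens, "auth/oauth.py")

# With explain=True the call also returns a breakdown dict for debugging
score, breakdown = bm25.score_document(query_tokens, "auth/oauth.py", explain=True)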

get_scores
Python
get_scores(query: str, doc_ids: Optional[List[str]] = None) -> List[Tuple[str, float]]

Get BM25 scores for all documents or a subset.

PARAMETERDESCRIPTION
query

Search query string

TYPE:str

doc_ids

Optional list of document IDs to score. If None, scores all documents.

TYPE:Optional[List[str]]DEFAULT:None

RETURNSDESCRIPTION
List[Tuple[str, float]]

List of (doc_id, score) tuples sorted by score (descending)

get_top_k
Python
get_top_k(query: str, k: int = 10, threshold: float = 0.0) -> List[Tuple[str, float]]

Get top-k documents by BM25 score.

PARAMETERDESCRIPTION
query

Search query

TYPE:str

k

Number of top documents to return

TYPE:intDEFAULT:10

threshold

Minimum score threshold (documents below are filtered)

TYPE:floatDEFAULT:0.0

RETURNSDESCRIPTION
List[Tuple[str, float]]

List of top-k (doc_id, score) tuples
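
The two query helpers differ mainly in filtering, as in this short sketch (query strings and thresholds are illustrative):

Python
# Score every indexed document, or pass doc_ids=[...] to score a subset.
all_scores = bm25.get_scores("oauth token refresh")

# Keep only the three best matches that clear a minimum score.
top_three = bm25.get_top_k("oauth token refresh", k=3, threshold=0.5)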

compute_similarity
Python
compute_similarity(query: str, doc_id: str) -> float

Compute normalized similarity score between query and document.

Returns a value between 0 and 1 for consistency with other similarity measures.

PARAMETERDESCRIPTION
query

Query text

TYPE:str

doc_id

Document identifier

TYPE:str

RETURNSDESCRIPTION
float

Normalized similarity score (0-1)

explain_score
Python
explain_score(query: str, doc_id: str) -> Dict

Get detailed explanation of BM25 scoring for debugging.

PARAMETERDESCRIPTION
query

Query text

TYPE:str

doc_id

Document to explain scoring for

TYPE:str

RETURNSDESCRIPTION
Dict

Dictionary with detailed scoring breakdown

get_stats
Python
get_stats() -> Dict

Get calculator statistics for monitoring.

RETURNSDESCRIPTION
Dict

Dictionary with usage statistics

clear_cache
Python
clear_cache() -> None

Clear all caches to free memory.

TFIDFCalculator

Python
TFIDFCalculator(use_stopwords: bool = False)

TF-IDF calculator for ranking.

A simplified wrapper around the centralized NLP TFIDFCalculator that preserves the existing ranking API while reusing the shared logic.

Initialize TF-IDF calculator.

PARAMETERDESCRIPTION
use_stopwords

Whether to filter stopwords (uses 'code' set)

TYPE:boolDEFAULT:False
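
A small usage sketch; the document identifiers and contents are illustrative placeholders:

Python
tfidf = TFIDFCalculator(use_stopwords=True)

tfidf.add_document("models/user.py", "class User: email password_hash")
tfidf.add_document("db/session.py", "engine session create_engine DATABASE_URL")

print(tfidf.compute_similarity("user password hashing", "models/user.py"))
print(tfidf.get_top_terms("models/user.py", n=5))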

Attributes

loggerinstance-attribute
Python
logger = get_logger(__name__)
use_stopwordsinstance-attribute
Python
use_stopwords = use_stopwords
stopwordsinstance-attribute
Python
stopwords: Set[str] = set(words) if sw else set()
document_vectorsproperty
Python
document_vectors: Dict[str, Dict[str, float]]

Get document vectors.

document_normsproperty
Python
document_norms: Dict[str, float]

Get document vector norms.

vocabularyproperty
Python
vocabulary: set

Get vocabulary.

document_countpropertywritable
Python
document_count: int
document_frequencypropertywritable
Python
document_frequency: Dict[str, int]
idf_cachepropertywritable
Python
idf_cache: Dict[str, float]

Functions

tokenize
Python
tokenize(text: str) -> List[str]

Tokenize text using NLP tokenizer.

PARAMETERDESCRIPTION
text

Input text

TYPE:str

RETURNSDESCRIPTION
List[str]

List of tokens

add_document
Python
add_document(doc_id: str, text: str) -> Dict[str, float]

Add document to corpus.

PARAMETERDESCRIPTION
doc_id

Document identifier

TYPE:str

text

Document content

TYPE:str

RETURNSDESCRIPTION
Dict[str, float]

TF-IDF vector for document

compute_tf
Python
compute_tf(tokens: List[str], use_sublinear: bool = True) -> Dict[str, float]
compute_idf
Python
compute_idf(term: str) -> float
compute_similarity
Python
compute_similarity(query_text: str, doc_id: str) -> float

Compute similarity between query and document.

PARAMETERDESCRIPTION
query_text

Query text

TYPE:str

doc_id

Document identifier

TYPE:str

RETURNSDESCRIPTION
float

Cosine similarity score (0-1)

get_top_terms
Python
get_top_terms(doc_id: str, n: int = 10) -> List[Tuple[str, float]]

Return the top-n TF-IDF terms for a given document.

PARAMETERDESCRIPTION
doc_id

Document identifier

TYPE:str

n

Maximum number of terms to return

TYPE:intDEFAULT:10

RETURNSDESCRIPTION
List[Tuple[str, float]]

List of (term, score) sorted by score descending

build_corpus
Python
build_corpus(documents: List[Tuple[str, str]]) -> None

Build corpus from documents.

PARAMETERDESCRIPTION
documents

List of (doc_id, text) tuples

TYPE:List[Tuple[str, str]]

FactorWeight

Bases: Enum

Standard weight presets for ranking factors.

These presets provide balanced weights for different use cases. Can be overridden with custom weights in configuration.

Attributes

KEYWORD_MATCHclass-attributeinstance-attribute
Python
KEYWORD_MATCH = 0.25
TFIDF_SIMILARITYclass-attributeinstance-attribute
Python
TFIDF_SIMILARITY = 0.2
BM25_SCOREclass-attributeinstance-attribute
Python
BM25_SCORE = 0.15
PATH_RELEVANCEclass-attributeinstance-attribute
Python
PATH_RELEVANCE = 0.15
IMPORT_CENTRALITYclass-attributeinstance-attribute
Python
IMPORT_CENTRALITY = 0.1
GIT_RECENCYclass-attributeinstance-attribute
Python
GIT_RECENCY = 0.05
GIT_FREQUENCYclass-attributeinstance-attribute
Python
GIT_FREQUENCY = 0.05
COMPLEXITY_RELEVANCEclass-attributeinstance-attribute
Python
COMPLEXITY_RELEVANCE = 0.05
SEMANTIC_SIMILARITYclass-attributeinstance-attribute
Python
SEMANTIC_SIMILARITY = 0.25
TYPE_RELEVANCEclass-attributeinstance-attribute
Python
TYPE_RELEVANCE = 0.1
CODE_PATTERNSclass-attributeinstance-attribute
Python
CODE_PATTERNS = 0.1
AST_RELEVANCEclass-attributeinstance-attribute
Python
AST_RELEVANCE = 0.1
DEPENDENCY_DEPTHclass-attributeinstance-attribute
Python
DEPENDENCY_DEPTH = 0.05
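
Because FactorWeight is a plain Enum, the presets can be turned into an ordinary weights mapping; the dictionary below is an illustrative convention, not a documented API:

Python
# Map preset names (lowercased) to their weight values.
preset_weights = {w.name.lower(): w.value for w in FactorWeight}
print(preset_weights["keyword_match"])  # 0.25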

RankedFiledataclass

Python
RankedFile(analysis: FileAnalysis, score: float, factors: RankingFactors, explanation: str = '', confidence: float = 1.0, rank: Optional[int] = None, metadata: Dict[str, Any] = dict())

A file with its relevance ranking.

Combines a FileAnalysis with ranking scores and metadata. Provides utilities for comparison, explanation generation, and result formatting.

ATTRIBUTEDESCRIPTION
analysis

The FileAnalysis object

TYPE:FileAnalysis

score

Overall relevance score (0-1)

TYPE:float

factors

Detailed ranking factors

TYPE:RankingFactors

explanation

Human-readable ranking explanation

TYPE:str

confidence

Confidence in the ranking (0-1)

TYPE:float

rank

Position in ranked list (1-based)

TYPE:Optional[int]

metadata

Additional ranking metadata

TYPE:Dict[str, Any]

Attributes

analysisinstance-attribute
Python
analysis: FileAnalysis
scoreinstance-attribute
Python
score: float
factorsinstance-attribute
Python
factors: RankingFactors
explanationclass-attributeinstance-attribute
Python
explanation: str = ''
confidenceclass-attributeinstance-attribute
Python
confidence: float = 1.0
rankclass-attributeinstance-attribute
Python
rank: Optional[int] = None
metadataclass-attributeinstance-attribute
Python
metadata: Dict[str, Any] = field(default_factory=dict)
pathproperty
Python
path: str

Get file path.

file_nameproperty
Python
file_name: str

Get file name.

languageproperty
Python
language: str

Get file language.

Functions

generate_explanation
Python
generate_explanation(weights: Dict[str, float], verbose: bool = False) -> str

Generate human-readable explanation of ranking.

PARAMETERDESCRIPTION
weights

Factor weights used for ranking

TYPE:Dict[str, float]

verbose

Include detailed factor breakdown

TYPE:boolDEFAULT:False

RETURNSDESCRIPTION
str

Explanation string

to_dict
Python
to_dict() -> Dict[str, Any]

Convert to dictionary representation.

RETURNSDESCRIPTION
Dict[str, Any]

Dictionary with all ranking information
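
A brief inspection sketch; here rf stands in for a RankedFile produced by a ranking strategy, and the weights are illustrative:

Python
# rf is a RankedFile built elsewhere; weights are illustrative.
weights = {"keyword_match": 0.5, "path_relevance": 0.5}

print(rf.path, rf.language, rf.score)
print(rf.generate_explanation(weights, verbose=True))
record = rf.to_dict()  # serializable summary of the ranking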

RankingExplainer

Python
RankingExplainer()

Utility class for generating ranking explanations.

Provides detailed explanations of why files ranked the way they did, useful for debugging and understanding ranking behavior.

Initialize the explainer.

Attributes

loggerinstance-attribute
Python
logger = get_logger(__name__)

Functions

explain_ranking
Python
explain_ranking(ranked_files: List[RankedFile], weights: Dict[str, float], top_n: int = 10, include_factors: bool = True) -> str

Generate comprehensive ranking explanation.

PARAMETERDESCRIPTION
ranked_files

List of ranked files

TYPE:List[RankedFile]

weights

Factor weights used

TYPE:Dict[str, float]

top_n

Number of top files to explain

TYPE:intDEFAULT:10

include_factors

Include factor breakdown

TYPE:boolDEFAULT:True

RETURNSDESCRIPTION
str

Formatted explanation string

compare_rankings
Python
compare_rankings(rankings1: List[RankedFile], rankings2: List[RankedFile], labels: Tuple[str, str] = ('Ranking 1', 'Ranking 2')) -> str

Compare two different rankings.

Useful for understanding how different algorithms or weights affect ranking results.

PARAMETERDESCRIPTION
rankings1

First ranking

TYPE:List[RankedFile]

rankings2

Second ranking

TYPE:List[RankedFile]

labels

Labels for the two rankings

TYPE:Tuple[str, str]DEFAULT:('Ranking 1', 'Ranking 2')

RETURNSDESCRIPTION
str

Comparison report
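
A usage sketch; ranked_fast and ranked_ml stand in for lists of RankedFile produced by two different ranking runs, and the weights are illustrative:

Python
explainer = RankingExplainer()

report = explainer.explain_ranking(ranked_fast, weights={"keyword_match": 1.0}, top_n=5)
diff = explainer.compare_rankings(ranked_fast, ranked_ml, labels=("fast", "ml"))
print(report)
print(diff)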

RankingFactorsdataclass

Python
RankingFactors(keyword_match: float = 0.0, tfidf_similarity: float = 0.0, bm25_score: float = 0.0, path_relevance: float = 0.0, import_centrality: float = 0.0, dependency_depth: float = 0.0, git_recency: float = 0.0, git_frequency: float = 0.0, git_author_relevance: float = 0.0, complexity_relevance: float = 0.0, maintainability_score: float = 0.0, semantic_similarity: float = 0.0, type_relevance: float = 0.0, code_patterns: float = 0.0, ast_relevance: float = 0.0, test_coverage: float = 0.0, documentation_score: float = 0.0, custom_scores: Dict[str, float] = dict(), metadata: Dict[str, Any] = dict())

Comprehensive ranking factors for a file.

Each factor represents a different dimension of relevance. The final relevance score is computed as a weighted sum of these factors.

Factors are grouped into categories:
  • Text-based: keyword_match, tfidf_similarity, bm25_score
  • Structure-based: path_relevance, import_centrality, dependency_depth
  • Git-based: git_recency, git_frequency, git_author_relevance
  • Complexity-based: complexity_relevance, maintainability_score
  • Semantic: semantic_similarity (requires ML)
  • Pattern-based: code_patterns, ast_relevance
  • Custom: custom_scores for project-specific factors

ATTRIBUTEDESCRIPTION
keyword_match

Direct keyword matching score (0-1)

TYPE:float

tfidf_similarity

TF-IDF cosine similarity score (0-1)

TYPE:float

bm25_score

BM25 relevance score (0-1)

TYPE:float

path_relevance

File path relevance to query (0-1)

TYPE:float

import_centrality

How central file is in import graph (0-1)

TYPE:float

git_recency

How recently file was modified (0-1)

TYPE:float

git_frequency

How frequently file changes (0-1)

TYPE:float

git_author_relevance

Relevance based on commit authors (0-1)

TYPE:float

complexity_relevance

Relevance based on code complexity (0-1)

TYPE:float

maintainability_score

Code maintainability score (0-1)

TYPE:float

semantic_similarity

ML-based semantic similarity (0-1)

TYPE:float

type_relevance

Relevance based on file type (0-1)

TYPE:float

code_patterns

Pattern matching score (0-1)

TYPE:float

ast_relevance

AST structure relevance (0-1)

TYPE:float

dependency_depth

Dependency tree depth score (0-1)

TYPE:float

test_coverage

Test coverage relevance (0-1)

TYPE:float

documentation_score

Documentation quality score (0-1)

TYPE:float

custom_scores

Dictionary of custom factor scores

TYPE:Dict[str, float]

metadata

Additional metadata about factor calculation

TYPE:Dict[str, Any]

Attributes

keyword_matchclass-attributeinstance-attribute
Python
keyword_match: float = 0.0
tfidf_similarityclass-attributeinstance-attribute
Python
tfidf_similarity: float = 0.0
bm25_scoreclass-attributeinstance-attribute
Python
bm25_score: float = 0.0
path_relevanceclass-attributeinstance-attribute
Python
path_relevance: float = 0.0
import_centralityclass-attributeinstance-attribute
Python
import_centrality: float = 0.0
dependency_depthclass-attributeinstance-attribute
Python
dependency_depth: float = 0.0
git_recencyclass-attributeinstance-attribute
Python
git_recency: float = 0.0
git_frequencyclass-attributeinstance-attribute
Python
git_frequency: float = 0.0
git_author_relevanceclass-attributeinstance-attribute
Python
git_author_relevance: float = 0.0
complexity_relevanceclass-attributeinstance-attribute
Python
complexity_relevance: float = 0.0
maintainability_scoreclass-attributeinstance-attribute
Python
maintainability_score: float = 0.0
semantic_similarityclass-attributeinstance-attribute
Python
semantic_similarity: float = 0.0
type_relevanceclass-attributeinstance-attribute
Python
type_relevance: float = 0.0
code_patternsclass-attributeinstance-attribute
Python
code_patterns: float = 0.0
ast_relevanceclass-attributeinstance-attribute
Python
ast_relevance: float = 0.0
documentation_scoreclass-attributeinstance-attribute
Python
documentation_score: float = 0.0
custom_scoresclass-attributeinstance-attribute
Python
custom_scores: Dict[str, float] = field(default_factory=dict)
metadataclass-attributeinstance-attribute
Python
metadata: Dict[str, Any] = field(default_factory=dict)

Functions

get_weighted_score
Python
get_weighted_score(weights: Dict[str, float], normalize: bool = True) -> float

Calculate weighted relevance score.

PARAMETERDESCRIPTION
weights

Dictionary mapping factor names to weights

TYPE:Dict[str, float]

normalize

Whether to normalize final score to [0, 1]

TYPE:boolDEFAULT:True

RETURNSDESCRIPTION
float

Weighted relevance score

get_top_factors
Python
get_top_factors(weights: Dict[str, float], n: int = 5) -> List[Tuple[str, float, float]]

Get the top contributing factors.

PARAMETERDESCRIPTION
weights

Factor weights

TYPE:Dict[str, float]

n

Number of top factors to return

TYPE:intDEFAULT:5

RETURNSDESCRIPTION
List[Tuple[str, float, float]]

List of (factor_name, value, contribution) tuples
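
A short sketch of scoring a factor set against a weight mapping; the factor values and weights are illustrative:

Python
factors = RankingFactors(keyword_match=0.8, path_relevance=0.6, bm25_score=0.4)

weights = {"keyword_match": 0.5, "path_relevance": 0.3, "bm25_score": 0.2}

score = factors.get_weighted_score(weights)   # normalized to [0, 1]
top = factors.get_top_factors(weights, n=2)   # [(factor_name, value, contribution), ...]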

to_dict
Python
to_dict() -> Dict[str, Any]

Convert factors to dictionary representation.

RETURNSDESCRIPTION
Dict[str, Any]

Dictionary with all factor values

RankingAlgorithm

Bases: Enum

Available ranking algorithms.

Each algorithm provides different trade-offs between speed and accuracy.

Attributes

FASTclass-attributeinstance-attribute
Python
FAST = 'fast'
BALANCEDclass-attributeinstance-attribute
Python
BALANCED = 'balanced'
THOROUGHclass-attributeinstance-attribute
Python
THOROUGH = 'thorough'
MLclass-attributeinstance-attribute
Python
ML = 'ml'
CUSTOMclass-attributeinstance-attribute
Python
CUSTOM = 'custom'

RankingStatsdataclass

Python
RankingStats(total_files: int = 0, files_ranked: int = 0, files_failed: int = 0, time_elapsed: float = 0.0, algorithm_used: str = '', threshold_applied: float = 0.0, files_above_threshold: int = 0, average_score: float = 0.0, max_score: float = 0.0, min_score: float = 0.0, corpus_stats: Dict[str, Any] = None)

Statistics from ranking operation.

Tracks performance metrics and diagnostic information about the ranking process for monitoring and optimization.

ATTRIBUTEDESCRIPTION
total_files

Total number of files processed

TYPE:int

files_ranked

Number of files successfully ranked

TYPE:int

files_failed

Number of files that failed ranking

TYPE:int

time_elapsed

Total time in seconds

TYPE:float

algorithm_used

Which algorithm was used

TYPE:str

threshold_applied

Relevance threshold used

TYPE:float

files_above_threshold

Number of files above threshold

TYPE:int

average_score

Average relevance score

TYPE:float

max_score

Maximum relevance score

TYPE:float

min_score

Minimum relevance score

TYPE:float

corpus_stats

Dictionary of corpus statistics

TYPE:Dict[str, Any]

Attributes

total_filesclass-attributeinstance-attribute
Python
total_files: int = 0
files_rankedclass-attributeinstance-attribute
Python
files_ranked: int = 0
files_failedclass-attributeinstance-attribute
Python
files_failed: int = 0
time_elapsedclass-attributeinstance-attribute
Python
time_elapsed: float = 0.0
algorithm_usedclass-attributeinstance-attribute
Python
algorithm_used: str = ''
threshold_appliedclass-attributeinstance-attribute
Python
threshold_applied: float = 0.0
files_above_thresholdclass-attributeinstance-attribute
Python
files_above_threshold: int = 0
average_scoreclass-attributeinstance-attribute
Python
average_score: float = 0.0
max_scoreclass-attributeinstance-attribute
Python
max_score: float = 0.0
min_scoreclass-attributeinstance-attribute
Python
min_score: float = 0.0
corpus_statsclass-attributeinstance-attribute
Python
corpus_stats: Dict[str, Any] = None

Functions

to_dict
Python
to_dict() -> Dict[str, Any]

Convert to dictionary representation.

RETURNSDESCRIPTION
Dict[str, Any]

Dictionary with all statistics

RelevanceRanker

Python
RelevanceRanker(config: TenetsConfig, algorithm: Optional[str] = None, use_stopwords: Optional[bool] = None)

Main relevance ranking system.

Orchestrates the ranking process by analyzing the corpus, selecting appropriate strategies, and producing ranked results. Supports multiple algorithms, parallel processing, and custom ranking extensions.

The ranker follows a multi-stage process:
  1. Corpus analysis (TF-IDF, import graph, statistics)
  2. Strategy selection based on algorithm
  3. Parallel factor calculation
  4. Score aggregation and weighting
  5. Filtering and sorting

ATTRIBUTEDESCRIPTION
config

TenetsConfig instance

logger

Logger instance

strategies

Available ranking strategies

custom_rankers

Custom ranking functions

TYPE:List[Callable]

executor

Thread pool for parallel processing

stats

Latest ranking statistics

cache

Internal cache for optimizations

Initialize the relevance ranker.

PARAMETERDESCRIPTION
config

Tenets configuration

TYPE:TenetsConfig

algorithm

Override default algorithm

TYPE:Optional[str]DEFAULT:None

use_stopwords

Override stopword filtering setting

TYPE:Optional[bool]DEFAULT:None

Attributes

configinstance-attribute
Python
config = config
loggerinstance-attribute
Python
logger = get_logger(__name__)
algorithminstance-attribute
Python
algorithm = RankingAlgorithm(algo_str)
use_stopwordsinstance-attribute
Python
use_stopwords = use_stopwords if use_stopwords is not None else use_stopwords
use_mlinstance-attribute
Python
use_ml = use_ml if config and hasattr(ranking, 'use_ml') else False
strategiesinstance-attribute
Python
strategies = _strategies_cache
custom_rankersinstance-attribute
Python
custom_rankers: List[Callable] = []
max_workersinstance-attribute
Python
max_workers = max_workers
statsinstance-attribute
Python
stats = RankingStats()
cacheinstance-attribute
Python
cache = {}
SentenceTransformerinstance-attribute
Python
SentenceTransformer = SentenceTransformer
executorproperty
Python
executor

Lazy initialization of ThreadPoolExecutor to avoid Windows import issues.

Functions

rank_files
Python
rank_files(files: List[FileAnalysis], prompt_context: PromptContext, algorithm: Optional[str] = None, parallel: bool = True, explain: bool = False) -> List[FileAnalysis]

Rank files by relevance to prompt.

This is the main entry point for ranking files. It analyzes the corpus, applies the selected ranking strategy, and returns files sorted by relevance above the configured threshold.

PARAMETERDESCRIPTION
files

List of files to rank

TYPE:List[FileAnalysis]

prompt_context

Parsed prompt information

TYPE:PromptContext

algorithm

Override algorithm for this ranking

TYPE:Optional[str]DEFAULT:None

parallel

Whether to rank files in parallel

TYPE:boolDEFAULT:True

explain

Whether to generate ranking explanations

TYPE:boolDEFAULT:False

RETURNSDESCRIPTION
List[FileAnalysis]

List of FileAnalysis objects sorted by relevance (highest first) and filtered by threshold

RAISESDESCRIPTION
ValueError

If algorithm is invalid
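
An end-to-end sketch; the TenetsConfig import path is assumed, and files stands in for a pre-built List[FileAnalysis]:

Python
from tenets.config import TenetsConfig  # import path assumed; adjust to the project layout
from tenets.models.context import PromptContext

ranker = RelevanceRanker(TenetsConfig(), algorithm="thorough")
prompt_context = PromptContext(text="implement OAuth authentication")

# files is a pre-built List[FileAnalysis] from the analyzer.
ranked = ranker.rank_files(files, prompt_context, explain=True)
print(ranker.get_stats().to_dict())
ranker.shutdown()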

register_custom_ranker
Python
register_custom_ranker(ranker_func: Callable[[List[RankedFile], PromptContext], List[RankedFile]])

Register a custom ranking function.

Custom rankers are applied after the main ranking strategy and can adjust scores based on project-specific logic.

PARAMETERDESCRIPTION
ranker_func

Function that takes ranked files and returns modified list

TYPE:Callable[[List[RankedFile], PromptContext], List[RankedFile]]

Example

Python
def boost_tests(ranked_files, prompt_context):
    if 'test' in prompt_context.text:
        for rf in ranked_files:
            if 'test' in rf.path:
                rf.score *= 1.5
    return ranked_files

ranker.register_custom_ranker(boost_tests)

get_ranking_explanation
Python
get_ranking_explanation(ranked_files: List[RankedFile], top_n: int = 10) -> str

Get detailed explanation of ranking results.

PARAMETERDESCRIPTION
ranked_files

List of ranked files

TYPE:List[RankedFile]

top_n

Number of top files to explain

TYPE:intDEFAULT:10

RETURNSDESCRIPTION
str

Formatted explanation string

get_stats
Python
get_stats() -> RankingStats

Get latest ranking statistics.

RETURNSDESCRIPTION
RankingStats

RankingStats object

shutdown
Python
shutdown()

Shutdown the ranker and clean up resources.

BalancedRankingStrategy

Python
BalancedRankingStrategy()

Bases: RankingStrategy

Balanced multi-factor ranking strategy.

Initialize balanced ranking strategy.

Attributes

nameclass-attributeinstance-attribute
Python
name = 'balanced'
descriptionclass-attributeinstance-attribute
Python
description = 'Multi-factor ranking with TF-IDF and structure analysis'
loggerinstance-attribute
Python
logger = get_logger(__name__)

Functions

rank_file
Python
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors

Balanced ranking using multiple factors.

get_weights
Python
get_weights() -> Dict[str, float]

Get weights for balanced ranking.

FastRankingStrategy

Python
FastRankingStrategy()

Bases: RankingStrategy

Fast keyword-based ranking strategy.

Initialize fast ranking strategy.

Attributes

nameclass-attributeinstance-attribute
Python
name = 'fast'
descriptionclass-attributeinstance-attribute
Python
description = 'Quick keyword and path-based ranking'
loggerinstance-attribute
Python
logger = get_logger(__name__)

Functions

rank_file
Python
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors

Fast ranking based on keywords and paths.

get_weights
Python
get_weights() -> Dict[str, float]

Get weights for fast ranking.

MLRankingStrategy

Python
MLRankingStrategy()

Bases: RankingStrategy

Machine Learning-based ranking strategy.

Initialize ML ranking strategy.

Attributes

nameclass-attributeinstance-attribute
Python
name = 'ml'
descriptionclass-attributeinstance-attribute
Python
description = 'Semantic similarity using ML models'
loggerinstance-attribute
Python
logger = get_logger(__name__)

Functions

rank_file
Python
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors

ML-based ranking with semantic similarity.

get_weights
Python
get_weights() -> Dict[str, float]

Get weights for ML ranking.

RankingStrategy

Bases: ABC

Abstract base class for ranking strategies.

Attributes

nameabstractmethodproperty
Python
name: str

Get strategy name.

descriptionabstractmethodproperty
Python
description: str

Get strategy description.

Functions

rank_fileabstractmethod
Python
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors

Calculate ranking factors for a file.

get_weightsabstractmethod
Python
get_weights() -> Dict[str, float]

Get factor weights for this strategy.
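
A toy subclass sketch; the keyword extraction and scoring below are illustrative, and real strategies compute many more factors:

Python
class NameOnlyStrategy(RankingStrategy):
    """Toy strategy that scores files purely on file-name keyword hits (illustrative)."""

    name = "name-only"
    description = "Scores files on file-name keyword hits"

    def rank_file(self, file, prompt_context, corpus_stats):
        factors = RankingFactors()
        keywords = getattr(prompt_context, "keywords", None) or []
        hits = sum(1 for kw in keywords if kw.lower() in file.path.lower())
        factors.path_relevance = min(1.0, hits / 3)
        return factors

    def get_weights(self):
        return {"path_relevance": 1.0}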

ThoroughRankingStrategy

Python
ThoroughRankingStrategy()

Bases: RankingStrategy

Thorough deep analysis ranking strategy using centralized NLP.

Initialize thorough ranking strategy with NLP components.

Attributes

nameclass-attributeinstance-attribute
Python
name = 'thorough'
descriptionclass-attributeinstance-attribute
Python
description = 'Deep analysis with code patterns and structure examination'
loggerinstance-attribute
Python
logger = get_logger(__name__)
programming_patternsinstance-attribute
Python
programming_patterns = get_programming_patterns()

Functions

rank_file
Python
rank_file(file: FileAnalysis, prompt_context: PromptContext, corpus_stats: Dict[str, Any]) -> RankingFactors

Thorough ranking with deep analysis using centralized NLP.

get_weights
Python
get_weights() -> Dict[str, float]

Get weights for thorough ranking.

Functions

create_ranker

Python
create_ranker(config: Optional[TenetsConfig] = None, algorithm: str = 'balanced', use_stopwords: bool = False) -> RelevanceRanker

Create a configured relevance ranker.

PARAMETERDESCRIPTION
config

Configuration (uses default if None)

TYPE:Optional[TenetsConfig]DEFAULT:None

algorithm

Ranking algorithm to use

TYPE:strDEFAULT:'balanced'

use_stopwords

Whether to filter stopwords

TYPE:boolDEFAULT:False

RETURNSDESCRIPTION
RelevanceRanker

Configured RelevanceRanker instance
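
For example (the TenetsConfig import path is assumed):

Python
from tenets.config import TenetsConfig  # assumed import path

ranker = create_ranker(config=TenetsConfig(), algorithm="fast", use_stopwords=True)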

check_ml_dependencies

Python
check_ml_dependencies()

Check ML dependencies (stub).

get_available_models

Python
get_available_models()

Get available models (stub).

get_default_ranker

Python
get_default_ranker(config: Optional[TenetsConfig] = None) -> RelevanceRanker

Get a default configured ranker.

Convenience function to quickly get a working ranker with sensible defaults.

PARAMETERDESCRIPTION
config

Optional configuration override

TYPE:Optional[TenetsConfig]DEFAULT:None

RETURNSDESCRIPTION
RelevanceRanker

Configured RelevanceRanker instance

rank_files_simple

Python
rank_files_simple(files: List, prompt: str, algorithm: str = 'balanced', threshold: float = 0.1) -> List

Simple interface for ranking files.

Provides a simplified API for quick ranking without needing to manage ranker instances or configurations.

PARAMETERDESCRIPTION
files

List of FileAnalysis objects

TYPE:List

prompt

Search prompt or query

TYPE:str

algorithm

Ranking algorithm to use

TYPE:strDEFAULT:'balanced'

threshold

Minimum relevance score

TYPE:floatDEFAULT:0.1

RETURNSDESCRIPTION
List

List of files sorted by relevance above threshold

Example

Python
from tenets.core.ranking import rank_files_simple

relevant_files = rank_files_simple(
    files,
    "authentication logic",
    algorithm="thorough",
)

explain_ranking

Python
explain_ranking(files: List, prompt: str, algorithm: str = 'balanced', top_n: int = 10) -> str

Get explanation of why files ranked the way they did.

Useful for debugging and understanding ranking behavior.

PARAMETERDESCRIPTION
files

List of FileAnalysis objects

TYPE:List

prompt

Search prompt

TYPE:str

algorithm

Algorithm used

TYPE:strDEFAULT:'balanced'

top_n

Number of top files to explain

TYPE:intDEFAULT:10

RETURNSDESCRIPTION
str

Formatted explanation string

Example

Python
from tenets.core.ranking import explain_ranking

explanation = explain_ranking(files, "database models")
print(explanation)

get_default_tfidf

Python
get_default_tfidf(use_stopwords: bool = False) -> TFIDFCalculator

Get default TF-IDF calculator instance.

PARAMETERDESCRIPTION
use_stopwords

Whether to filter stopwords

TYPE:boolDEFAULT:False

RETURNSDESCRIPTION
TFIDFCalculator

TFIDFCalculator instance

get_default_bm25

Python
get_default_bm25(use_stopwords: bool = False) -> BM25Calculator

Get default BM25 calculator instance.

PARAMETERDESCRIPTION
use_stopwords

Whether to filter stopwords

TYPE:boolDEFAULT:False

RETURNSDESCRIPTION
BM25Calculator

BM25Calculator instance
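
A short sketch using both convenience accessors; both calculators expose build_corpus and compute_similarity, so they can sit behind the same scoring code (the corpus contents are placeholders):

Python
bm25 = get_default_bm25(use_stopwords=True)
tfidf = get_default_tfidf()

for calc in (bm25, tfidf):
    calc.build_corpus([("a.py", "def login(): ..."), ("b.py", "def logout(): ...")])
    print(calc.compute_similarity("login", "a.py"))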

Modules