tenets.core.summarizer Package¶
Content summarization system for Tenets.
This package provides intelligent text and code summarization capabilities using multiple strategies from simple extraction to advanced ML approaches. The summarization system helps compress large codebases to fit within token limits while preserving the most important information.
Main components:

- Summarizer: Main orchestrator for summarization operations
- Strategies: Different summarization approaches (extractive, compressive, etc.)
- LLMSummarizer: Integration with Large Language Models (costs $)
Example usage:

    from tenets.core.summarizer import Summarizer, create_summarizer

    # Create summarizer
    summarizer = create_summarizer(mode="extractive")

    # Summarize text
    result = summarizer.summarize(
        long_text,
        target_ratio=0.3,  # Compress to 30% of original
    )
    print(f"Reduced by {result.reduction_percent:.1f}%")
Attributes¶
ML_AVAILABLE (module-attribute)¶
Classes¶
LLMConfig (dataclass)¶

    LLMConfig(
        provider: LLMProvider = LLMProvider.OPENAI,
        model: str = 'gpt-4o-mini',
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        temperature: float = 0.3,
        max_tokens: int = 500,
        system_prompt: str = (
            'You are an expert at summarizing code and technical documentation. \n'
            'Your summaries are concise, accurate, and preserve critical technical details.'
        ),
        user_prompt: str = (
            'Summarize the following text to approximately {target_percent}% of its original length. \n'
            'Focus on the most important information and maintain technical accuracy.'
            '\n\nText to summarize:\n{text}\n\nSummary:'
        ),
        retry_attempts: int = 3,
        retry_delay: float = 1.0,
        timeout: float = 30.0,
    )
Configuration for LLM summarization.
ATTRIBUTE | TYPE | DESCRIPTION |
---|---|---|
provider | LLMProvider | LLM provider to use |
model | str | Model name/ID |
api_key | Optional[str] | API key (if not in environment) |
base_url | Optional[str] | Base URL for API (for custom endpoints) |
temperature | float | Sampling temperature (0-1) |
max_tokens | int | Maximum tokens in response |
system_prompt | str | System prompt template |
user_prompt | str | User prompt template |
retry_attempts | int | Number of retry attempts |
retry_delay | float | Delay between retries in seconds |
timeout | float | Request timeout in seconds |
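The default `user_prompt` is a plain `str.format` template with `{target_percent}` and `{text}` placeholders (the template text below is copied from the dataclass defaults documented on this page). As a small illustration, not the library's internal code, here is how such a template can be rendered before an API call:

```python
# Default user prompt template from LLMConfig (copied from the documented defaults).
USER_PROMPT = (
    "Summarize the following text to approximately {target_percent}% of its original length. \n"
    "Focus on the most important information and maintain technical accuracy."
    "\n\nText to summarize:\n{text}\n\nSummary:"
)

def render_prompt(text: str, target_ratio: float = 0.3) -> str:
    """Fill the template; a target_ratio of 0.3 becomes target_percent 30."""
    return USER_PROMPT.format(target_percent=int(target_ratio * 100), text=text)

prompt = render_prompt("def add(a, b): return a + b", target_ratio=0.3)
```

Note that `target_ratio` (a 0-1 fraction used throughout this API) is converted to a whole percentage before formatting.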
Attributes¶

All configuration fields above are exposed as class-level defaults and instance attributes: provider, model, api_key, base_url, temperature, max_tokens, system_prompt, user_prompt, retry_attempts, retry_delay, timeout.
LLMSummarizer¶
Base class for LLM-based summarization.
Provides common functionality for different LLM providers. Handles API calls, retries, and error handling.
Initialize LLM summarizer.
PARAMETER | DESCRIPTION |
---|---|
config | LLM configuration |
Attributes¶

Instance attributes: config, logger, client.
Functions¶
summarize¶
summarize(text: str, target_ratio: float = 0.3, max_length: Optional[int] = None, min_length: Optional[int] = None, custom_prompt: Optional[str] = None) -> str
Summarize text using LLM.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
text | str | Text to summarize |
target_ratio | float | Target compression ratio |
max_length | Optional[int] | Maximum summary length |
min_length | Optional[int] | Minimum summary length |
custom_prompt | Optional[str] | Custom prompt override |
RETURNS | DESCRIPTION |
---|---|
str | Summarized text |
RAISES | DESCRIPTION |
---|---|
RuntimeError | If API call fails after retries |
LLMSummaryStrategy¶
LLMSummaryStrategy(provider: Union[str, LLMProvider] = LLMProvider.OPENAI, model: str = 'gpt-4o-mini', api_key: Optional[str] = None)
LLM-based summarization strategy for use with Summarizer.
Wraps LLMSummarizer to match the SummarizationStrategy interface.
WARNING: This strategy incurs API costs. Always estimate costs before use.
Initialize LLM strategy.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
provider | Union[str, LLMProvider] | LLM provider name or enum |
model | str | Model to use |
api_key | Optional[str] | API key (if not in environment) |
Attributes¶

Class/instance attributes: name, description, requires_ml. Instance attributes: logger, summarizer.
Functions¶
summarize¶
CompressiveStrategy¶
Bases: SummarizationStrategy
Compressive summarization using NLP tokenization.
Removes redundant words and phrases while maintaining meaning. Uses NLP tokenizer for better word processing.
Initialize compressive strategy.
PARAMETER | DESCRIPTION |
---|---|
use_nlp | Whether to use NLP components |
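The core idea of compressive summarization can be pictured in a few lines: tokenize, drop low-information words, and rejoin. This toy sketch uses a tiny hardcoded stopword set rather than the package's stopword_manager, and is an illustration of the technique, not the strategy's actual implementation:

```python
# Minimal hardcoded stopword set -- the real strategy uses a stopword manager.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "very", "really",
             "quite", "just", "that", "this", "of", "in", "to", "and"}

def compress(text: str) -> str:
    """Drop stopwords while keeping word order (toy compressive step)."""
    kept = [w for w in text.split() if w.lower().strip(".,") not in STOPWORDS]
    return " ".join(kept)

compressed = compress("The cache is really just a dict that maps keys to summaries")
# compressed == "cache dict maps keys summaries"
```

Real compressive strategies are more careful about grammar, but the compression mechanism (removing redundant tokens rather than selecting whole sentences) is the same.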
Attributes¶

Class/instance attributes: name, description, requires_ml. Instance attributes: logger, use_nlp, tokenizer, stopword_manager, stopwords.
Functions¶
summarize¶
ExtractiveStrategy¶
Bases: SummarizationStrategy
Extractive summarization using NLP components.
Selects the most important sentences based on keyword density, position, and optionally semantic similarity. Uses centralized NLP components for improved sentence scoring.
Initialize extractive strategy.
PARAMETER | DESCRIPTION |
---|---|
use_nlp | Whether to use NLP components for enhanced extraction |
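Extractive selection can be illustrated with a simple keyword-density scorer: split into sentences, score each by the frequency of its words in the whole text, and keep the top fraction in original order. The package's real strategy also weighs sentence position and optional semantic similarity; this self-contained sketch shows only the core scoring idea:

```python
import re
from collections import Counter

def extract_summary(text: str, target_ratio: float = 0.3) -> str:
    """Keep the highest-scoring sentences, preserving original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(s: str) -> float:
        # Keyword density: mean corpus frequency of the sentence's words.
        ws = re.findall(r"\w+", s.lower())
        return sum(freq[w] for w in ws) / max(len(ws), 1)

    k = max(1, round(len(sentences) * target_ratio))
    top = sorted(sorted(sentences, key=score, reverse=True)[:k],
                 key=sentences.index)  # restore document order
    return " ".join(top)

demo = extract_summary(
    "Caching avoids repeated work. Caching stores summaries by key. "
    "The weather is nice.",
    target_ratio=0.34,
)
```

Sentences about the text's dominant topic score highest, so off-topic sentences are dropped first.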
Attributes¶

Class/instance attributes: name, description, requires_ml. Instance attributes: logger, use_nlp, keyword_extractor, tokenizer.
Functions¶
summarize¶
summarize(text: str, target_ratio: float = 0.3, max_length: Optional[int] = None, min_length: Optional[int] = None) -> str
SummarizationStrategy¶
Bases: ABC
Abstract base class for summarization strategies.
Attributes¶

Class/instance attributes: name, description, requires_ml.
Functions¶
summarize (abstractmethod)¶
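A custom strategy only needs to satisfy this interface. The sketch below re-declares a minimal stand-in for the base class locally (mirroring the attribute names and `summarize` signature documented on this page) so it is self-contained; a real plugin would subclass the package's own `SummarizationStrategy` instead:

```python
from abc import ABC, abstractmethod
from typing import Optional

class SummarizationStrategy(ABC):
    """Local stand-in mirroring the documented interface."""
    name: str = "base"
    description: str = ""
    requires_ml: bool = False

    @abstractmethod
    def summarize(self, text: str, target_ratio: float = 0.3,
                  max_length: Optional[int] = None,
                  min_length: Optional[int] = None) -> str: ...

class TruncateStrategy(SummarizationStrategy):
    """Trivial example strategy: keep the leading slice of the text."""
    name = "truncate"
    description = "Keep the first target_ratio fraction of characters"

    def summarize(self, text, target_ratio=0.3, max_length=None, min_length=None):
        cut = int(len(text) * target_ratio)
        if max_length is not None:
            cut = min(cut, max_length)
        return text[:cut]

strategy = TruncateStrategy()
```

The declarative attributes (`name`, `requires_ml`) let an orchestrator list and select strategies without instantiating heavyweight ones.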
TextRankStrategy¶
Bases: SummarizationStrategy
TextRank summarization with NLP preprocessing.
Graph-based ranking algorithm that uses NLP components for better text preprocessing and similarity computation.
Initialize TextRank strategy.
PARAMETER | DESCRIPTION |
---|---|
use_nlp | Whether to use NLP components |
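The graph-based ranking can be sketched with plain word overlap and power iteration: sentences are nodes, edge weights reflect shared vocabulary, and repeated score propagation surfaces the most central sentences. The real strategy uses TF-IDF similarity; this sketch uses raw set overlap to stay dependency-free:

```python
import re

def textrank_pick(sentences, top_k=2, iters=30, d=0.85):
    """Rank sentences by power iteration over a word-overlap graph."""
    bags = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    # Edge weight = word overlap, normalized by union size (no self-edges).
    w = [[len(bags[i] & bags[j]) / (1 + len(bags[i] | bags[j]))
          if i != j else 0.0 for j in range(n)] for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(w[j])
                if w[j][i] and out:
                    rank += scores[j] * w[j][i] / out
            new.append((1 - d) + d * rank)  # damping, as in PageRank
        scores = new
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]  # original order

sents = ["Python code summarization keeps code structure.",
         "Summarization of code preserves structure and signatures.",
         "Bananas are yellow."]
picked = textrank_pick(sents, top_k=2)
```

Sentences that share vocabulary reinforce each other's scores, so the isolated off-topic sentence drops out.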
Attributes¶

Class/instance attributes: name, description, requires_ml. Instance attributes: logger, use_nlp, tfidf_calc.
Functions¶
summarize¶
TransformerStrategy¶
Bases: SummarizationStrategy
Transformer-based neural summarization.
Uses pre-trained transformer models for high-quality abstractive summarization.
Initialize transformer strategy.
PARAMETER | DESCRIPTION |
---|---|
model_name | HuggingFace model name |
Attributes¶

Class/instance attributes: name, description, requires_ml. Instance attributes: logger, model_name, summarizer.
Functions¶
summarize¶
BatchSummarizationResult (dataclass)¶
BatchSummarizationResult(results: List[SummarizationResult], total_original_length: int, total_summary_length: int, overall_compression_ratio: float, total_time_elapsed: float, files_processed: int, files_failed: int)
Result from batch summarization.
SummarizationMode¶
Bases: Enum
Available summarization modes.
SummarizationResult (dataclass)¶
SummarizationResult(original_text: str, summary: str, original_length: int, summary_length: int, compression_ratio: float, strategy_used: str, time_elapsed: float, metadata: Dict[str, Any] = None)
Result from summarization operation.
ATTRIBUTE | TYPE | DESCRIPTION |
---|---|---|
original_text | str | Original text |
summary | str | Summarized text |
original_length | int | Original text length |
summary_length | int | Summary length |
compression_ratio | float | Actual compression ratio achieved |
strategy_used | str | Which strategy was used |
time_elapsed | float | Time taken to summarize |
metadata | Dict[str, Any] | Additional metadata |
Summarizer¶
Summarizer(config: Optional[TenetsConfig] = None, default_mode: Optional[str] = None, enable_cache: bool = True)
Main summarization orchestrator.
Coordinates different summarization strategies and provides a unified interface for content compression. Supports single and batch processing, strategy selection, and caching.
ATTRIBUTE | DESCRIPTION |
---|---|
config | TenetsConfig instance |
logger | Logger instance |
strategies | Available summarization strategies |
cache | Summary cache for repeated content |
stats | Summarization statistics |
Initialize summarizer.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
config | Optional[TenetsConfig] | Tenets configuration |
default_mode | Optional[str] | Default summarization mode |
enable_cache | bool | Whether to enable caching |
Attributes¶

Instance attributes: config, logger, default_mode, strategies, enable_cache, cache, stats.
Functions¶
summarize¶
summarize(text: str, mode: Optional[Union[str, SummarizationMode]] = None, target_ratio: float = 0.3, max_length: Optional[int] = None, min_length: Optional[int] = None, force_strategy: Optional[SummarizationStrategy] = None) -> SummarizationResult
Summarize text content.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
text | str | Text to summarize |
mode | Optional[Union[str, SummarizationMode]] | Summarization mode (uses default if None) |
target_ratio | float | Target compression ratio (0.3 = 30% of original) |
max_length | Optional[int] | Maximum summary length in characters |
min_length | Optional[int] | Minimum summary length in characters |
force_strategy | Optional[SummarizationStrategy] | Force specific strategy instance |
RETURNS | DESCRIPTION |
---|---|
SummarizationResult | SummarizationResult with summary and metadata |
Example:

    summarizer = Summarizer()
    result = summarizer.summarize(
        long_text,
        mode="extractive",
        target_ratio=0.25,
    )
    print(f"Reduced by {result.reduction_percent:.1f}%")
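The cache mentioned above is easy to picture: key lookups on the exact inputs that determine the output, so repeated content is summarized once. This is a hypothetical sketch of such a cache (the real cache attribute's keying scheme is not documented on this page):

```python
import hashlib

class SummaryCache:
    """Toy cache keyed on (text, mode, target_ratio) -- hypothetical scheme."""

    def __init__(self):
        self._store = {}

    def _key(self, text: str, mode: str, target_ratio: float) -> str:
        # Hash the inputs so arbitrarily long texts make fixed-size keys.
        raw = f"{mode}:{target_ratio}:{text}".encode()
        return hashlib.sha256(raw).hexdigest()

    def get(self, text, mode, target_ratio):
        return self._store.get(self._key(text, mode, target_ratio))

    def put(self, text, mode, target_ratio, summary):
        self._store[self._key(text, mode, target_ratio)] = summary

cache = SummaryCache()
cache.put("some long text", "extractive", 0.3, "short text")
```

Because `target_ratio` and `mode` are part of the key, asking for a different compression level correctly misses the cache.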
summarize_file¶
summarize_file(file: FileAnalysis, mode: Optional[Union[str, SummarizationMode]] = None, target_ratio: float = 0.3, preserve_structure: bool = True, prompt_keywords: Optional[List[str]] = None) -> SummarizationResult
Summarize a code file intelligently.
Handles code files specially by preserving important elements like class/function signatures while summarizing implementations. Enhanced with context-aware documentation summarization that preserves relevant sections based on prompt keywords.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
file | FileAnalysis | FileAnalysis object |
mode | Optional[Union[str, SummarizationMode]] | Summarization mode |
target_ratio | float | Target compression ratio |
preserve_structure | bool | Whether to preserve code structure |
prompt_keywords | Optional[List[str]] | Keywords from user prompt for context-aware summarization |
RETURNS | DESCRIPTION |
---|---|
SummarizationResult | SummarizationResult |
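The context-aware behavior described above (keeping documentation sections relevant to the prompt) can be sketched with a simple keyword filter over paragraphs. This is an illustration of the idea, not the package's implementation; the `always_keep` parameter is invented for the sketch:

```python
def keep_relevant_sections(doc: str, keywords: list[str], always_keep: int = 1) -> str:
    """Keep paragraphs mentioning any keyword, plus the first paragraph."""
    paras = [p for p in doc.split("\n\n") if p.strip()]
    kept = []
    for i, p in enumerate(paras):
        low = p.lower()
        if i < always_keep or any(k.lower() in low for k in keywords):
            kept.append(p)
    return "\n\n".join(kept)

doc = ("Intro paragraph.\n\n"
       "Unrelated installation details.\n\n"
       "The cache layer stores summaries by content hash.")
relevant = keep_relevant_sections(doc, ["cache"])
```

Paragraphs that never mention a prompt keyword are dropped, while the lead paragraph survives as orientation.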
batch_summarize¶
batch_summarize(texts: List[Union[str, FileAnalysis]], mode: Optional[Union[str, SummarizationMode]] = None, target_ratio: float = 0.3, parallel: bool = True, prompt_keywords: Optional[List[str]] = None) -> BatchSummarizationResult
Summarize multiple texts in batch.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
texts | List[Union[str, FileAnalysis]] | List of texts or FileAnalysis objects |
mode | Optional[Union[str, SummarizationMode]] | Summarization mode |
target_ratio | float | Target compression ratio |
parallel | bool | Whether to process in parallel |
prompt_keywords | Optional[List[str]] | Keywords from user prompt for context-aware documentation summarization |
RETURNS | DESCRIPTION |
---|---|
BatchSummarizationResult | BatchSummarizationResult |
Functions¶
create_llm_summarizer¶
create_llm_summarizer(provider: str = 'openai', model: Optional[str] = None, api_key: Optional[str] = None) -> LLMSummaryStrategy
Create an LLM summarizer with defaults.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
provider | str | Provider name (openai, anthropic, openrouter) |
model | Optional[str] | Model name (uses provider default if None) |
api_key | Optional[str] | API key (uses environment if None) |
RETURNS | DESCRIPTION |
---|---|
LLMSummaryStrategy | Configured LLMSummaryStrategy |
Example:

    summarizer = create_llm_summarizer("openai", "gpt-4o-mini")
    summary = summarizer.summarize(long_text, target_ratio=0.2)
create_summarizer¶
create_summarizer(config: Optional[TenetsConfig] = None, mode: str = 'auto', enable_cache: bool = True) -> Summarizer
Create a configured summarizer.
Convenience function to quickly create a summarizer with sensible defaults.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
config | Optional[TenetsConfig] | Optional configuration |
mode | str | Default summarization mode |
enable_cache | bool | Whether to enable caching |
RETURNS | DESCRIPTION |
---|---|
Summarizer | Configured Summarizer instance |
Example:

    summarizer = create_summarizer(mode="extractive")
    result = summarizer.summarize(text, target_ratio=0.25)
estimate_compression¶
Estimate compression results without actually summarizing.
Useful for planning and understanding how much compression is possible for given text.
PARAMETER | DESCRIPTION |
---|---|
text | Text to analyze |
target_ratio | Target compression ratio |
mode | Summarization mode |
RETURNS | DESCRIPTION |
---|---|
dict | Dictionary with estimates |
Example:

    estimate = estimate_compression(long_text, 0.25)
    print(f"Expected output: ~{estimate['expected_length']} chars")
summarize_files¶
summarize_files(files: list, target_ratio: float = 0.3, mode: str = 'auto', config: Optional[TenetsConfig] = None) -> BatchSummarizationResult
Summarize multiple files in batch.
Convenience function for batch processing.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
files | list | List of FileAnalysis objects or text strings |
target_ratio | float | Target compression ratio |
mode | str | Summarization mode |
config | Optional[TenetsConfig] | Optional configuration |
RETURNS | DESCRIPTION |
---|---|
BatchSummarizationResult | BatchSummarizationResult |
Example:

    from tenets.core.summarizer import summarize_files

    results = summarize_files(file_list, target_ratio=0.25)
    print(f"Compressed {results.files_processed} files")
quick_summary¶
Quick summary with simple length constraint.
Convenience function for quick summarization without needing to manage summarizer instances.
PARAMETER | DESCRIPTION |
---|---|
text | Text to summarize |
max_length | Maximum length in characters |
RETURNS | DESCRIPTION |
---|---|
str | Summarized text |
Example:

    from tenets.core.summarizer import quick_summary

    summary = quick_summary(long_text, max_length=200)
summarize_code¶
summarize_code(code: str, language: str = 'python', preserve_structure: bool = True, target_ratio: float = 0.3) -> str
Summarize code while preserving structure.
Specialized function for code summarization that maintains imports, signatures, and key structural elements.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
code | str | Source code |
language | str | Programming language |
preserve_structure | bool | Keep imports and signatures |
target_ratio | float | Target compression ratio |
RETURNS | DESCRIPTION |
---|---|
str | Summarized code |
Example:

    from tenets.core.summarizer import summarize_code

    summary = summarize_code(
        long_module,
        language="python",
        target_ratio=0.25,
    )
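The structure-preserving behavior can be pictured as keeping a file's skeleton: import lines and def/class signatures survive, implementation bodies are dropped. This toy sketch for Python source shows the idea only, not the library's actual code handling:

```python
import re

def skeleton(code: str) -> str:
    """Keep imports and def/class signature lines; drop implementation bodies."""
    kept = []
    for line in code.splitlines():
        s = line.strip()
        if s.startswith(("import ", "from ")) or re.match(r"(def|class)\s+\w+", s):
            kept.append(line)  # keep original indentation for nested defs
    return "\n".join(kept)

module = ("import os\n\n"
          "def f(x):\n    return x\n\n"
          "class C:\n    def g(self):\n        pass")
outline = skeleton(module)
```

The output is far shorter than the source yet still answers "what does this module define?", which is usually what a context window needs.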
estimate_llm_cost¶
estimate_llm_cost(text: str, provider: str = 'openai', model: str = 'gpt-3.5-turbo', target_ratio: float = 0.3) -> dict
Estimate cost of LLM summarization.
Calculate expected API costs before summarizing.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
text | str | Text to summarize |
provider | str | LLM provider |
model | str | Model name |
target_ratio | float | Target compression ratio |
RETURNS | DESCRIPTION |
---|---|
dict | Cost estimate dictionary |
Example:

    from tenets.core.summarizer import estimate_llm_cost

    cost = estimate_llm_cost(text, "openai", "gpt-4")
    print(f"Estimated cost: ${cost['total_cost']:.4f}")
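The arithmetic behind such an estimate is simple: approximate token counts from text length, then apply per-token prices. The sketch below uses the common ~4-characters-per-token heuristic and placeholder prices (not real provider rates, and not the package's actual pricing table):

```python
def estimate_cost(text: str, target_ratio: float = 0.3,
                  price_in_per_1k: float = 0.0005,
                  price_out_per_1k: float = 0.0015) -> dict:
    """Rough cost estimate; prices are illustrative placeholders."""
    input_tokens = len(text) / 4          # ~4 chars per token heuristic
    output_tokens = input_tokens * target_ratio
    cost_in = input_tokens / 1000 * price_in_per_1k
    cost_out = output_tokens / 1000 * price_out_per_1k
    return {
        "input_tokens": round(input_tokens),
        "output_tokens": round(output_tokens),
        "total_cost": cost_in + cost_out,
    }

est = estimate_cost("x" * 4000, target_ratio=0.3)
```

For 4,000 characters this yields ~1,000 input tokens and ~300 output tokens, so output pricing matters less the harder you compress.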
select_best_strategy¶
select_best_strategy(text: str, target_ratio: float, constraints: Optional[dict] = None) -> str
Select best summarization strategy for given text.
Analyzes text characteristics and constraints to recommend the optimal summarization approach.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
text | str | Text to analyze |
target_ratio | float | Target compression ratio |
constraints | Optional[dict] | Optional constraints (time, quality, cost) |
RETURNS | DESCRIPTION |
---|---|
str | Recommended strategy name |
Example:

    from tenets.core.summarizer import select_best_strategy

    strategy = select_best_strategy(
        text,
        0.25,
        {'max_time': 1.0, 'quality': 'high'}
    )
    print(f"Recommended: {strategy}")
Modules¶

- llm: LLM module
- strategies: Strategies module
- summarizer: Summarizer module